
Welcome to the Companion Website for the 3rd edition of Discovering Statistics Using SPSS by Andy Field.

Student Resources

This section contains a range of resources that are available to students who use Andy Field, Discovering Statistics Using SPSS, Third Edition. The resources available include:

Interactive MCQs
Flashcard glossary containing all the terms you need to study for assessment
Smart Alex's answers
Additional web material
SPSS Flash animations
Study skills

Click on the links on the left-hand side to access this student material.

Additional DSUS Material

Chapter 1

Self-Test Answers

Based on what you have read in this section, what qualities do you think a scientific theory should have?
A good theory should do the following:
1. Explain the existing data.
2. Explain a range of related observations.
3. Allow statements to be made about the state of the world.
4. Allow predictions about the future.
5. Have implications.

What is the difference between reliability and validity?

Validity is whether an instrument measures what it was designed to measure, whereas reliability is the ability of the instrument to produce the same results under the same conditions.


Why is randomization important?

It is important because it rules out confounding variables (factors that could influence the outcome variable other than the factor in which you're interested). For example, with groups of people, random allocation of people to groups should mean that factors such as intelligence, age, gender and so on are roughly equal in each group and so will not systematically affect the results of the experiment.

Compute the mean but excluding the score of 252.


The range is the lowest score (22) subtracted from the highest score (now 121), which gives us 121 − 22 = 99.

First, we add up all of the scores:

Σxᵢ = 22 + 40 + 53 + 57 + 93 + 98 + 103 + 108 + 116 + 121 = 811

We then divide by the number of scores (in this case 10):

X̄ = Σxᵢ/n = 811/10 = 81.1

The mean is 81.1 friends.
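The same calculation can be sketched in a few lines of Python (this is not part of the book's materials, just a quick check on the hand arithmetic above):

```python
# Recomputing the mean and range once the outlier (252) is excluded,
# matching the hand calculation above.
friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]

trimmed = [x for x in friends if x != 252]  # drop the outlier
mean = sum(trimmed) / len(trimmed)          # 811 / 10
value_range = max(trimmed) - min(trimmed)   # 121 - 22

print(mean)         # 81.1
print(value_range)  # 99
```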

Twenty-one heavy smokers were put on a treadmill at the fastest setting. The time in seconds was measured until they fell off from exhaustion: 18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57. Compute the mode, median, upper and lower quartiles, range and interquartile range.
First, let's arrange the scores in ascending order: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57.

The Mode: The scores with frequencies in brackets are: 16 (1), 18 (2), 22 (2), 23 (2), 24 (1), 26 (1), 29 (1), 32 (1), 34 (2), 36 (2), 42 (1), 43 (1), 46 (2), 49 (1), 57 (1). Therefore, there are several modes because 18, 22, 23, 34, 36 and 46 seconds all have frequencies of 2, and 2 is the largest frequency. These data are multimodal (and the mode is, therefore, not particularly helpful to us).

The Median: The median will be the (n + 1)/2th score. There are 21 scores, so this will be the 22/2 = 11th score. The 11th score in our ordered list is 32 seconds.

The Mean: The mean is 32.19 seconds:

X̄ = Σxᵢ/n = (16 + 18 + 18 + 22 + 22 + 23 + 23 + 24 + 26 + 29 + 32 + 34 + 34 + 36 + 36 + 42 + 43 + 46 + 46 + 49 + 57)/21 = 676/21 = 32.19

The Lower Quartile: This is the median of the lower half of scores. If we split the data at 32 (not including this score), there are 10 scores below this value. The median position for 10 scores is (10 + 1)/2 = 5.5, so we take the average of the 5th and 6th scores. The 5th score is 22 and the 6th is 23; the lower quartile is therefore 22.5 seconds.

The Upper Quartile: This is the median of the upper half of scores. If we split the data at 32 (not including this score), there are 10 scores above this value, so again we take the average of the 5th and 6th of these scores. The 5th score above the median is 42 and the 6th is 43; the upper quartile is therefore 42.5 seconds.

The Range: This is the highest score (57) minus the lowest (16), i.e. 41 seconds.

The Interquartile Range: This is the difference between the upper and lower quartiles: 42.5 − 22.5 = 20 seconds.
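These descriptive statistics can be reproduced in Python (not from the book). Note that the quartile rule used here is the median-of-each-half rule from the hand calculation; built-in quantile functions in statistics libraries often use other interpolation rules and can give slightly different answers:

```python
# Mode, median, quartiles, range and IQR for the treadmill times,
# using the median-of-each-half rule for the quartiles.
from collections import Counter

times = sorted([18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34,
                34, 36, 36, 43, 42, 49, 46, 46, 57])

def median(scores):
    n = len(scores)
    mid = n // 2
    return scores[mid] if n % 2 else (scores[mid - 1] + scores[mid]) / 2

counts = Counter(times)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)  # multimodal here

med = median(times)                           # 32
lower_q = median(times[:len(times) // 2])     # 22.5 (median of lower 10)
upper_q = median(times[len(times) // 2 + 1:]) # 42.5 (median of upper 10)
iqr = upper_q - lower_q                       # 20.0
value_range = max(times) - min(times)         # 57 - 16 = 41
```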

Assuming the same mean and standard deviation for the Beachy Head example above, whats the probability that someone who threw themselves off of Beachy Head was 30 or younger?

As in the example, we know that the mean of the suicide scores was 36, and the standard deviation 13. First we convert our value to a z-score: 30 becomes (30 − 36)/13 = −0.46. We want the area below this value (because 30 is below the mean), but this value is not tabulated in the Appendix. However, because the distribution is symmetrical, we can instead ignore the minus sign and look up this value in the column labelled Smaller Portion (i.e. the area above the value 0.46). You should find that the probability is .32276; put another way, there is a 32.28% chance that a suicide victim would be 30 years old or younger. By looking at the column labelled Bigger Portion we can also see the probability that a suicide victim was aged 30 or more. This probability is .67724: there's a 67.72% chance that a suicide victim was older than 30 years old!
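The table lookup can also be done numerically (a sketch, not from the book). The standard normal CDF below is built from math.erf, so it matches the tabulated value (.32276 for z = 0.46) only up to the rounding of z:

```python
# z-score for age 30 and the "smaller portion" / "bigger portion"
# probabilities from the standard normal distribution.
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 36, 13
z = (30 - mean) / sd         # about -0.46
p_younger = normal_cdf(z)    # about .32 ("smaller portion")
p_older = 1 - p_younger      # about .68 ("bigger portion")
```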

Chapter 2

Self-Test Answers

We came across some data about the number of friends that 11 people had on Facebook (22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252). We calculated the mean for these data as 96.64. Now calculate the sums of squares, variance and standard deviation.

To calculate the sum of squares, take the mean from each value, then square this difference. Finally, add up these squared values:

So, the sum of squared errors is a massive 37544.55. The variance is the sum of squared errors divided by the degrees of freedom (N − 1). There were 11 scores and so the degrees of freedom were 10. The variance is, therefore, 37544.55/10 = 3754.45.


Finally, the standard deviation is the square root of the variance:

√3754.45 = 61.27

Calculate these values again but excluding the outlier (252).

To calculate the sum of squares, take the mean from each value (note that it has changed because the outlier is excluded), then square this difference. Finally, add up these squared values:

So, the sum of squared errors is 10992.90. The variance is the sum of squared errors divided by the degrees of freedom (N − 1). There were 10 scores and so the degrees of freedom were 9. The variance is, therefore, 10992.90/9 = 1221.43. Finally, the standard deviation is the square root of the variance:

√1221.43 = 34.95

Note, then, that like the mean itself the standard deviation is hugely influenced by outliers: the removal of this one value has almost halved the standard deviation!
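Both sets of hand calculations can be checked with a short Python sketch (not part of the book's materials):

```python
# Sum of squares, variance (N - 1 in the denominator) and standard
# deviation for the Facebook friends data, with and without the outlier.
friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]

def describe(scores):
    mean = sum(scores) / len(scores)
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared errors
    variance = ss / (len(scores) - 1)          # divide by degrees of freedom
    return ss, variance, variance ** 0.5

ss_all, var_all, sd_all = describe(friends)          # sd about 61.27
ss_trim, var_trim, sd_trim = describe(friends[:-1])  # sd about 34.95
```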


We came across some data about the number of friends that 11 people had on Facebook. We calculated the mean for these data as 96.64 and standard deviation as 61.27. Calculate a 95% confidence interval for this mean.
First we need to calculate the standard error,

SE = s/√N = 61.27/√11 = 18.47

The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, N − 1. With 11 data points, the degrees of freedom are 10. For a 95% confidence interval we can look up the value in the column labelled Two-Tailed Test, 0.05 in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.23. The confidence interval boundaries are, therefore:

Lower Boundary of Confidence Interval = X̄ − (2.23 × SE) = 96.64 − (2.23 × 18.47) = 55.44
Upper Boundary of Confidence Interval = X̄ + (2.23 × SE) = 96.64 + (2.23 × 18.47) = 137.84

Recalculate the confidence interval assuming that the sample size was 56.

First we need to calculate the new standard error,

SE = s/√N = 61.27/√56 = 8.19

The sample is big now, so to calculate the confidence interval we can use the critical value of z for a 95% confidence interval (i.e. 1.96). The confidence intervals are therefore:

Lower Boundary of Confidence Interval = X̄ − (1.96 × SE) = 96.64 − (1.96 × 8.19) = 80.59
Upper Boundary of Confidence Interval = X̄ + (1.96 × SE) = 96.64 + (1.96 × 8.19) = 112.69
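Both intervals follow the same recipe, which is easy to sketch in Python (not from the book; the critical values are the ones quoted in the text, 2.23 for df = 10 and 1.96 for the large sample, rather than being computed):

```python
# 95% confidence intervals for the mean number of Facebook friends,
# using mean +/- critical * SE with SE = s / sqrt(N).
s = 61.27  # sample standard deviation

def ci_95(mean, sd, n, critical):
    se = sd / n ** 0.5                       # standard error of the mean
    return mean - critical * se, mean + critical * se

small = ci_95(96.64, s, 11, 2.23)  # roughly (55.4, 137.8), small-sample t
large = ci_95(96.64, s, 56, 1.96)  # roughly (80.6, 112.7), large-sample z
```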


What are the null and alternative hypotheses for the following questions?

Is there a relationship between the amount of gibberish that people speak and the amount of vodka jelly they've eaten?

o Null Hypothesis: There will be no relationship between the amount of gibberish that people speak and the amount of vodka jelly they've eaten.
o Alternative Hypothesis: There will be a relationship between the amount of gibberish that people speak and the amount of vodka jelly they've eaten.

Is the mean amount of chocolate eaten higher when writing statistics books than when not?

o Null Hypothesis: There will be no difference in the mean amount of chocolate eaten when writing statistics textbooks compared to when not writing them.
o Alternative Hypothesis: The mean amount of chocolate eaten when writing statistics textbooks will be higher than when not writing them.

Chapter 3

Self-Test Answers

Why is the number of friends variable a scale variable?


It is a scale variable because the numbers represent consistent intervals and ratios along the measurement scale: the difference between having (for example) 1 and 2 friends is the same as the difference between having (for example) 10 and 11 friends, and (for example) 20 friends is twice as many as 10.

Having created the first four variables with a bit of guidance, try to enter the rest of the variables in Table 2.1 yourself.
The finished data and variable views should look like those in the figure below (more or less!). You can also download this data file (Data with which to play.sav).


Chapter 4

Self-Test Answers

What does a histogram show?

A histogram is a graph in which values of observations are plotted on the horizontal axis, and the frequency with which each value occurs in the data set is plotted on the vertical axis.

Produce boxplots for the day 2 and day 3 hygiene scores and interpret them.
The boxplot for the day 2 data should look like this:


Note that as for day 1 the females are slightly more fragrant than the males (look at the median line). However, if you compare these to the day 1 boxplots (in the book), scores are getting lower (i.e. people are getting less hygienic). In the males there are now more outliers (i.e. a rebellious few who have maintained their sanitary standards). The boxplot for the day 3 data should look like this:

Note that compared to days 1 and 2 the females are getting more like the males (i.e. smelly). However, if you look at the top whisker, this is much longer for the females. In other words, the top 25% of females are more variable in how smelly they are compared to males. Also, the top score is higher than for males. So, at the top end females are better at maintaining their hygiene at the festival compared to males. Also, the box is longer for females, and although both boxes start at the same score, the top edge of the box is higher in females, again suggesting that above the median score more women are achieving higher levels of hygiene than men. Finally, note that for both days 2 and 3 the boxplots have become less symmetrical (the top whiskers are longer than the bottom whiskers). On day 1 (see the book chapter), which is symmetrical, the whiskers on either side of the box are of equal length (the range of the top and bottom 25% of scores is the same); however, on days 2 and 3 the whisker coming out of the top of the box is longer than that at the bottom, which shows that the distribution is skewed (i.e. the top 25% of scores is spread out over a wider range than the bottom 25%).

Use what you learnt earlier to add error bars to this graph and to label both the x- (I suggest Time) and y-axis (I suggest Mean Grammar Score (%)).

Simple Line Charts for Independent Means


To begin with, imagine that a film company director was interested in whether there was really such a thing as a 'chick flick' (a film that typically appeals to women more than men). He took 20 men and 20 women and showed half of each sample a film that was supposed to be a chick flick (Bridget Jones's Diary), and the other half of each sample a film that didn't fall into the category of chick flick (Memento, a brilliant film by the way). In all cases he measured their physiological arousal as a measure of how much they enjoyed the film. The data are in a file called ChickFlick.sav on the companion website. Load this file now.

First of all, let's just plot the mean rating of the two films. We have just one grouping variable (the film) and one outcome (the arousal); therefore, we want a simple line chart. In the Chart Builder double-click on the icon for a simple line chart. On the canvas you will see a graph and two drop zones: one for the y-axis and one for the x-axis. The y-axis needs to be the dependent variable, or the thing you've measured, or more simply the thing for which you want to display the mean. In this case it would be arousal, so select arousal from the variable list and drag it into the y-axis drop zone. The x-axis should be the variable by which we want to split the arousal data. To plot the means for the two films, select the variable film from the variable list and drag it into the drop zone for the x-axis.

Dialog boxes for a simple line chart with error bar

The figure above shows some other options for the line chart. The main dialog box should appear when you select the type of graph you want, but if it doesn't, click on Element Properties in the Chart Builder. There are three important features of this dialog box. The first is that, by default, the lines will display the mean value. This is fine, but just note that you can plot other summary statistics such as the median or mode. Second, you can adjust the form of the line that you plot. The default is a straight line, but you can have others like a spline (curved line). Finally, we can ask SPSS to add error bars to our line chart by selecting the error bars option. We have a choice of what our error bars represent. Normally, error bars show the 95% confidence interval, and I have selected this option. Note, though, that you can change the width of the confidence interval displayed by changing the 95 to a different value. You can also display the standard error (the default is to show two standard errors, but you can change this to one) or the standard deviation (again, the default is two but this could be changed to one or another value). It's important that when you change these properties you click on Apply: if you don't, then the changes will not be applied to the Chart Builder. Click on OK to produce the graph.

Line chart of the mean arousal for each of the two films

The resulting line chart displays the mean (and the confidence interval of those means). This graph shows us that, on average, people were more aroused by Memento than they were by Bridget Jones's Diary. However, we originally wanted to look for gender effects, so this graph isn't really telling us what we need to know. The graph we need is a multiple line graph.
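Although SPSS draws these error bars for you, the arithmetic behind each point can be sketched in Python. This is not part of the book's materials: the arousal scores below are invented for illustration (they are NOT the values in ChickFlick.sav), and it uses z = 1.96 as a large-sample approximation where SPSS's exact interval would use a t value:

```python
# The arithmetic behind one point on an error-bar line chart:
# the group mean and the half-width of an approximate 95% CI.
def mean_with_error_bar(scores, critical=1.96):
    """Return (mean, half-width of an approximate 95% CI)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / (n - 1)) ** 0.5
    return mean, critical * sd / n ** 0.5

bridget = [10, 12, 15, 11, 13, 14, 12, 13, 11, 14]  # hypothetical arousal
memento = [20, 24, 22, 25, 21, 23, 26, 22, 24, 23]  # hypothetical arousal

m1, e1 = mean_with_error_bar(bridget)
m2, e2 = mean_with_error_bar(memento)
```

Each plotted point is then drawn at the mean, with the error bar spanning mean − half-width to mean + half-width.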


Multiple Line Charts for Independent Means


To do a multiple line chart for means that are independent (i.e. have come from different groups) we need to double-click on the multiple line chart icon in the Chart Builder (see the book chapter). On the canvas you will see a graph as with the simple line chart, but there is now an extra drop zone for a second grouping variable. All we need to do is to drag our second grouping variable into this drop zone. As with the previous example, then, select arousal from the variable list and drag it into the y-axis drop zone, and select film from the variable list and drag it into the x-axis drop zone. In addition, though, we can now select the gender variable and drag it into the new drop zone. This will mean that lines representing males and females will be displayed in different colours. As in the previous section, select error bars in the properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.


Dialog boxes for a multiple line chart with error bars


Line chart of the mean arousal for each of the two films, split by gender

The resulting line chart tells us the same as the simple line graph: that is, arousal was overall higher for Memento than Bridget Jones's Diary, but it also splits this information by gender. Look first at the mean arousal for Bridget Jones's Diary; this shows that males were actually more aroused during this film than females. This indicates they enjoyed the film more than the women did! Contrast this with Memento, for which arousal levels are comparable in males and females. On the face of it, this contradicts the idea of a chick flick: it actually seems that men enjoy chick flicks more than the chicks (probably because it's the only help we get to understand the complex workings of the female mind!).

Simple Line Charts for Related Means


Hiccups can be a serious problem. Charles Osborne apparently got a case of hiccups while slaughtering a hog (well, who wouldn't?) that lasted 67 years. People have many methods for stopping hiccups (a surprise, holding your breath), but actually medical science has put its collective mind to the task too. The official treatment methods include tongue-pulling manoeuvres, massage of the carotid artery and, believe it or not, digital rectal massage (Fesmire, 1988). I don't know the details of what the digital rectal massage involved, but I can probably imagine. Let's say we wanted to put this to the test. We took 15 hiccup sufferers, and during a bout of hiccups administered each of the three procedures (in random order and at 5-minute intervals) after taking a baseline of how many hiccups they had per minute. We counted the number of hiccups in the minute after each procedure. Load the file Hiccups.sav.

Note that these data are laid out in different columns; there is no grouping variable that specifies the interventions because each patient experienced all interventions. In the previous two examples we have used grouping variables to specify aspects of the graph (e.g. we used the grouping variable film to specify the x-axis). For repeated-measures data we will not have these grouping variables and so the process of building a graph is a little more complicated (but not a lot more). To plot the mean number of hiccups go to the Chart Builder and double-click on the icon for a simple line chart. As before, you will see a graph on the canvas with drop zones for the x- and y-axes. Previously we specified the column in our data that contained data from our outcome measure on the y-axis, but for these data we have four columns containing data on the number of hiccups (the outcome variable). What we have to do, then, is to drag all four of these variables from the variable list into the y-axis drop zone, and we have to do this simultaneously. First we need to select the four items in the variable list: select the first variable by clicking on it with the mouse (it will be highlighted in blue), then hold down the Ctrl key on the keyboard and click on a second variable, and so on for the third and fourth. When you want to select a list of consecutive variables, you can do this very quickly by clicking on the first variable that you want to select (in this case baseline), holding down the Shift key on the keyboard and then clicking on the last variable that you want to select (in this case digital rectal massage); notice that all of the variables in between have been selected too. Once the four variables are selected you can drag them by clicking on any one of them and dragging them into the y-axis drop zone as shown in the figure:


Specifying a simple line chart for repeated-measures data

Once you have dragged the four variables onto the y-axis drop zone a new dialog box appears. This box tells us that SPSS is creating two temporary variables. One is called Summary, which is going to be the outcome variable (i.e. what we measured, in this case the number of hiccups per minute). The other is called Index, and this variable will represent our independent variable (i.e. what we manipulated, in this case the type of intervention). Why does SPSS call them Index and Summary? It's just because it doesn't know what your particular variables represent, so these are just temporary names that we should change. Just click on OK to get rid of this dialog box.

The Create Summary Group dialog box

Setting Element Properties for a repeated-measures graph

We need to edit some of the properties of the graph. The figure shows the options that need to be set: if you can't see this dialog box then click on Element Properties in the Chart Builder. In the left panel of the figure just note that I have selected to display error bars (see the previous two sections for more information). The middle panel is accessed by clicking on X-Axis1 (Line1) in the list labelled Edit Properties of, and this allows us to edit properties of the horizontal axis. The first thing we need to do is give the axis a title, and I have typed Intervention in the space labelled Axis Label. This label will appear on the graph. Also, we can change the order of our variables if we want to by selecting a variable in the list labelled Order and moving it up or down using the arrow buttons. If we change our mind about displaying one of our variables then we can also remove it from the list by selecting it and clicking on the remove button. Click on Apply for these changes to take effect. The right panel is accessed by clicking on Y-Axis1 (Line1) in the list labelled Edit Properties of, and it allows us to edit properties of the vertical axis. The main change that I have made here is to give the axis a label so that the final graph has a useful description on the axis (by default it will just say Mean, which isn't very helpful): I have typed Mean Number of Hiccups Per Minute in the box labelled Axis Label. Also note that you can use this dialog box to set the scale of the vertical axis (the minimum value, maximum value and the major increment, which is how often a mark is made on the axis). Mostly you can let SPSS construct the scale automatically, and it will be fairly sensible; even if it's not, you can edit it later. Click on Apply to apply the changes.


Completed Chart Builder for a repeated-measures graph

Click on OK to produce the graph. The resulting line chart displays the mean number of hiccups (and the confidence interval of those means) at baseline and after the three interventions. Note that the axis labels that we typed in have appeared on the graph. We can conclude that the number of hiccups after tongue pulling was about the same as at baseline; however, carotid artery massage reduced hiccups, but not by as much as a good old-fashioned digital rectal massage. The moral here is: if you have hiccups, find something digital and go amuse yourself for a few minutes.


Line chart of the mean number of hiccups at baseline and after various interventions

Multiple Line Charts for Related Means


Just like bar charts, these, to the best of my knowledge, can't be done. I could be wrong, though; I often am.

Multiple Line Charts for Mixed Designs


The Chart Builder might not be able to do charts for multiple repeated-measures variables, but it can graph what is known as a mixed design. This is a design in which you have one or more independent variables measured using different groups, and one or more independent variables measured using the same sample. Basically, the Chart Builder can produce a graph provided you have only one repeated-measures variable.


We all like to text-message (especially students in my lectures who feel the need to text-message the person next to them to say 'Bloody hell, this guy is so boring I need to poke out my own eyes'). What will happen to the children, though? Not only will they develop super-sized thumbs, but they might not learn correct written English. Imagine we conducted an experiment in which a group of 25 children was encouraged to send text messages on their mobile phones over a six-month period. A second group of 25 was forbidden from sending text messages for the same period. To ensure that kids in this latter group didn't use their phones, this group were given armbands that administered painful shocks in the presence of microwaves (like those emitted from phones). The outcome was a score on a grammatical test (as a percentage) that was measured both before and after the intervention. The first independent variable was, therefore, text message use (text messagers versus controls) and the second independent variable was the time at which grammatical ability was assessed (baseline or after six months). The data are in the file Text Messages.sav.

To graph these data we need to follow the general procedure for graphing related means. Our repeated-measures variable is time (whether grammar ability was measured at baseline or at six months) and is represented in the data file by two columns, one for the baseline data and the other for the follow-up data. In the Chart Builder you need to select these two variables simultaneously by clicking on one and then holding down the Ctrl key on the keyboard and clicking on the other. When they are both highlighted, click on either one and drag it into the y-axis drop zone. The second variable (whether children text-messaged or not) was measured using different children and so is represented in the data file by a grouping variable (group). This variable can be selected in the variable list and dragged into the grouping drop zone. The two groups will now be displayed as different-coloured lines.


As I said, the procedure for producing line graphs is basically the same as for bar charts except that you get lines on your graphs instead of bars. Therefore, you should be able to follow the previous sections for bar charts but selecting a simple line chart instead of a simple bar chart, and selecting a multiple line chart instead of a clustered bar chart. I would like you to produce line charts of each of the bar charts in the previous section. In case you get stuck, the self-test answers that can be downloaded from the companion website will take you through it step by step.

The finished Chart Builder is below. Click on

to produce the graph.


Selecting the repeated-measures variable in the Chart Builder


Completed dialog box for an error bar graph of a mixed design

Error bar graph of the mean grammar score over six months in children who were allowed to text message versus those who were forbidden


The resulting line chart shows that at baseline (before the intervention) the grammar scores were comparable in our two groups; however, after the intervention, the grammar scores were lower in the text messagers than in the controls. Also, if you compare the blue line with the green line you can see that the text messagers' grammar scores have fallen over the six months, whereas the controls' grammar scores are fairly stable over time. We could, therefore, conclude that text messaging has a detrimental effect on children's understanding of English grammar and, therefore, civilization will crumble, and Abaddon will rise cackling from his bottomless pit to claim our wretched souls. Maybe.

Based on my minimal (and no doubt unhelpful) summary, produce a 3-D scatterplot of the data in Figure 4.37 but with the data split by gender. To make things a bit more tricky, see if you can get SPSS to display different symbols for the two groups rather than two colours (see SPSS Tip 4.3). A full guided answer can be downloaded from the companion website.

To produce this graph, first double-click on the grouped 3-D scatterplot icon in the Chart Builder (see the book for how to access the Chart Builder). The graph preview on the canvas is the same as for a simple 3-D scatterplot except that our old friend the grouping drop zone is back. First, we simply repeat what we have done for previous scatterplots: select Exam Performance (%) from the variable list and drag it into the y-axis drop zone, select Exam Anxiety and drag it into the x-axis drop zone, and select Time Spent Revising and drag it into the z-axis drop zone. To split the data cloud by a categorical variable (in this case gender), we select this variable in the variable list and drag it into the grouping drop zone. The completed dialog box is below.


Completed dialog box for a grouped 3-D scatterplot

However, I also asked you to display the different groups as different-shaped symbols. As it stands, we have asked SPSS to produce different-coloured symbols for males and females. To change this, we need to double-click on the grouping drop zone to open a new dialog box that has a drop-down list in which Color will currently be selected. Click on this list to activate it and then select Pattern. Then click on OK to register this change. Back in the Chart Builder the drop zone will have been renamed Set pattern. Click on OK to plot the graph, which should look something like the one below.


Doing a simple dot plot in the Chart Builder is quite similar to drawing a histogram. Reload the DownloadFestival.sav data and see if you can produce a simple dot plot of the Download day 1 hygiene scores. Compare the resulting graph with the histogram of the same data.
First, make sure that you have loaded the DownloadFestival.sav file and that you open the Chart Builder from this data file. Once you have accessed the Chart Builder (see the book chapter), select Scatter/Dot in the chart gallery and then double-click on the icon for a simple dot plot (again, see the book chapter if you're unsure of which icon to click). The Chart Builder dialog box will now show a preview of the graph in the canvas area. At the moment it's not very exciting because we haven't told SPSS which variables we want to plot. Note that the variables in the data editor are listed on the left-hand side of the Chart Builder, and any of these variables can be dragged into any of the spaces surrounded by blue dotted lines (called drop zones). Like a histogram, a simple dot plot plots a single variable (x-axis) against the frequency of scores (y-axis), so there is just one drop zone. All we need to do is select a variable from the list and drag it into it. To do a simple dot plot of the day 1 hygiene scores we click on this variable in the variable list and drag it to the x-axis drop zone as shown below; you will now find the dot plot previewed on the canvas. To draw the dot plot, click on OK.

Click on the Hygiene Day 1 variable and drag it to this drop zone.


Defining a simple dot plot (a.k.a. density plot) in the Chart Builder

The resulting density plot is shown below along with the original histogram from the book. The first thing that should leap out at you is that they are very similar (in terms of what they show): they both tell us about the distribution of scores, and they both show us the outlier that was discussed in the chapter. These graphs, therefore, are really just two ways of showing the same thing. The density plot gives us a little more detail than the histogram but essentially they show the same thing.


Density plot of the Download day 1 hygiene scores and the original histogram from the book


Doing a drop-line plot in the Chart Builder is quite similar to drawing a clustered bar chart. Reload the ChickFlick.sav data and see if you can produce a drop-line plot of the arousal scores. Compare the resulting graph with the earlier clustered bar chart of the same data.
To do a drop-line chart for means that are independent (i.e. have come from different groups) we need to double-click on the drop-line chart icon in the Chart Builder (see the book chapter if you're not sure what this icon looks like or how to access the Chart Builder). On the canvas you will see a graph with some dots and three drop zones that are the same as for a clustered bar chart. As with the clustered bar chart example from the book, select arousal from the variable list and drag it into the y-axis drop zone, select film from the variable list and drag it into the x-axis drop zone, and select the gender variable and drag it into the clustering drop zone. This will mean that the dots representing males and females will be displayed in different colours, but if you want them displayed as different symbols then, to make this change, double-click in the clustering drop zone to bring up a new dialog box. Within this dialog box there is a drop-down list labelled Distinguish Groups by, in which you can select Color or Pattern. To change the default, select Pattern and then click on OK to make the change. Obviously you can switch back to displaying different groups in different colours in the same way. The completed Chart Builder is shown below; click on OK to produce the graph.



Using the Chart Builder to plot a drop-line graph

The resulting drop-line graph is shown below together with the clustered bar chart from the book. Hopefully it's clear that these graphs show the same information (although notice that the y-axis has been scaled differently by SPSS so that the differences between films look bigger on the drop-line graph than on the bar chart). In both graphs we can see that arousal was overall higher for Memento than Bridget Jones's Diary, that men and women differed very little in their arousal during Memento, and that men were more aroused during Bridget Jones's Diary. The fact that arousal in males and females differed more for Bridget Jones's Diary than Memento is possibly a little clearer in the drop-line graph than the bar chart, but it's really down to preference.


Drop-line graph of mean arousal scores during two films for men and women and the original clustered bar chart from the book


Now see if you can produce a drop-line plot of the Text Messages.sav data from earlier in this chapter. Compare the resulting graph with the earlier clustered bar chart of the same data.
To do a drop-line graph of these data we need to follow the general procedure for graphing related means. First, in the Chart Builder you need to double-click on the icon for a drop-line graph (see the book chapter for help with this if you need it). Our repeated-measures variable is time (whether grammar ability was measured at baseline or at six months) and is represented in the data file by two columns, one for the baseline data and the other for the follow-up data. Select these two variables simultaneously by clicking on one and then holding down the Ctrl key on the keyboard and clicking on the other. When they are both highlighted, click on either one and drag it into the y-axis drop zone as shown below. The second variable (whether children text messaged or not) was measured using different children and so is represented in the data file by a grouping variable (group). This variable can be selected in the variable list and dragged into the cluster drop zone. The two groups will now be displayed as different-coloured dots. The finished Chart Builder is shown below. Click on OK to produce the graph.

The resulting drop-line graph is shown together with the bar chart from the book chapter. They both show that at baseline (before the intervention) the grammar scores were comparable in our two groups. On the drop-line graph this is particularly apparent because the two dots merge into one (you can't see the drop line because the means are so similar). After the intervention, the grammar scores were lower in the text messagers than in the controls. By comparing the two vertical lines it's clearer on the drop-line graph that the difference between text messagers and controls is bigger at six months than it is pre-intervention.


Selected repeated-measures variable in the Chart Builder


Completed dialog box for an error bar graph of a mixed design

Error bar graph of the mean grammar score over six months in children who were allowed to text message versus those who were forbidden


Additional Material

Oliver Twisted: Please Sir, Can I Have Some More Graphs?

As an exercise to get you using some of the graph editing facilities we're going to take one of the graphs from the chapter and change some of its properties to produce a graph that follows Tufte's guidelines (i.e. minimal ink, no chartjunk and so on). We'll use the graph for the arousal scores for the two films (Bridget Jones's Diary and Memento). The original graph looked like this:


To edit this graph double-click on it in the SPSS Viewer. This will open the chart in the SPSS Chart Editor:

Double-click anywhere on the graph to open it in the SPSS Chart Editor.


Editing the Chart background and border


First, let's get rid of the outside border and background colour of the graph; after all, it's just unnecessary ink! Select the border by double-clicking on it with the mouse. It will become highlighted in blue and a properties dialog box will appear:

If you select the Fill & Border tab (as above), you can change the background and border of the chart. At the bottom there are options to change the border style, such as how thick it is (Weight), the style (full, dotted, etc.) and whether the lines end round, square or butted. We can change the background and border colour by selecting a colour from the palette. To change the background colour click on the square next to the word Fill and then click on any colour from the palette.


The square next to the word Fill will change colour. To change the border colour click on the square next to Border and then select a colour from the palette. Again, the square will change from black to this new colour. In this case I want us to get rid of the border and to make the background plain. Therefore, for both I want you to select no colour, which is represented by the transparent option. So, set both the fill and the border to be transparent. To apply these changes click on Apply; the border and background should disappear.

Editing the Axes


Now, let's get rid of the axis lines; they're just unnecessary ink too! Select the y-axis by double-clicking on it with the mouse. It will become highlighted in blue and a properties dialog box will appear, much the same as before. This properties dialog box has many tabs that allow us to change aspects of the y-axis. We'll look at some of these in turn.


The Scale tab allows us to change the minimum, maximum and increments on the scale. Currently our graph is scaled from 0 to 40 and has a tick every 10 units (the major increment is, therefore, 10). However, there is a lot of space at the top of the graph. First switch off all of the Auto options and then change the Maximum from 40 to 35. In doing so we cannot have major increments of 10 (because 10 does not divide into 35 without a remainder), so we need to change the Major Increment to a value that does divide into 35. Let's use 5. To make sure that SPSS doesn't rescale the minimum (it will try to), also deselect Auto for the Minimum and make sure the value is set at 0. Click on Apply and the scale of the y-axis should change in the Chart Editor.

The Number Format tab allows us to change the number format used on the y-axis. The default is to have 2 decimal places, but because all of our ticks appear at whole numbers (0, 5, 10, 15, etc.) these decimal places are redundant. If we change the Decimal Places to 0 (see left) then we can get rid of these superfluous decimal places. Click on Apply and the decimal places on the y-axis should vanish in the Chart Editor.


The Lines tab allows us to change the properties of the axis itself. We don't really need to have a line there at all, so let's get rid of it in the same way as we did for the background border: set the line colour to be transparent. Click on Apply and the y-axis line should vanish in the Chart Editor.

The Labels & Ticks tab allows us to change various aspects of the ticks on the axis. The major increment ticks are shown by default (you should leave them there), and labels for them (the numbers) are also shown by default. These numbers are important, so leave the defaults alone. You could choose to display minor ticks; let's do this: ask it to display the minor ticks. We have major ticks every 5, so it might be useful to have a minor tick every 1. To do this we need to set Number of minor ticks per major tick to be 4 (see left).

Let's now edit the x-axis. To do this double-click on it in the Chart Editor. The axis will become highlighted in blue and the Properties dialog box will open. Some of the properties tabs are the same as for the y-axis, so we'll just look at the ones that differ. Using what you have learnt already, set the line colour to be transparent so that it disappears (see the Lines tab above).


The Categories tab allows us to change the order of categories on this axis.


The Variables tab allows us to change properties of the variables. For one thing, if you don't want a bar chart then there is a drop-down list of alternatives from which you can choose. Also, we have gender displayed by different colours, but we can change this so that genders are differentiated by other style differences (such as a pattern). See the drop-down list (left).

Editing the Bars


To edit the bars double-click on any of the bars to select them. They will become highlighted with a blue line. Let's first change the colour of the blue bars. To do this we first need to click once on the blue bars. Now, instead of all of the bars being highlighted in blue, only the blue bars will be (see below). We can then use the Properties dialog box to change features of these bars.



The Depth & Angle tab allows us to change whether the bars have a drop shadow or a 3-D effect. As I tried to stress in the book, you shouldn't add this kind of chartjunk, so you should leave your bars as flat. However, in case you want to ignore my advice, this is how you add chartjunk!

The Bar Options tab allows us to change the width of the bars (the default is to have bars within a cluster touching, but if you reduce the bar width below 100% then a gap will appear). You can also alter the gap between clusters: the default allows a small gap between clusters (which is sensible), but you can remove it by increasing the value to 100% (no gap between clusters) or widen it by using a smaller value. You can also select whether the bars are displayed as bars (the default), as a line (Whiskers) or as a T-bar (T-bar). This kind of graph really looks best if you leave the bars as bars (otherwise the error bars look silly).


The Fill & Border tab allows us to change the colour of the bar and the style and colour of the bar's border. I want this bar to be black, so select the colour black from the palette and then click on the square next to Fill (see left). Click on Apply and the blue bars should turn black.

Now we will change the colour of the green bars. To do this we first need to click once on the green bars. Now only the green bars will be highlighted in blue (see below). We can then use the Properties dialog boxes described above to change the colour of these bars. I want you to colour these bars grey.



Adding Grid lines


You can add grid lines to a graph simply by clicking on the grid lines button in the Chart Editor. If you do this you will see that, by default, SPSS adds some pretty hideous-looking lines to your graph:


First off, we don't really want grid lines on our x-axis (the vertical grid lines), so let's get rid of them. To do this select them so that they are highlighted in blue in the Chart Editor:


Then in the Properties dialog box we change the colour of these lines to be transparent (as we have done with the axis lines above):

Click on Apply and the vertical grid lines should vanish from the Chart Editor.


Now let's edit the horizontal grid lines. To do this click on any one of the horizontal grid lines in the Chart Editor so that they become highlighted in blue:

In the Properties dialog box select the Lines tab. You could change the grid lines to be dotted by selecting a dotted line from the Style drop-down list, but don't; leave them as solid:


Next, let's make the grid lines a bit thicker by selecting 1.5 from the Weight drop-down list:

Finally, let's change their colour from black to white. We've used the colour palette a few times now so you should be able to do this without any help (just click on the white square):


Click on Apply and the horizontal grid lines should become white and thicker.

Speaking of thick, you've probably noticed that you can no longer see them because we changed the colour to white and they are displayed on a white background. You're probably also thinking that I must be some kind of idiot for telling you to do that. You're probably right, but bear with me: there is method to the madness inside my rotting breadcrumb of a brain.

Changing the Order of Elements of a Graph


We've got white grid lines and we can't see them. That's a bit pointless, isn't it? However, we would be able to see them if they were in front of the bars. We can make this happen by again selecting the horizontal grid lines so that they are highlighted in blue; then if we click on one of them with the right mouse button a menu appears on which we can select Bring to Front. Select this option and, wow, the grid lines become visible on the bars themselves: pretty cool, I think you'll agree.


However, we still have a problem in that our error bars can be seen on top of the grey bars but not on top of the black bars. This looks a bit odd; it would be better if we could see them only poking out of the top of both bars. To do this, click on one of the error bars so that they become highlighted in blue. Then if we click on one of them with the right mouse button a menu appears on which we can select Send to Back. Select this option and the error bars move behind the bars (therefore we can only see the top half).


Saving a Chart template


You've done all of this hard work. What if you want to produce a similar-looking graph? Well, you can save these settings as a template. A template is just a file that contains a set of instructions telling SPSS how to format a graph (e.g. you want grid lines, you want the axes to be transparent, you want the bars to be coloured black and grey, and so on). To do this, in the Chart Editor go to the File menu and select Save Chart Template. You will get a dialog box, and you should select what parts of the formatting you want to save (and add a description also). Although it is tempting to just click on save all, this isn't wise because, for example, when we rescaled the y-axis we asked for a range of 0 to 35, and this is unlikely to be a sensible range for other graphs, so this is one aspect of the formatting that we would not want to save.


Click on Save and then type a name for your template (I've chosen Tufte Bar.sgt). By default SPSS saves the templates in a folder called Looks, but you can save it elsewhere if you like. Assuming you have saved a chart template, you can apply it when you run a new graph in the Chart Editor by opening the Options dialog box and then browsing your computer for your template file:


Chapter 5

Self-Test Answers

Using what you learnt before, plot histograms for the hygiene scores for the three days of the Download festival.
First, access the Chart Builder as in Chapter 4 of the book and then select Histogram in the list labelled Choose from to bring up the gallery, which has four icons representing different types of histogram. We want to do a simple histogram so double-click on the icon for a


simple histogram. The Chart Builder dialog box will now show a preview of the graph in the canvas area. To plot the histogram of the day 1 hygiene scores select the Hygiene day 1 variable from the list and drag it into the drop zone. You will now find the

histogram previewed on the canvas (see below). To draw the histogram click on OK.

Click on the Hygiene day 1 variable and drag it to this drop zone.

The resulting histogram is shown and explained in the book.


To plot the day 2 scores go back to the Chart Builder, but this time select the Hygiene day 2 variable from the list, drag it into the drop zone and click on OK.

To plot the day 3 scores go back to the Chart Builder, but this time select the Hygiene day 3 variable from the list, drag it into the drop zone and click on OK.


Calculate and interpret the z-scores for skewness of the other variables (computer literacy and percentage of lectures attended).

For computer literacy, the z-score of skewness is 0.174/0.241 = 0.72, which is non-significant, p > .05, because it lies between −1.96 and 1.96. For lectures attended, the z-score of skewness is 0.422/0.241 = 1.75, which is non-significant, p > .05, because it lies between −1.96 and 1.96.

Calculate and interpret the z-scores for kurtosis of all of the variables.
For SPSS exam scores, the z-score of kurtosis is 1.105/0.478 = 2.31, which is significant, p < .05, because it lies outside ±1.96. For computer literacy, the z-score of kurtosis is 0.364/0.478 = 0.76, which is non-significant, p > .05, because it lies between −1.96 and 1.96. For lectures attended, the z-score of kurtosis is 0.179/0.478 = 0.37, which is non-significant, p > .05, because it lies between −1.96 and 1.96. For numeracy, the z-score of kurtosis is 0.946/0.478 = 1.98, which is significant, p < .05, because it lies outside ±1.96.
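These hand calculations all follow one rule: z = statistic/SE, significant at α = .05 (two-tailed) when |z| exceeds 1.96. A minimal Python sketch of that rule, using the values quoted above:

```python
def z_test(statistic, se, criterion=1.96):
    """Convert a skewness or kurtosis statistic to a z-score and flag it
    as significant (two-tailed, alpha = .05) when |z| exceeds 1.96."""
    z = statistic / se
    return round(z, 2), abs(z) > criterion

print(z_test(0.174, 0.241))  # computer literacy skewness: (0.72, False)
print(z_test(1.105, 0.478))  # SPSS exam kurtosis: (2.31, True)
print(z_test(0.946, 0.478))  # numeracy kurtosis: (1.98, True)
```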

Repeat these analyses for the computer literacy and percentage of lectures attended and interpret the results.


The SPSS output is split into two sections: first, the results for students at Duncetown University, then the results for those attending Sussex University. From these tables it is clear that Sussex and Duncetown students scored similarly on computer literacy (both means are very similar). Sussex students attended slightly more lectures (63.27%) than their Duncetown counterparts (56.26%). The histograms are also split according to the university attended. All of the distributions look fairly normal. The only exception is the computer literacy scores for the Sussex students. This distribution is fairly flat apart from a huge peak between 50 and 60%. It's slightly heavy-tailed (right at the very ends of the curve the bars rise above the line) and very pointy. This suggests positive kurtosis. If you examine the values of kurtosis you will find that there is significant (p < .05) positive kurtosis: 1.38/0.662 = 2.08, which falls outside ±1.96.

Histograms of computer literacy and percentage of lectures attended, split by university (Duncetown University and Sussex University)


Use the explore command to see what effect a natural log transformation would have on the four variables measured in SPSSExam.sav.
The completed dialog boxes should look like this:

The SPSS output below shows Levene's test on the log-transformed scores. Compare this table to the one in the book (which was conducted on the untransformed SPSS exam scores and numeracy). To recap the book chapter, for the untransformed scores Levene's test was non-significant for the SPSS exam scores (the value in the column labelled Sig. was .111, more than .05), indicating that the variances were not significantly different (i.e. the homogeneity of variance assumption is tenable). However, for the numeracy scores, Levene's test was significant (the value in the column labelled Sig. was .008, less than .05), indicating that the variances were significantly different (i.e. the homogeneity of variance assumption was violated). For the log-transformed scores (below), the problem has been reversed: Levene's test is now significant for the SPSS exam scores (the value in the column labelled Sig. is less than .05) but is no longer significant for the numeracy scores (the value in the column labelled Sig. is more than .05). This reiterates my point from the book chapter that transformations are often not a magic solution to problems in the data!
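The same before-and-after comparison can be sketched with scipy's implementation of Levene's test. The data below are simulated stand-ins (not the SPSSExam.sav scores), chosen so the two groups have clearly unequal variances before the transformation:

```python
import numpy as np
from scipy import stats

# Made-up scores for two groups with clearly unequal variances,
# standing in for the numeracy scores in SPSSExam.sav
rng = np.random.default_rng(1)
duncetown = rng.gamma(shape=2.0, scale=2.0, size=50)   # positively skewed
sussex = rng.gamma(shape=2.0, scale=4.0, size=50)      # same shape, 4x the variance

# Levene's test on the raw scores...
stat_raw, p_raw = stats.levene(duncetown, sussex)
# ...and again after a log transformation (log(x + 1) guards against log(0))
stat_log, p_log = stats.levene(np.log(duncetown + 1), np.log(sussex + 1))

print(f"raw scores:    W = {stat_raw:.2f}, p = {p_raw:.4f}")
print(f"logged scores: W = {stat_log:.2f}, p = {p_log:.4f}")
```

As in the text, transforming can change the outcome of Levene's test in either direction, so always re-check the assumption after transforming.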


Have a go at creating similar variables logday2 and logday3 for the day 2 and day 3 data. Plot histograms of the transformed scores for all three days.
The completed Compute Variable dialog boxes for day2 and day 3 should look as below:


The histograms for days 1 and 2 are in the book, but for day 3 the histogram should look like this:

Repeat this process for day2 and day3 to create variables called sqrtday2 and sqrtday3. Plot histograms of the transformed scores for all three days.


The completed Compute Variable dialog boxes for day2 and day 3 should look as below:



The histograms for days 1 and 2 are in the book, but for day 3 the histogram should look like this:

Repeat this process for day2 and day3. Plot histograms of the transformed scores for all three days.
The completed Compute Variable dialog boxes for day2 and day 3 should look as below:



The histograms for days 1 and 2 are in the book, but for day 3 the histogram should look like this:
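The Compute Variable transformations used across these self-tests (log, square root and, assuming the third exercise follows the book chapter, a reciprocal) can be mirrored in a line of NumPy each. The variable names below match the ones the self-tests ask you to create, but the scores themselves are made up:

```python
import numpy as np

# A handful of made-up, positively skewed hygiene scores for day 3
day3 = np.array([0.02, 0.35, 0.50, 0.76, 1.02, 1.44, 2.00, 3.41])

logday3 = np.log(day3 + 1)     # log transform; +1 avoids log(0)
sqrtday3 = np.sqrt(day3)       # square-root transform
recday3 = 1 / (day3 + 1)       # reciprocal transform; +1 avoids dividing by 0

print(np.round(logday3, 3))
print(np.round(sqrtday3, 3))
print(np.round(recday3, 3))
```

Each transformation pulls in the long right tail to a different degree, which is why the book has you compare the histograms of all three.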

Additional Material
Oliver Twisted: Please Sir, Can I Have Some More Frequencies?

In your SPSS output you will also see tabulated frequency distributions of each variable. These tables are reproduced in the additional online material along with a description.


In your SPSS output you will also see tabulated frequency distributions of each variable (below). These tables list each score and the number of times that it is found within the data set. In addition, each frequency value is expressed as a percentage of the sample. Also, the cumulative percentage is given, which tells us how many cases (as a percentage) fell below a certain score. So, for example, we can see that only 15.4% of hygiene scores were below 1 on the first day of the festival. Compare this to the table for day 2: 63.3% of scores were less than 1!
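The Percent and Cumulative Percent columns in these tables come from simple arithmetic on the frequency counts; a minimal sketch with made-up scores:

```python
from collections import Counter

scores = [0.5, 0.5, 0.8, 1.1, 1.1, 1.1, 2.3, 2.3]   # made-up hygiene scores
total = len(scores)

cumulative = 0.0
for score, freq in sorted(Counter(scores).items()):
    percent = 100 * freq / total   # frequency as a percentage of the sample
    cumulative += percent          # running total: % of cases at or below this score
    print(f"{score:>4}  freq = {freq}  percent = {percent:5.1f}  cumulative = {cumulative:5.1f}")
```

Reading off the cumulative column at a score of 1 is exactly how the 15.4% and 63.3% figures above were obtained.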

[Hygiene (Day 1 of Download Festival): frequency table listing each observed score from 0.02 to 3.69 with its Frequency, Percent, Valid Percent and Cumulative Percent; N = 810 valid cases, no missing data. The cumulative percentage reaches 15.4% at a score of 1.]

[Hygiene (Day 2 of Download Festival): frequency table listing each observed score from 0 to 3.44 with its Frequency, Percent, Valid Percent and Cumulative Percent; 264 valid cases, 546 missing, 810 total. The cumulative percentage reaches 63.3% at a score of 1.]

[Hygiene (Day 3 of Download Festival): frequency table listing each observed score from 0.02 to 3.41 with its Frequency, Percent, Valid Percent and Cumulative Percent; 123 valid cases, 687 missing, 810 total.]

Oliver Twisted: Please Sir, can I have some more normality tests?


The observant among you will see that there is another test reported in the table (the Shapiro-Wilk test). The even more eagle-eyed will also notice a footnote to the K-S test saying that 'Lilliefors significance correction' has been applied. (You might find this especially confusing if you've ever done the K-S test through the nonparametric test menu rather than the explore menu, because this correction is not applied there.) What the hell is going on? In the additional material for this chapter on the companion website you can find out more about the K-S test, some information about the Lilliefors correction and the Shapiro-Wilk test. What are you waiting for?

If you want to test whether a model is a good fit of your data you can use a goodness-of-fit test (you can read about these in the chapter on categorical data analysis in the book), which has a chi-square test statistic (with the associated distribution). One problem with this test is that it needs a certain sample size to be accurate. The K-S test was developed as a test of whether a distribution of scores matches a hypothesized distribution (Massey, 1951). One good thing about the test is that the distribution of the K-S test statistic does not depend on the hypothesized distribution (in other words, the hypothesized distribution doesn't have to be a particular distribution). It is also what is known as an exact test, which means that it can be used on small samples. It also appears to have more power to detect deviations from the hypothesized distribution than the chi-square test (Lilliefors, 1967). However, one major limitation of the K-S test is that if the location (i.e. the mean) and shape parameters (i.e. the standard deviation) are estimated from the data then the K-S test is very conservative, which means it fails to detect deviations from the distribution of interest (i.e. normal). What Lilliefors did was to adjust the critical values of significance for the K-S test to make it less conservative (Lilliefors, 1967) using Monte Carlo simulations (these new values were about two-thirds the size of the standard values). He also reported that this test was more powerful than a standard chi-square test (and obviously the standard K-S test). Another test you'll use to test normality is the Shapiro-Wilk test (Shapiro & Wilk, 1965), which was developed specifically to test whether a distribution is normal (whereas the K-S test can be used to test against distributions other than normal). Shapiro and Wilk concluded that their test was 'comparatively quite sensitive to a wide range of non-normality, even with samples as small as n = 20. It seems to be especially sensitive to asymmetry, long-tailedness and to some degree to short-tailedness' (p. 608). To test the power of these tests they applied them to several samples (n = 20) from various non-normal distributions. In each case they took 500 samples, which allowed them to see how many times (in 500) the test correctly identified a deviation from normality (this is the power of the test). They show in these simulations (see table 7 in their paper) that the S-W test is considerably more powerful at detecting deviations from normality than the K-S test. They verified this general conclusion in a much more extensive set of simulations as well (Shapiro, Wilk, & Chen, 1968).
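You can see the difference in sensitivity for yourself with scipy. The data here are simulated, and note that scipy's kstest is the uncorrected K-S test (no Lilliefors adjustment), which is exactly the conservative situation described above when the mean and SD are estimated from the data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
skewed = rng.exponential(size=100)   # clearly non-normal sample

# K-S test against a normal whose mean and SD are estimated from the data
# (the situation in which the uncorrected test is conservative)
ks_stat, ks_p = stats.kstest(skewed, "norm", args=(skewed.mean(), skewed.std()))

# Shapiro-Wilk test of the same sample
sw_stat, sw_p = stats.shapiro(skewed)

print(f"K-S:          D = {ks_stat:.3f}, p = {ks_p:.4f}")
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.4f}")
```

With a sample this obviously skewed, Shapiro-Wilk will flag the deviation decisively, consistent with the power results Shapiro and Wilk report.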

Oliver Twisted: Please Sir, Can I Have Some More Hartley's FMax?

Oliver thinks that my graph of critical values is stupid. "Look at that graph," he laughed. "It's the most stupid thing I've ever seen since I was at Sussex Uni and I saw my statistics lecturer, Andy Fie." Well, go choke on your gruel you Dickensian bubo, because the full table of critical values is in the additional material for this chapter on the companion website. Critical values for Hartley's test (α = .05):
n-1 per group |                      Number of variances compared
              |    2      3      4      5      6      7      8      9     10     11     12
      2       | 39.00  87.50 142.00 202.00 266.00 333.00 403.00 475.00 550.00 626.00 704.00
      3       | 15.40  27.80  39.20  50.70  62.00  72.90  83.50  93.90 104.00 114.00 124.00
      4       |  9.60  15.50  20.60  25.20  29.50  33.60  37.50  41.40  44.60  48.00  51.40
      5       |  7.15  10.80  13.70  16.30  18.70  20.80  22.90  24.70  26.50  28.20  29.90
      6       |  5.82   8.38  10.40  12.10  13.70  15.00  16.30  17.50  18.60  19.70  20.70
      7       |  4.99   6.94   8.44   9.70  10.80  11.80  12.70  13.50  14.30  15.10  15.80
      8       |  4.43   6.00   7.18   8.12   9.03   9.80  10.50  11.10  11.70  12.20  12.70
      9       |  4.03   5.34   6.31   7.11   7.80   8.41   8.95   9.45   9.91  10.30  10.70
     10       |  3.72   4.85   5.67   6.34   6.92   7.42   7.87   8.28   8.66   9.01   9.34
     12       |  3.28   4.16   4.79   5.30   5.72   6.09   6.42   6.72   7.00   7.25   7.48
     15       |  2.86   3.54   4.01   4.37   4.68   4.95   5.19   5.40   5.59   5.77   5.93
     20       |  2.46   2.95   3.29   3.54   3.76   3.94   4.10   4.24   4.37   4.49   4.59
     30       |  2.07   2.40   2.61   2.78   2.91   3.02   3.12   3.21   3.29   3.36   3.39
     60       |  1.67   1.85   1.96   2.04   2.11   2.17   2.22   2.26   2.30   2.33   2.36
     ∞        |  1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
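To illustrate how the table is used, here is a hypothetical sketch (not from the book, with made-up groups of scores) of computing Hartley's FMax in Python and comparing it against the relevant critical value:

```python
import numpy as np

# Three made-up groups of 5 scores each (so n - 1 = 4 df per group).
groups = [
    np.array([22.0, 40, 53, 57, 93]),
    np.array([98.0, 103, 108, 116, 121]),
    np.array([30.0, 35, 60, 70, 95]),
]

variances = [g.var(ddof=1) for g in groups]  # sample variances
f_max = max(variances) / min(variances)      # largest variance / smallest variance

# From the table: k = 3 variances compared, n - 1 = 4 df per group -> 15.50.
critical = 15.50
print(f_max, f_max > critical)
```

If FMax exceeds the critical value, the homogeneity of variance assumption is in trouble.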

Chapter 7

Self-Test Answers

How is the t in SPSS Output 7.3 calculated? Use the values in the table to see if you can get the same value as SPSS.
It is calculated using this equation:

t = (b_observed - b_expected) / SE_b = b_observed / SE_b

(because b_expected is zero under the null hypothesis). Using the values from SPSS Output 7.3 to calculate t for the constant we get t = 134.140/7.537 = 17.79, and for the advertising budget we get 0.096/0.01 = 9.6. This value is different to the one in the output (t = 9.979) because SPSS rounds values in the output to 3 decimal places, but calculates t using unrounded values (usually this doesn't make too much difference, but in this case it does!). In this case the rounding has had quite an effect on the standard error (its value is 0.009632 but it has been rounded to 0.01). To obtain the unrounded values, double-click the table in the SPSS output and then double-click the value that you wish to see in full. You should find that t = 0.096124/0.009632 = 9.979.
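The same arithmetic in Python (a quick check, not part of the book), using the rounded and unrounded values from the output:

```python
# t = b_observed / SE_b (b_expected is 0 under the null hypothesis).
b_constant, se_constant = 134.140, 7.537
t_constant = b_constant / se_constant   # about 17.80

# The rounded standard error gives the "wrong" answer...
t_advert_rounded = 0.096 / 0.01         # 9.6

# ...but the unrounded values recover SPSS's t.
b_advert, se_advert = 0.096124, 0.009632
t_advert = b_advert / se_advert         # about 9.98 (SPSS reports 9.979 from full precision)

print(t_constant, t_advert_rounded, t_advert)
```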

How many records would be sold if we spent £666,000 on advertising his latest CD by black metal band Abgott?

He would sell 198,080 CDs:

Record Sales = 134.14 + (0.096 × Advertising Budget)
             = 134.14 + (0.096 × 666)
             = 198.08
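As a quick sketch (not from the book), the same prediction in Python, with both sales and advertising budget measured in thousands:

```python
b0, b1 = 134.14, 0.096  # intercept and slope from the model
budget = 666            # £666,000, expressed in thousands

record_sales = b0 + b1 * budget
print(record_sales)  # about 198.08 thousand, i.e. roughly 198,080 CDs
```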

Additional Material

Labcoat Leni's Real Research: Why do you like your lecturers?

Chamorro-Premuzic, T., et al. (2008). Personality and Individual Differences, 44, 965-976.

In the previous chapter we encountered a study by Chamorro-Premuzic et al. in which they measured students' personality characteristics and asked them to rate how much they wanted these same characteristics in their lecturers (see the previous chapter for a full description). In the last chapter we correlated these scores; however, we could go a step further and see whether students' personality characteristics predict the characteristics that they would like to see in their lecturers. The data from this study are in the file Chamorro-Premuzic.sav. Labcoat Leni wants you to carry out five multiple regression analyses: the outcome variables in each of the five analyses are the ratings of how much students want to see neuroticism, extroversion, openness to experience, agreeableness and conscientiousness. For each of these outcomes, force Age and Gender into the analysis in the first step of the hierarchy, then in the second block force in the five student personality traits (Neuroticism, Extroversion, Openness to experience, Agreeableness and Conscientiousness). For each analysis create a table of the results. Answers are in the additional material for this website (or look at Table 4 in the original article).

Lecturer Neuroticism
The first regression we'll do is whether students want lecturers to be neurotic. Define the two blocks as follows. In the first block put Age and Gender:


In the second, put all of the student personality variables (five variables in all):

Set the options as in the book chapter. The main output (I haven't reproduced it all, but you can find it in the file Chamorro-Premuzic.spv) is as follows:


You could report these results as:

                        B      SE B      β
Step 1
  Constant            28.22    2.59
  Age                  0.28    0.13     .11*
  Gender               2.42    1.02     .12*
Step 2
  Constant            16.77    5.30
  Age                  0.30    0.13     .12*
  Gender               1.90    1.09     .10
  Neuroticism          0.06    0.06     .06
  Extroversion         0.12    0.08     .08
  Openness            -0.17    0.07    -.12*
  Agreeableness        0.09    0.07     .07
  Conscientiousness   -0.20    0.08    -.16*

Note: R² = .03 for step 1; ΔR² = .04 for step 2 (p < .05). * p < .05. So basically, age, openness and conscientiousness were significant predictors of wanting a neurotic lecturer (note that for openness and conscientiousness the relationship is negative; i.e., the more a student scored on these characteristics, the less they wanted a neurotic lecturer).

Lecturer Extroversion
The second variable we want to predict is lecturer extroversion. I won't run through the analysis and output, but you can find it in the file Chamorro-Premuzic.spv. You could report these results as:

                        B      SE B      β
Step 1
  Constant            12.13    2.43
  Age                   .03     .12     .01
  Gender                .93     .94     .06
Step 2
  Constant             3.62    4.93
  Age                   .02     .12     .01
  Gender               1.31    1.00     .08
  Neuroticism           .00     .06     .01
  Extroversion          .15     .07     .14*
  Openness              .04     .07     .03
  Agreeableness         .00     .07     .00
  Conscientiousness     .10     .08     .10

Note: R² = .00 for step 1; ΔR² = .03 for step 2 (p > .05). * p < .05. So basically, student extroversion was the only significant predictor of wanting an extroverted lecturer; the model overall did not explain a significant amount of the variance in wanting an extroverted lecturer.

Lecturer Openness to Experience


The third variable we want to predict is lecturer openness to experience. As before, the SPSS output can be found in the file Chamorro-Premuzic.spv. You could report these results as:

                        B      SE B      β
Step 1
  Constant             9.41    2.37
  Age                   .04     .12     .02
  Gender                .23     .92     .01
Step 2
  Constant             5.16    4.75
  Age                   .05     .12     .02
  Gender                .09     .96     .01
  Neuroticism           .01     .05     .01
  Extroversion          .07     .07     .05
  Openness              .26     .07     .20***
  Agreeableness         .14     .06     .12*
  Conscientiousness     .03     .07     .03

Note: R² = .00 for step 1 (ns); ΔR² = .06 for step 2 (p < .001). * p < .05, *** p < .001. So basically, student openness to experience was the strongest predictor of wanting a lecturer who is open to experiences, but student agreeableness predicted this also.

Lecturer Agreeableness
The fourth variable we want to predict is lecturer agreeableness. As before, the SPSS output can be found in the file Chamorro-Premuzic.spv. You could report these results as:

                        B      SE B      β
Step 1
  Constant            18.30    2.77
  Age                  -.47     .14    -.17
  Gender                .83    1.07     .04
Step 2
  Constant             8.76    5.51
  Age                  -.47     .14    -.17**
  Gender                .78    1.11     .04
  Neuroticism           .14     .06     .13*
  Extroversion          .05     .08     .03
  Openness             -.22     .08    -.14**
  Agreeableness         .14     .07     .11
  Conscientiousness     .14     .09     .10

Note: R² = .03 for step 1 (p < .01); ΔR² = .06 for step 2 (p < .001). * p < .05, ** p < .01. Age, student openness to experience and student neuroticism significantly predicted wanting a lecturer who is agreeable. Age and openness to experience had negative relationships (the older and more open to experience you are, the less you want an agreeable lecturer), whereas as student neuroticism increases so does the desire for an agreeable lecturer (not surprisingly, because neurotics will lack confidence and probably feel more able to ask an agreeable lecturer questions).

Lecturer Conscientiousness
The final variable we want to predict is lecturer conscientiousness. As before, the SPSS output can be found in the file Chamorro-Premuzic.spv. You could report these results as:

                        B      SE B      β
Step 1
  Constant            13.84    2.24
  Age                   .16     .11     .07
  Gender               2.33     .87     .14**
Step 2
  Constant             5.85    4.50
  Age                   .14     .11     .06
  Gender               1.65     .91     .10
  Neuroticism           .01     .05     .01
  Extroversion          .06     .07     .05
  Openness              .01     .06     .01
  Agreeableness         .12     .06     .12*
  Conscientiousness     .16     .07     .14*

Note: R² = .02 for step 1 (p < .05); ΔR² = .05 for step 2 (p < .01). * p < .05, ** p < .01. Student agreeableness and conscientiousness both predicted wanting a lecturer who is conscientious. Note also that gender predicted this in the first step, but its b became slightly non-significant (p = .07) when the student personality variables were forced in as well. However, gender is probably a variable that should be explored further within this context.


Compare your results to Table 4 in the actual article. I've highlighted the area of the table relating to our analyses (our five analyses are represented by the columns labelled N, E, O, A and C).


Oliver Twisted: Please Sir, Can I Have Some More Recode?

"Our data set has missing values," worries Oliver. "What do we do if we only want to recode cases for which we have data?" Well, we can set some other options at this point, that's what we can do. This is getting a little bit more involved, so if you want to know more then the additional material for this chapter on the companion website will tell you. Stop worrying, Oliver, everything will be OK.

One of the problems with the Glastonbury data is that we didn't have hygiene scores for all of the people at day 3. Therefore, when we calculated the change scores (day 3 minus day 1) we likewise only have data for a subset of our sample. When we come to recode the music variable, we should probably not recode the cases for which we don't have data for the change variable. This is fairly simple to do by setting an IF command. That is, we want to tell SPSS "IF there is a value for the variable change, then recode the variable music". To do this, open the dialog box below:


By default SPSS will include all of the cases in the data, but we can use this dialog box to set conditions. So, we can tell SPSS to recode these cases only if a certain condition is met. The condition that we want to set is that we want to recode only cases for which there is a value for the variable change (i.e. we want to exclude cases for which there are missing values in the variable change). To specify this, first activate the white box below. Rather like the compute command (see Chapter 5), we can type commands in this box, and select built-in commands from the boxes labelled Function group and Functions and Special Variables. You can see in the diagram that I have selected a command in the category Missing Values called Missing. To be specific, the condition that I have set is 1 - MISSING(change). MISSING is a built-in command that returns true (i.e. the value 1) for a case that has a system-missing or user-defined missing value for the specified variable; it returns false (i.e. the value 0) if a case has a value. Hence, MISSING(change) returns a value of 1 for cases that have a missing value for the variable change and 0 for cases that do have values. We want to recode the cases that do have a value for the variable change, therefore I have specified 1 - MISSING(change). This command reverses MISSING(change) so that it returns 1 (true) for cases that have a value for the variable change and 0 (false) for system- or user-defined missing values. To sum up, DO IF (1 - MISSING(change)) tells SPSS "do the following RECODE commands only if the case has a value for the variable change".
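For comparison, here is a rough pandas sketch (not from the book, and with made-up values and codes for the other music groups) of the same "recode only if change has a value" logic:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "music": [1, 2, 3, 2, 1],                   # 1 = indie kid, 2 = metaller; other codes assumed
    "change": [-0.5, np.nan, 0.2, 0.1, np.nan], # made-up change scores with missing values
})

# The pandas equivalent of 1 - MISSING(change): True where change has a value.
has_value = df["change"].notna()

# Recode music into Metaller (2 -> 1, everything else -> 0), but only for
# cases with a value for change; the other cases are left missing.
df["Metaller"] = np.where(df["music"].eq(2) & has_value, 1.0,
                          np.where(has_value, 0.0, np.nan))
print(df)
```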

Try creating the remaining two dummy variables (call them Metaller and Indie_Kid) using the same principles.

Select Transform ⇒ Recode into Different Variables… to access the recode dialog box. Select the variable you want to recode (in this case music) and transfer it to the box labelled Numeric Variable → Output Variable by clicking on the arrow button. You then need to name the new variable: go to the part that says Output Variable and, in the box below where it says Name, write a name for your second dummy variable (call it Metaller). You can also give this variable a more descriptive name by typing something in the box labelled Label (for this dummy variable I've called it No Affiliation vs. Metaller). When you've done this, click on Change to transfer this new variable to the box labelled Numeric Variable → Output Variable (this box should now say music → Metaller).

We need to tell SPSS how to recode the values of the variable music into the values that we want for the new variable, Metaller. To do this, click on Old and New Values… to access the dialog box below. This dialog box is used to change values of the original variable into different values for the new variable. For this dummy variable, we want anyone who was a metaller to get a code of 1 and everyone else to get a code of 0. Now, metaller was coded with the value 2 in the original variable, so you need to type the value 2 in the section labelled Old Value in the box labelled Value. The new value we want is 1, so we need to type the value 1 in the section labelled New Value in the box labelled Value. When you've done this, click on Add to add this change to the list of changes. The next thing we need to do is to change the remaining groups to have a value of 0 for this dummy variable. To do this, just select All other values and type the value 0 in the section labelled New Value in the box labelled Value. When you've done this, click on Add to add this change to the list of changes, then click on Continue to return to the main dialog box, and then click on OK to create the dummy variable. This variable will appear as a new column in the data editor, and you should notice that it will have a value of 1 for anyone originally classified as a metaller and a value of 0 for everyone else.

To create the final dummy variable, select Transform ⇒ Recode into Different Variables… to access the recode dialog box again. Select music and transfer it to the box labelled Numeric Variable → Output Variable. Go to the part that says Output Variable and, in the box below where it says Name, write a name for your final dummy variable (call it Indie_Kid). You can also give this variable a more descriptive name by typing something in the box labelled Label (for this dummy variable I've called it No Affiliation vs. Indie Kid). When you've done this, click on Change to transfer this new variable to the box labelled Numeric Variable → Output Variable (this box should now say music → Indie_Kid).

We need to tell SPSS how to recode the values of the variable music into the values that we want for the new variable, Indie_Kid. To do this, click on Old and New Values… to access the dialog box below. For this dummy variable, we want anyone who was an indie kid to get a code of 1 and everyone else to get a code of 0. Now, indie kid was coded with the value 1 in the original variable, so you need to type the value 1 in the section labelled Old Value in the box labelled Value. The new value we want is 1, so we need to type the value 1 in the section labelled New Value in the box labelled Value. When you've done this, click on Add to add this change to the list of changes. The next thing we need to do is to change the remaining groups to have a value of 0 for this dummy variable. To do this, just select All other values and type the value 0 in the section labelled New Value in the box labelled Value. When you've done this, click on Add to add this change to the list of changes, then click on Continue to return to the main dialog box, and then click on OK to create the dummy variable. This variable will appear as a new column in the data editor, and you should notice that it will have a value of 1 for anyone originally classified as an indie kid and a value of 0 for everyone else.
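All of this clicking is just dummy coding, which is a one-liner in most languages. A sketch (not from the book; the numeric codes and the name of the third dummy variable are assumptions for illustration) of building the dummy variables at once in pandas:

```python
import pandas as pd

# Assumed codes: 0 = no affiliation (baseline), 1 = indie kid, 2 = metaller, 3 = crusty.
music = pd.Series([0, 1, 2, 3, 2, 0], name="music")

dummies = pd.DataFrame({
    "Indie_Kid": (music == 1).astype(int),  # 1 for indie kids, 0 otherwise
    "Metaller":  (music == 2).astype(int),  # 1 for metallers, 0 otherwise
    "Crusty":    (music == 3).astype(int),  # hypothetical name for the remaining dummy
})
print(dummies)
```

With "no affiliation" as the baseline, every other group is represented by exactly one column taking the value 1.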


Use what you've learnt in this chapter to run a multiple regression using the

change scores as the outcome, and the three dummy variables (entered in the same block) as predictors.
Access the main dialog box for regression, which you should complete as below. Use the book chapter to determine what other options you want to select. The output and interpretation are in the book chapter.


Chapter 8

Self-Test Answers

Calculate the values of Cox and Snell's and Nagelkerke's R² reported by SPSS. [Hint: These equations use the log-likelihood, whereas SPSS reports -2 × log-likelihood. LL(New) is, therefore, -144.16/2 = -72.08, and LL(Baseline) = -154.08/2 = -77.04. The sample size, n, is 113.]
Cox and Snell's R² is calculated from this equation:

R²_CS = 1 - exp( -(2/n) × [LL(New) - LL(Baseline)] )

Remember that this equation uses the log-likelihood, whereas SPSS reports -2 × log-likelihood. LL(New) is, therefore, -144.16/2 = -72.08, and LL(Baseline) = -154.08/2 = -77.04. The sample size, n, is 113:

R²_CS = 1 - exp( -(2/113) × [-72.08 - (-77.04)] )
      = 1 - e^(-0.0878)
      = 1 - 0.916
      = 0.084


Nagelkerke's adjustment is calculated from:

R²_N = R²_CS / ( 1 - exp(2 × LL(Baseline)/n) )
     = 0.084 / ( 1 - e^(-1.3635) )
     = 0.084 / (1 - 0.2558)
     = 0.113
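These two calculations are easy to check in Python (a quick sketch, not part of the book):

```python
import math

n = 113
ll_new = 144.16 / -2       # SPSS reports -2LL = 144.16, so LL(New) = -72.08
ll_baseline = 154.08 / -2  # -2LL(Baseline) = 154.08, so LL(Baseline) = -77.04

r2_cs = 1 - math.exp(-(2 / n) * (ll_new - ll_baseline))  # Cox & Snell
r2_n = r2_cs / (1 - math.exp(2 * ll_baseline / n))       # Nagelkerke's adjustment

print(round(r2_cs, 3), round(r2_n, 3))  # 0.084 0.113
```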

Use the case summaries function in SPSS to create a table for the first 15 cases in the file Eel.sav showing the values of Cured, Intervention,
Duration, the predicted probability (PRE_1) and the predicted group

membership (PGR_1) for each case.


The completed dialog box should look like this:

Rerun this analysis using the forced entry method of analysis: how do your conclusions differ?
I'm not going to run through the whole analysis, but essentially the main bit of the output that I'll look at is the Variables in the Equation table:


Essentially, when all variables are entered none of them are significant. It looks like our intervention (which we concluded was successful) was not. Puzzling, eh? Well, actually not. The reason for this is that the Intervention and the Intervention × Duration interaction are very highly correlated. To prove this fact, I created a variable representing the interaction (this is easy to do: you use the compute command and multiply the two variables in the interaction together). The table is below. Note the correlation between the intervention and the interaction: it is r = .98. Basically, it's an almost perfect correlation. This means that these two variables are essentially the same, so when they are forced into the regression they are fighting over the same variance in the outcome variable. So, they're both non-significant. They are so highly correlated because there isn't a lot of variability in the variable Duration. Try rerunning the analysis now but without the interaction term.


If we rerun the analysis without the interaction, we get:

The intervention is no longer fighting over the same variance as the interaction term and so becomes significant again. We basically get the same results (in terms of significance) as we did from the stepwise method used in the chapter. [Incidentally, if you ran the analysis with the interaction term but not Intervention then you'd find the interaction term is significant; the reason why should be relatively obvious: it's because the two variables share so much variance.]


We learnt how to do hierarchical regression in the previous chapter. Try to conduct a hierarchical logistic regression analysis on these data. Enter Previous and PSWQ in the first block and Anxious in the second. There is a full guide on how to do the analysis and its interpretation in the additional material on the website.

Running the Analysis: Block Entry Regression

To run the analysis, we must first open the main Logistic Regression dialog box. In this example, we know of two previously established predictors and so it is a good idea to enter these predictors into the model in a single block. Then we can add the new predictor in a second block (by doing this we effectively examine an old model and then add a new variable to this model to see whether the model is improved). This method is known as block entry and the figure shows how it is specified. It is easy to do block entry regression. First you should use the mouse to select the variable scored from the variables list and then transfer it to the box labelled Dependent by clicking on the arrow button. Second, you should select the two previously established predictors: select pswq and previous from the variables list and transfer them to the box labelled Covariates by clicking on the arrow button. Our first block of variables is now specified. To specify the second block, click on Next to clear the Covariates box, which should now be labelled Block 2 of 2. Now select anxious from the variables list and transfer it to the box labelled Covariates by clicking on the arrow button. We could at this stage select some interactions to be included in the model, but unless there is a sound theoretical reason for believing that the predictors should interact there is no need. Make sure that Enter is selected as the method of regression (this method is the default and so should be selected already).


Once the variables have been specified, you should select the options described in the chapter, but because none of the predictors are categorical there is no need to use the Categorical… option. When you have selected the options and residuals that you want, return to the main Logistic Regression dialog box and click on OK.


Interpreting Output
The output of the logistic regression will be arranged in terms of the blocks that were specified. In other words, SPSS will produce a regression model for the variables specified in block 1, and then produce a second model that contains the variables from both blocks 1 and 2. First, the output shows the results from block 0: the output tells us that 75 cases have been accepted and that the dependent variable has been coded 0 and 1 (because this variable was coded as 0 and 1 in the data editor, these codings correspond exactly to the data in SPSS). We are then told about the variables that are in and out of the equation. At this point only the constant is included in the model, and so to be perfectly honest none of this information is particularly interesting!

Dependent Variable Encoding

Original Value     Internal Value
Missed Penalty           0
Scored Penalty           1

Block 0: Beginning Block


Classification Table(a,b)

                                    Predicted
                               Missed    Scored    Percentage
        Observed               Penalty   Penalty   Correct
Step 0  Missed Penalty            0        35         .0
        Scored Penalty            0        40       100.0
        Overall Percentage                           53.3

a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation

                    B      S.E.    Wald    df   Sig.   Exp(B)
Step 0  Constant   .134    .231    .333     1   .564    1.143

Variables not in the Equation

                               Score    df   Sig.
Step 0  Variables   PREVIOUS   34.109    1   .000
                    PSWQ       34.193    1   .000
        Overall Statistics     41.558    2   .000


The results from block 1 are shown next, and in this analysis we forced SPSS to enter previous and pswq into the regression model. Therefore, this part of the output provides information about the model after the variables previous and pswq have been added. The first thing to note is that -2LL is 48.66, which is a change of 54.98 (the value given by the model chi-square). This value tells us about the model as a whole, whereas the block tells us how the model has improved since the last block. The change in the amount of information explained by the model is significant (p < .0001), and so using previous experience and worry as predictors significantly improves our ability to predict penalty success. A bit further down, the classification table shows us that 84% of cases can be correctly classified using pswq and previous. In the intervention example, Hosmer and Lemeshow's goodness-of-fit test was 0. The reason is that this test can't be calculated when there is only one predictor and that predictor is a categorical dichotomy! However, for this example the test can be calculated. The important part of this test is the test statistic itself (7.93) and the significance value (.3388). This statistic tests the hypothesis that the observed data are significantly different from the predicted values from the model. So, in effect, we want a non-significant value for this test (because this would indicate that the model does not differ significantly from the observed data). We have a non-significant value here, which is indicative of a model that is predicting the real-world data fairly well.

The part of the output labelled Variables in the Equation then tells us the parameters of the model when previous and pswq are used as predictors. The significance values of the Wald statistics for each predictor indicate that both pswq and previous significantly predict penalty success (p < .01). The value of the odds ratio (Exp(B)) for previous indicates that if the percentage of previous penalties scored goes up by one, then the odds of scoring a penalty also increase (because the odds ratio is greater than 1). The confidence interval for this value ranges from 1.02 to 1.11, so we can be very confident that the value of the odds ratio in the population lies somewhere between these two values. What's more, because both values are greater than 1 we can also be confident that the relationship between previous and penalty success found in this sample is true of the whole population of footballers. The odds ratio for pswq indicates that if the level of worry increases by one point along the Penn State worry scale, then the odds of scoring a penalty decrease (because it is less than 1). The confidence interval for this value ranges from .68 to .93, so we can be very confident that the value of the odds ratio in the population lies somewhere between these two values. In addition, because both values are less than 1 we can be confident that the relationship between pswq and penalty success found in this sample is true of the whole population of footballers. If we had found that the confidence interval ranged from less than 1 to more than 1, then this would limit the generalizability of our findings because the odds ratio in the population could indicate either a positive (odds ratio > 1) or negative (odds ratio < 1) relationship. A glance at the classification plot also brings us good news because most cases are clustered at the ends of the plot and few cases lie in the middle of the plot. This reiterates what we know already: that the model is correctly classifying most cases. We can, at this point, also calculate R² by dividing the model chi-square by the original value of -2LL. The result is:

R² = model chi-square / original (-2LL)
   = 54.977 / 103.6385
   = 0.53

We can interpret the result as meaning that the model can account for 53% of the variance in penalty success (so, roughly half of what makes a penalty kick successful is still unknown).
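This version of R² is another one-liner to verify (a quick sketch, not part of the book):

```python
model_chi_square = 54.977
original_minus_2ll = 103.6385  # -2LL for the baseline model

r2 = model_chi_square / original_minus_2ll
print(round(r2, 2))  # 0.53
```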


Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                Chi-square   df   Sig.
Step 1  Step      54.977      2   .000
        Block     54.977      2   .000
        Model     54.977      2   .000

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1           48.662                .520                  .694

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1        7.931       7   .339

Contingency Table for Hosmer and Lemeshow Test

                  Missed Penalty         Scored Penalty
Step 1  Group   Observed  Expected    Observed  Expected    Total
          1        8       7.904         0        .096        8
          2        8       7.779         0        .221        8
          3        8       6.705         0       1.295        8
          4        4       5.438         4       2.562        8
          5        2       3.945         6       4.055        8
          6        2       1.820         6       6.180        8
          7        2       1.004         6       6.996        8
          8        1        .298         7       7.702        8
          9        0        .108        11      10.892       11

Classification Table(a)

                                    Predicted
                               Missed    Scored    Percentage
        Observed               Penalty   Penalty   Correct
Step 1  Missed Penalty           30         5        85.7
        Scored Penalty            7        33        82.5
        Overall Percentage                           84.0

a. The cut value is .500

Variables in the Equation

                        B      S.E.    Wald    df   Sig.   Exp(B)   95% C.I. for Exp(B)
                                                                     Lower     Upper
Step 1(a)  PREVIOUS    .065    .022    8.609    1   .003    1.067    1.022     1.114
           PSWQ       -.230    .080    8.309    1   .004     .794     .679      .929
           Constant   1.280   1.670     .588    1   .443    3.598

a. Variable(s) entered on step 1: PREVIOUS, PSWQ.
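The odds ratios and confidence intervals in this table can be recovered by hand from B and S.E.: Exp(B) = e^B, with an approximate 95% CI of e^(B ± 1.96 × S.E.). A quick check in Python using the values for PREVIOUS (a sketch, not part of the book):

```python
import math

b, se = 0.065, 0.022  # B and S.E. for PREVIOUS in the output

odds_ratio = math.exp(b)               # Exp(B)
ci_low = math.exp(b - 1.96 * se)       # lower bound of the 95% CI
ci_high = math.exp(b + 1.96 * se)      # upper bound of the 95% CI

print(round(odds_ratio, 3), round(ci_low, 3), round(ci_high, 3))  # 1.067 1.022 1.114
```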


The output for block 2 shows what happens to the model when our new predictor is added (anxious). So, we begin with the model that we had in block 1 and we then add anxious to it. The effect of adding anxious to the model is to reduce -2LL to 47.416 (a reduction of 1.246 from the model in block 1, as shown in the model chi-square and block statistics). This improvement is non-significant, which tells us that including anxious in the model has not significantly improved our ability to predict whether a penalty will be scored or missed. The classification table tells us that the model is now correctly classifying 85.33% of cases. Remember that in block 1 there were 84% correctly classified, and so an extra 1.33% of cases are now classified (not a great deal more; in fact, examining the table shows us that only one extra case has now been correctly classified). The table labelled Variables in the Equation now contains all three predictors and something very interesting has happened: pswq is still a significant predictor of penalty success; however, previous experience no longer significantly predicts penalty success. In addition, state anxiety appears not to make a significant contribution to the prediction of penalty success. How can it be that previous experience no longer predicts penalty success, and neither does anxiety, yet the ability of the model to predict penalty success has improved slightly?

Block 2: Method = Enter

Omnibus Tests of Model Coefficients

                Chi-square   df   Sig.
Step 1  Step       1.246      1   .264
        Block      1.246      1   .264
        Model     56.223      3   .000

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1           47.416                .527                  .704

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1        9.937       7   .192

Contingency Table for Hosmer and Lemeshow Test

                  Missed Penalty         Scored Penalty
Step 1  Group   Observed  Expected    Observed  Expected    Total
          1        8       7.926         0        .074        8
          2        8       7.769         0        .231        8
          3        9       7.649         0       1.351        9
          4        4       5.425         4       2.575        8
          5        1       3.210         7       4.790        8
          6        4       1.684         4       6.316        8
          7        1       1.049         7       6.951        8
          8        0        .222         8       7.778        8
          9        0        .067        10       9.933       10

Classification Table(a)

                                    Predicted
                               Missed    Scored    Percentage
        Observed               Penalty   Penalty   Correct
Step 1  Missed Penalty           30         5        85.7
        Scored Penalty            6        34        85.0
        Overall Percentage                           85.3

a. The cut value is .500


Variables in the Equation

                        B       S.E.     Wald    df   Sig.   Exp(B)   95% C.I. for Exp(B)
                                                                       Lower     Upper
Step 1(a)  PREVIOUS    .203     .129    2.454     1   .117    1.225     .950     1.578
           PSWQ       -.251     .084    8.954     1   .003     .778     .660      .917
           ANXIOUS     .276     .253    1.193     1   .275    1.318     .803     2.162
           Constant -11.493   11.802     .948     1   .330     .000

a. Variable(s) entered on step 1: ANXIOUS.

The classification plot is similar to before and the contribution of pswq to predicting penalty success is relatively unchanged. What has changed is the contribution of previous experience. If we examine the values of the odds ratio for both previous and anxious it is clear that they both potentially have a positive relationship to penalty success (i.e. as they increase by a unit, the odds of scoring improve). However, the confidence intervals for these values cross 1, which indicates that the direction of this relationship may be unstable in the population as a whole (i.e. the value of the odds ratio in our sample may be quite different to the value if we had data from the entire population).

You may be tempted to use this final model to say that, although worry is a significant predictor of penalty success, the previous finding that experience plays a role is incorrect.


This would be a dangerous conclusion to make, and if you read the section on multicollinearity in the book you'll see why!

Try creating two new variables that are the natural log of Anxious and
Previous.
First of all, the completed dialog box for PSWQ is given below to give you some idea of how this variable is created (following the instructions in the chapter):

For Anxious, create a new variable called LnAnxious by entering this name into the box labelled Target Variable; you can also give the variable a more descriptive name such as Ln(anxiety). In the list box labelled Function group, click on Arithmetic and then in the box labelled Functions and Special Variables click on Ln and transfer it to the command area by clicking on the arrow button. Replace the question mark with the variable Anxious, by either selecting the variable in the list and clicking on the arrow button or just typing Anxious where the question mark is. Click on OK to create the variable.

For Previous, create a new variable called LnPrevious by entering this name into the box labelled Target Variable, then click on Type & Label and give the variable a more descriptive name such as Ln(previous performance). In the list box labelled Function group, click on Arithmetic and then in the box labelled Functions and Special Variables click on Ln (this is the natural log transformation) and transfer it to the command area by clicking on the transfer button. Replace the question mark with the variable Previous by either selecting the variable in the list and clicking on the transfer button, or just typing Previous where the question mark is. Click on OK to create the variable.

Alternatively, you can create all three variables in one go using syntax of the same form as shown later in this chapter:

COMPUTE LnPSWQ=LN(PSWQ).
COMPUTE LnAnxious=LN(Anxious).
COMPUTE LnPrevious=LN(Previous).
EXECUTE.

Using what you learned in Chapter 6, carry out a Pearson correlation between all of the variables in this analysis. Can you work out why we have a problem with collinearity?
The results of your analysis should look like this:

Correlations

                                     Result of      State      Percentage of       Penn State
                                     Penalty Kick   Anxiety    previous            Worry
                                                               penalties scored    Questionnaire
Result of Penalty Kick   Pearson r    1.000         -.668**     .674**             -.675**
State Anxiety            Pearson r    -.668**        1.000     -.993**              .652**
Percentage of previous
penalties scored         Pearson r     .674**       -.993**     1.000              -.644**
Penn State Worry
Questionnaire            Pearson r    -.675**        .652**    -.644**              1.000

All two-tailed significance values were .000 and N = 75 for every correlation.

**. Correlation is significant at the 0.01 level (2-tailed).

From this output we can see that Anxious and Previous are highly negatively correlated (r = -.99); in fact they are almost perfectly correlated. Both Previous and Anxious correlate with penalty success¹ but because they correlate so highly with each other, it is unclear which of the two variables predicts penalty success in the regression. As such, our multicollinearity stems from the near-perfect correlation between Anxious and Previous.

What does the log-likelihood measure?

The log-likelihood statistic is analogous to the residual sum of squares in multiple regression in the sense that it is an indicator of how much unexplained information there is after the model has been fitted. It follows, therefore, that large values of the log-likelihood statistic indicate poorly fitting statistical models, because the larger the value of the log-likelihood, the more unexplained observations there are.
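To make the idea concrete, here is a small Python sketch of the log-likelihood computation for a logistic model (the outcomes and predicted probabilities are made up for illustration, not taken from this example):

```python
import math

def log_likelihood(y, p):
    """Log-likelihood of observed 0/1 outcomes y given predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# A well-fitting model (probabilities close to the observed outcomes)...
good = log_likelihood([1, 0, 1], [0.9, 0.1, 0.8])
# ...has a log-likelihood closer to zero than a poorly fitting one,
# so its -2LL (the statistic SPSS reports) is smaller.
bad = log_likelihood([1, 0, 1], [0.5, 0.5, 0.5])
print(good > bad)  # True
```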

¹ If you think back to Chapter 6, these correlations with penalty success (a dichotomous variable) are point-biserial correlations.


Use what you learnt earlier in this chapter to check the assumptions of multicollinearity and linearity of the logit.

Testing for Linearity of the Logit


In this example we have three continuous variables (Funny, Sex, Good_Mate); therefore we have to check that each one is linearly related to the log of the outcome variable (Success). To test this assumption we need to rerun the logistic regression but include predictors that are the interaction between each predictor and the log of itself. For each variable, create a new variable that is the log of the original variable. For example, for Funny, create a new variable called LnFunny by entering this name into the box labelled Target Variable, then click on Type & Label and give the variable a more descriptive name such as Ln(Funny).

In the list box labelled Function group, click on Arithmetic and then in the box labelled Functions and Special Variables click on Ln (this is the natural log transformation) and transfer it to the command area by clicking on the transfer button. When the command is transferred, it appears in the command area as LN(?) and the question mark should be replaced with a variable name (which can be typed manually or transferred from the variables list). So replace the question mark with the variable Funny by either selecting the variable in the list and clicking on the transfer button, or just typing Funny where the question mark is. Click on OK to create the variable.


Repeat this process for Sex and Good_Mate. Alternatively, do all three at once using this syntax:

COMPUTE LnFunny=LN(Funny).
COMPUTE LnSex=LN(Sex).
COMPUTE LnGood_Mate=LN(Good_Mate).
EXECUTE.
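If you want to see exactly what these COMPUTE commands do, this Python sketch mimics them on a few made-up scores (the variable name mirrors the SPSS one):

```python
import math

# Hypothetical scores; in SPSS these would be the values in the Funny column
funny = [2, 5, 9]

# COMPUTE LnFunny=LN(Funny). applied case by case (LN is the natural log,
# so the scores must be positive)
ln_funny = [math.log(score) for score in funny]

# The logit-linearity test then uses the interaction of each score with its log
funny_by_ln = [score * lnf for score, lnf in zip(funny, ln_funny)]
```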

To test the assumption we need to redo the analysis but putting in our three covariates, and also the interactions of these covariates with their natural logs. So, as with the main example in the chapter we need to specify a custom model:


Note that (1) we need to enter the log variables in the first screen so that they are listed in the second dialog box, and (2) in the second dialog box we have only included the main effects of Sex, Funny and Good_Mate and their interactions with their log values.


The output above is all that we need to look at because it tells us about whether any of our predictors significantly predict the outcome categories (generally). The assumption of linearity of the logit is tested by the three interaction terms, all of which are significant (p < .05). This means that all three predictors have violated the assumption.

Testing for Multicollinearity


You can obtain statistics such as the tolerance and VIF by simply running a linear regression analysis using the same outcome and predictors as the logistic regression. It is essential that you click on Statistics and select Collinearity diagnostics in the dialog box. Once you have done this, switch off all of the default options, click on Continue to return to the Linear Regression dialog box, and then click on OK to run the analysis.


Menard (1995; see book references) suggests that a tolerance value less than 0.1 almost certainly indicates a serious collinearity problem. Myers (1990; see book references) also suggests that a VIF value greater than 10 is cause for concern and in these data all of the VIFs are well below 10 (and tolerances above 0.1). It seems from these values that there is not an issue of collinearity between the predictor variables. We can investigate this issue further by examining the collinearity diagnostics.
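As a quick illustration of what tolerance and VIF are: tolerance is 1 minus the R² from regressing one predictor on the others, and VIF is its reciprocal. With only two correlated predictors that R² is just their squared correlation, so the computation can be sketched in Python (the r values here are made up, not from these data):

```python
def vif_two_predictors(r):
    """Tolerance and VIF for a predictor that correlates r with one other predictor."""
    tolerance = 1 - r ** 2
    return tolerance, 1 / tolerance

tol, vif = vif_two_predictors(0.8)
print(round(tol, 2), round(vif, 2))  # 0.36 2.78
```

By Menard's and Myers's rules of thumb this would be unproblematic; by contrast, a correlation of .99 (as between Anxious and Previous earlier) gives a VIF of about 50, well past the cause-for-concern threshold of 10.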


The table labelled Collinearity Diagnostics gives the eigenvalues of the scaled, uncentred cross-products matrix, the condition index and the variance proportions for each predictor. If the eigenvalues are fairly similar then the derived model is likely to be unchanged by small changes in the measured variables. The condition indexes are another way of expressing these eigenvalues and represent the square root of the ratio of the largest eigenvalue to the eigenvalue of interest (so, for the dimension with the largest eigenvalue, the condition index will always be 1). For these data the final dimension has a condition index of 15.03, which is nearly twice as large as the previous one. Although there are no hard and fast rules about how much larger a condition index needs to be to indicate collinearity problems, this could indicate a problem. For the variance proportions we are looking for predictors that have high proportions on the same small eigenvalue, because this would indicate that the variances of their regression coefficients are dependent. So we are interested mainly in the bottom few rows of the table (which represent small eigenvalues). In this example, 40-57% of the variance in the regression coefficients of both Sex and Good_Mate is associated with eigenvalue number 4, and 34-39% with eigenvalue number 5 (the smallest eigenvalue), which indicates some dependency between these variables. So, there is some dependency between Sex and Good_Mate, but given the VIF we can probably assume that this dependency is not problematic.


Additional Material Diagnostics for the Eel.sav analysis



Labcoat Leni's Real Research: Mandatory suicide?

Lacourse, E. et al. (2001). Journal of Youth and Adolescence, 30, 321-332.

As you might have noticed by now, although I have fairly eclectic tastes in music, my favourite kind of music is heavy metal. One thing that is mildly irritating about liking heavy music is that everyone assumes that you're a miserable or aggressive bastard. When not listening to (and often while listening to) heavy metal, I spend most of my time researching clinical psychology: I research how anxiety develops in children. Therefore, I was literally beside myself with excitement when a few years back I stumbled on a paper that combined these two interests. Lacourse, Claes, and Villeneuve (2001) carried out a study to see whether a love of heavy metal could predict suicide risk. Fabulous stuff! Eric Lacourse and his colleagues used questionnaires to measure several background variables: suicide risk (yes or no), marital status of parents (together or divorced/separated), the extent to which the person's mother and father were neglectful, self-estrangement/powerlessness (adolescents who have negative self-perceptions, are bored with life, etc.), social isolation (feelings of a lack of support), normlessness (beliefs that socially disapproved behaviours can be used to achieve certain goals), meaninglessness (doubting that school is relevant to gain employment), and drug use. In addition, they measured liking of different categories of music. For heavy metal they included classic bands (Black Sabbath, Iron Maiden), thrash metal bands (Slayer, Metallica), death/black metal bands (Obituary, Burzum)


and gothic bands (Marilyn Manson, Sisters of Mercy). As well as liking they measured behavioural manifestations of worshipping these bands (hanging posters, hanging out with other metal fans), and vicarious music listening (whether music was used when angry or to bring out aggressive moods). They carried out a logistic regression predicting suicide risk from all of these predictors for males and females separately. The data for the female sample are in the file Lacourse et al. (2001) Females.sav. Labcoat Leni wants you to carry out a logistic regression predicting Suicide_Risk from all of the other predictors (forced entry). (To make it easier to compare to the published results I suggest you enter the predictors in the same order as Table 3 in the paper: Age, Marital_Status,
Mother_Negligence, Father_Negligence, Self_Estrangement, Isolation, Normlessness, Meaninglessness, Drug_Use, Metal, Worshipping, Vicarious.) Create a table of the results;

does listening to heavy metal make girls suicidal? If not, what does? Answers are in the additional material for this website (or look at Table 3 in the original article).

The main analysis is fairly simple to specify because we're just forcing all predictors in at the same time. Therefore, the completed main dialog box should look like this (note that I have ordered the predictors as suggested by Labcoat Leni, and that you won't see all of them in the dialog box because the list is too long!):


We also need to specify our categorical variables. We have only one, Marital_Status:

I have chosen an indicator contrast with the first category (Together) as the reference category. It actually doesn't matter whether you select first or last because there are only two categories. However, it will affect the sign of the beta coefficient. I have chosen the first category as the reference category purely because it gives us a positive beta, as in Lacourse et al.'s table. If you chose last (the default), the resulting coefficient would be the same magnitude but negative. You can select whatever other options you see fit based on the chapter (the CI for Exp(B) will need to be selected to get the same output as below). The main output is as follows:


We can present these results in the following table:

                                                        95% CI for Odds Ratio
                                B        SE      Lower    Odds Ratio    Upper
Constant                       -6.21     6.21
Age                             0.69*    0.32    1.06     2.00          3.77
Marital status                  0.18     0.68    0.32     1.20          4.53
Mother negligence              -0.02     0.05    0.88     0.98          1.09
Father negligence               0.09*    0.05    0.99     1.09          1.20
Self-estrangement/
powerlessness                   0.15*    0.06    1.03     1.17          1.33
Social isolation               -0.01     0.08    0.86     0.99          1.15
Normlessness                    0.19*    0.11    0.98     1.21          1.50
Meaninglessness                -0.07     0.06    0.83     0.94          1.05
Drug use                        0.32**   0.10    1.12     1.37          1.68
Metal                           0.14     0.09    0.96     1.15          1.37
Worshipping                     0.16*    0.13    0.91     1.17          1.51
Vicarious listening            -0.34     0.20    0.48     0.71          1.04

*p < .05, **p < .01; one-tailed
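A side note on the reference-category point made above: switching the reference category flips the sign of B but not its magnitude, because B is the natural log of the odds ratio and reversing the comparison inverts the odds ratio. A tiny Python illustration with a made-up odds ratio:

```python
import math

# Suppose divorced/separated vs. together has an odds ratio of 1.2 (invented value)
odds_ratio = 1.2

b_first_as_reference = math.log(odds_ratio)      # Together coded as reference
b_last_as_reference = math.log(1 / odds_ratio)   # Divorced/separated as reference

# Same magnitude, opposite sign (they sum to zero, up to floating-point error)
```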

I've reported one-tailed significances (because Lacourse et al. do, and it makes it easier to compare our results to Table 3 in their paper). We can conclude that listening to heavy metal did not significantly predict suicide risk in women (of course not; anyone I've ever met who likes metal does not conform to the stereotype). However, in case you're interested, listening to country music apparently does (Stack & Gundlach, 1992). The factors that did predict suicide risk were age (risk increased with age), father negligence (although this was significant only one-tailed, it showed that as negligence increased so did suicide risk), self-estrangement (basically, low self-esteem predicted suicide risk, as you might expect), normlessness (again, only one-tailed), drug use (the more drugs used, the more likely a person was to be in the at-risk category), and worshipping (the more the person showed signs of worshipping bands, the more likely they were to be in the at-risk group). The most significant predictor was drug use.


So, this shows you that for girls, listening to metal was not a risk factor for suicide, but drug use was. To find out what happens for boys, you'll just have to read the article! This is scientific proof that metal isn't bad for your health, so download some Deathspell Omega and enjoy!



Chapter 9

Self-Test Answers

Enter these data into SPSS. Plot an error bar graph of the spider data.

You can check your data entry against the file spiderBG.sav. The completed graph editor window should look like this:


Enter these data into SPSS. Plot an error bar graph of the spider data.
You can check your data entry against the file spiderRM.sav. The completed graph editor window should look like this:

Create an error bar chart of the mean of the adjusted values that you have just made (Real_Adjusted and Picture_Adjusted).
The completed graph editor window should look like this:


Using the spiderRM.sav data, compute the differences between the picture and real condition and check the assumption of normality for these differences.
First compute the differences using the compute function:


Next, use the explore command (Analyze > Descriptive Statistics > Explore) to get some plots and the K-S test:

The output shows that the distribution of differences is not significantly different from normal, D(12) = 0.13, p > .05. The Q-Q plot also shows that the quantiles fall pretty much on the diagonal line (indicating normality). As such, it looks as though we can assume that our differences are normal and that, therefore, the sampling distribution of these differences is normal too. Happy days!


Additional Material

Labcoat Leni's Real Research: You don't have to be mad here, but it helps

Board, B. J., & Fritzon, K. (2005). Psychology, Crime & Law, 11, 17-32.


In the UK you often see the humorous slogan 'You don't have to be mad to work here, but it helps' stuck up in workplaces. Well, Board and Fritzon (2005) took this a step further by measuring whether 39 senior business managers and chief executives from leading UK companies had personality disorders (PDs). They gave them the Minnesota Multiphasic Personality Inventory Scales for DSM III Personality Disorders (MMPI-PD), which is a well-validated measure of 11 personality disorders: Histrionic, Narcissistic, Antisocial, Borderline, Dependent, Compulsive, Passive-aggressive, Paranoid, Schizotypal, Schizoid and Avoidant. They needed a comparison group, and what better one to choose than 317 legally classified psychopaths at Broadmoor Hospital (a famous high-security psychiatric hospital in the UK). The authors report the means and SDs for these two groups in Table 2 of their paper. Using these values and the syntax file Independent t from means.sps we can run t-tests on these means. The data from Board and Fritzon's (2005) Table 2 are in the file Board and Fritzon 2005.sav. Use this file and the syntax file to run t-tests to see whether managers score higher on personality disorder questionnaires than legally classified psychopaths. Report these results. What do you conclude?

The data look like this:


The columns represent the following:

Outcome: a string variable that tells us which personality disorder the numbers in each row relate to.
X1: Mean of the managers group.
X2: Mean of the psychopaths group.
sd1: Standard deviation of the managers group.
sd2: Standard deviation of the psychopaths group.
n1: The number of managers tested.
n2: The number of psychopaths tested.

The syntax file looks like this:
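The syntax file essentially computes a pooled-variance independent t and an effect size from these summary statistics. Here is a hedged Python sketch of that computation (the summary values below are invented for illustration, not Board and Fritzon's):

```python
import math

def t_from_means(x1, x2, sd1, sd2, n1, n2):
    """Pooled-variance independent t and Cohen's d from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    t = (x1 - x2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    d = (x1 - x2) / math.sqrt(pooled_var)
    return t, d

# Made-up example: group means 10 and 8, both SDs 2, 20 people per group
t, d = t_from_means(10, 8, 2, 2, 20, 20)
print(round(t, 2), round(d, 2))  # 3.16 1.0
```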


We can run the syntax by selecting Run > All in the syntax window. The output looks like this:

We can report that managers scored significantly higher than psychopaths on histrionic personality disorder, t(354) = 7.18, p < .001, d = 1.22. There were no significant differences between groups on narcissistic personality disorder, t(354) = 1.41, p > .05, d = 0.24, or compulsive personality disorder, t(354) = 0.77, p > .05, d = 0.13. On all other measures, psychopaths scored significantly higher than managers: antisocial personality disorder, t(354) = -5.23, p < .001, d = -0.89; borderline personality disorder, t(354) = -10.01, p < .001, d = -1.70; dependent personality disorder, t(354) = -9.80, p < .001, d = -1.67; passive-aggressive personality disorder, t(354) = -3.83, p < .001, d = -0.65; paranoid personality disorder, t(354) = -8.73, p < .001, d = -1.48; schizotypal personality disorder, t(354) = -10.76, p < .001, d = -1.83; schizoid personality disorder, t(354) = -8.18, p < .001, d = -1.39; avoidant personality disorder, t(354) = -6.31, p < .001, d = -1.07.

The results show the presence of elements of PD in the senior business manager sample, especially those most associated with psychopathic PD. The senior business manager group showed significantly higher levels of traits associated with histrionic PD than psychopaths. They also did not significantly differ from psychopaths in narcissistic and compulsive PD traits. These findings could be an issue of power (effects were not detected but are present). The effect sizes d can help us out here, and these are quite small (0.24 and 0.13), which can give us confidence that there really isn't a difference between psychopaths and managers on these traits. Board and Fritzon (2005) conclude that: 'At a descriptive level this translates to: superficial charm, insincerity, egocentricity, manipulativeness (histrionic), grandiosity, lack of empathy, exploitativeness, independence (narcissistic), perfectionism, excessive devotion to work, rigidity, stubbornness, and dictatorial tendencies (compulsive). Conversely, the senior business manager group is less likely to demonstrate physical aggression, consistent irresponsibility with work and finances, lack of remorse (antisocial), impulsivity, suicidal gestures, affective instability (borderline), mistrust (paranoid), and hostile defiance alternated with contrition (passive/aggressive).' And these people are in charge of large companies like Sage Publications Ltd. Hmm, suddenly a lot of things make sense.


Chapter 10

Self-Test

To illustrate exactly what is going on I have created a file called dummy.sav. This file contains the Viagra data but with two additional variables (dummy1 and dummy2) that specify to which group a data point belongs (as in Table 10.2). Access this file and run a multiple regression analysis using libido as the outcome and dummy1 and dummy2 as the predictors. If you're stuck on how to run the regression then read Chapter 7 again (see, these chapters are ordered for a reason)!

The dialog box for the regression should look like this:

To illustrate these principles, I have created a file called Contrast.sav in which the Viagra data are coded using the contrast coding scheme used in this section. Run multiple regression analyses on these data using libido as the outcome and using dummy1 and dummy2 as the predictor variables (leave all default options).

Your completed regression dialog box should look like this:

Produce a line chart with error bars for the Viagra data.

Your completed Chart Builder should look like this:


Additional Material
Oliver Twisted: Please Sir, Can I Have Some More Levene's Test?

'Liar! Liar! Pants on fire!', screams Oliver, his cheeks red and eyes about to explode. 'You promised to explain Levene's test properly and you haven't, you spatula head.' True enough, Oliver, I do have a spatula for a head. I also have a very nifty little demonstration of Levene's test in the additional material for this chapter on the companion website. It will tell you more than you could possibly want to know. Let's go fry an egg...


Levene's test is basically an ANOVA conducted on the absolute differences between the observed data and the mean from which the data came. To see what I mean, let's do a sort of manual Levene's test on the Viagra data. First we need to create a new variable called difference (short for 'difference from group mean'), which is each score subtracted from the mean of the group to which that score belongs. Remember that the means for the placebo, low-dose and high-dose groups were 2.2, 3.2 and 5 respectively, and the groups were coded 1, 2 and 3. We can compute this new variable using syntax:

IF (dose = 1) Difference=libido - 2.2. IF (dose = 2) Difference=libido - 3.2. IF (dose = 3) Difference=libido - 5. VARIABLE LABELS Difference 'Difference from Group Mean'. EXECUTE.

The first line just says that if dose = 1 (i.e. placebo) then the difference is the value of libido minus 2.2 (the mean of the placebo group). The next two lines do the same thing for the low- and high-dose groups. The resulting data look like this:


Note that for person 1 the difference score is 3 - 2.2 = 0.8, and for person 2 it is 2 - 2.2 = -0.20. As we move into the low-dose group we subtract the mean of that group, so person 6's difference is 5 - 3.2 = 1.8 and person 7's is 2 - 3.2 = -1.20. In the high-dose group the group mean is 5, so for person 11 we get a difference of 7 - 5 = 2, and so on. Think about what these differences are; they are deviations from the mean, the same deviations that we calculate when we compute the sums of squares, variance and standard deviation. They represent variation from the mean. When we compute the variance we square the values to get rid of the plus and minus signs (otherwise the positive and negative deviations will cancel out). Levene's test doesn't do this (because we don't want to change the units of measurement by squaring the values), but instead simply takes the absolute values; that is, it pretends that all of the deviations are positive. To get the absolute values of these differences (i.e. to make them all positive values), again we can do this with syntax:

Compute Difference = abs(Difference).


VARIABLE LABELS Difference 'Absolute Difference from Group Mean'. EXECUTE.

The first line just changes the variable difference to be the absolute value of itself. The second line renames the variable to reflect the fact that it now contains absolute values. The data now look like this:

Note that the difference scores are the same magnitude; it's just that the minus signs have gone. These values still represent deviations from the mean, or variance; we just no longer have the problem of positive and negative deviations cancelling each other out. Now, using what you learnt in the book, conduct a one-way ANOVA on these difference scores: dose is the independent variable and difference is the dependent variable (don't select any special options, just run a basic analysis). The main dialog box should look like this:


You'll find that the F-ratio for this analysis is 0.092, which is non-significant, p = 0.913; these are the same values as Levene's test in the book!

Levene's test is, therefore, testing whether the average absolute deviation from the mean is the same in the three groups. Clever, eh?
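You can verify this logic outside SPSS too. The Python sketch below assumes libido scores consistent with the group means quoted above (the individual scores are my reconstruction, so treat them as an assumption); it runs a one-way ANOVA on the absolute deviations from each group's own mean:

```python
def one_way_f(groups):
    """F-ratio for a one-way ANOVA on lists of scores."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)
    df_b, df_w = len(groups) - 1, len(all_scores) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w)

# Assumed libido scores giving the stated group means of 2.2, 3.2 and 5
placebo, low, high = [3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [7, 4, 5, 3, 6]

# Levene's test: ANOVA on absolute deviations from each group's mean
abs_devs = [[abs(s - sum(g) / len(g)) for s in g] for g in (placebo, low, high)]
print(round(one_way_f(abs_devs), 3))  # 0.092, matching the value quoted above
```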

Labcoat Leni's Real Research 10.1: Scraping the barrel?

Gallup, G. G. J. et al. (2003). Evolution and Human Behavior, 24, 277-289.

Evolution has endowed us with many beautiful things (cats, dolphins, the Great Barrier


Reef, etc.), all selected to fit their ecological niche. Given evolution's seemingly limitless capacity to produce beauty, it's something of a wonder how it managed to produce such a monstrosity as the human penis. One theory is that the penis evolved into the shape that it is because of sperm competition. Specifically, the human penis has an unusually large glans (the 'bell-end', as it's affectionately known) compared to other primates, and this may have evolved so that the penis can displace seminal fluid from other males by 'scooping it out' during intercourse. To put this idea to the test, Gordon Gallup and his colleagues came up with an ingenious study (Gallup et al., 2003). Armed with various female masturbatory devices from Hollywood Exotic Novelties, an artificial vagina from California Exotic Novelties, and some water and cornstarch to make fake sperm, they loaded the artificial vagina with 2.6 ml of fake sperm and inserted one of three female sex toys into it before withdrawing it. Over several trials, three different female sex toys were used: a control phallus that had no coronal ridge (i.e. no bell-end), a phallus with a minimal coronal ridge (small bell-end) and a phallus with a coronal ridge. They measured sperm displacement as a percentage using the following equation (included here because it is more interesting than all of the other equations in this book):

sperm displacement (%) = 100 x (weight of vagina with semen - weight of vagina following insertion and removal of phallus) / (weight of vagina with semen - weight of empty vagina)

As such, 100% means that all of the sperm was displaced by the phallus, and 0% means that none of the sperm was displaced. If the human penis evolved as a sperm displacement device then we predict: (1) that having a bell-end will displace more sperm than not; and (2) that the phallus with the larger coronal ridge will displace more sperm than the phallus with the minimal coronal ridge. The conditions are ordered (no ridge, minimal ridge, normal ridge), so we might also predict a linear trend. The data can be found in the file Gallup et al.sav. Conduct a one-way ANOVA with planned comparisons to test the two hypotheses


above.
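Translated into code, the displacement measure works like this (the weights are invented purely for illustration):

```python
def sperm_displacement(full_weight, after_weight, empty_weight):
    """Percentage of semen displaced, using the weights described in the equation above."""
    return 100 * (full_weight - after_weight) / (full_weight - empty_weight)

# Hypothetical weights in grams: loaded vagina 105, after insertion/removal 101, empty 100
print(sperm_displacement(105, 101, 100))  # 80.0 (% of the semen displaced)
```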

OK, let's do the graph first. There are two variables in the data editor: Phallus (the independent variable, which has three levels: no ridge, minimal ridge and normal ridge) and Displacement (the dependent variable: the percentage of sperm displaced). The graph should therefore plot Phallus on the x-axis and Displacement on the y-axis. The completed dialog box should look like this:


The final graph looks like this (I have edited mine, you can edit yours too to get some practice):

This graph shows that having a coronal ridge results in more sperm displacement than not having one. The size of ridge made very little difference. For the ANOVA the dialog box should look like this:

To test our hypotheses we need to enter the following codes:

Group          No Ridge (Control)   Minimal Ridge   Coronal Ridge
Contrast 1     -2                    1               1
Contrast 2      0                   -1               1

Contrast 1 tests hypothesis 1: that having a bell-end will displace more sperm than not. To test this we compare the two conditions with a ridge against the control condition (no ridge). So we compare chunk 1 (no ridge) to chunk 2 (minimal ridge, coronal ridge). The numbers assigned to the groups are the number of groups in the opposite chunk, and then we randomly assign one chunk to be negative (the codes 2, -1, -1 would work just as well).

Contrast 2 tests hypothesis 2: that the phallus with the larger coronal ridge will displace more sperm than the phallus with the minimal coronal ridge. First we remove the control phallus from the comparison by assigning it a code of 0; then we compare chunk 1 (minimal ridge) to chunk 2 (coronal ridge). The numbers assigned to the groups are the number of groups in the opposite chunk, and again one chunk is randomly assigned to be negative (the codes 0, 1, -1 would work just as well). We enter these codes into SPSS as below:
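As an aside, you can check in code that these weights behave as planned contrast weights should: the weights within each contrast sum to zero, and the two contrasts are orthogonal (the sum of their element-wise products is zero). In Python:

```python
# Weights for (no ridge, minimal ridge, coronal ridge)
contrast1 = [-2, 1, 1]   # any ridge vs. no ridge
contrast2 = [0, -1, 1]   # coronal ridge vs. minimal ridge

assert sum(contrast1) == 0 and sum(contrast2) == 0            # each sums to zero
assert sum(a * b for a, b in zip(contrast1, contrast2)) == 0  # orthogonal contrasts
```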


We should also ask for homogeneity tests and corrections:

This tells us that Levene's test is not significant, F(2, 12) = 1.12, p > .05, so we can assume that the variances are equal.


The main ANOVA tells us that there was a significant effect of the type of phallus, F(2, 12) = 41.56, p < .001. (This is exactly the same result as reported in the paper on page 280.) There is also a significant linear trend, F(1, 12) = 62.47, p < .001, indicating that more sperm was displaced as the ridge increased (however, note from the graph that this effect reflects the increase in displacement as we go from having no ridge to having a ridge; there is no extra increase from minimal ridge to coronal ridge).

This table tells us that we entered our weights correctly:

Contrast 1 tells us that hypothesis 1 is supported: having some kind of ridge led to greater sperm displacement than not having a ridge, t(12) = 9.12, p < .001. Contrast 2 shows that hypothesis 2 is not supported: the amount of sperm displaced by the normal coronal ridge was not significantly different from the amount displaced by a minimal coronal ridge, t(12) = 0.02, p = .99.

Chapter 11

Self-Test Answers

Use SPSS to find out the mean and standard deviation of both the participant's libido and that of their partner in the three groups.

The easiest way to get these values is to use the means command (Analyze > Compare Means > Means) because this allows us to split the analysis by group; however, this is not the only way (we could, for example, split the file and then run the descriptives command; we could also use the explore command, and although we don't use that command in the book it is fairly self-evident how to use it!). Complete the dialog box as follows and you'll get a beautiful (?) table of descriptive statistics for both variables, split by each group:


Conduct an ANOVA to test whether partner's libido (our covariate) is independent of the dose of Viagra (our independent variable).

We can do this analysis by selecting the one-way ANOVA command, but we can also do the analysis using the same dialog box that we use for ANCOVA. If we do the latter then we can follow the example in the chapter but simply exclude the covariate. Therefore, the completed dialog box would look like this:


Run a one-way ANOVA to see whether the three groups differ in their levels of libido.

We can do this analysis by selecting the one-way ANOVA command, but we can also do the analysis using the same dialog box that we use for ANCOVA. If we do the latter then we can follow the example in the chapter but simply exclude the covariate. Therefore, the completed dialog box would look like this:

Why do you think that the results of the post hoc test differ to the contrasts for the comparison of the low-dose and placebo group?

This contradiction might result from a loss of power in the post hoc tests (remember that planned comparisons have greater power to detect effects than post hoc procedures).


However, there could be other reasons why these comparisons are non-significant and we should be very cautious in our interpretation of the significant ANCOVA and subsequent comparisons.

Add two dummy variables to the file ViagraCovariate.sav that compare the low dose to the placebo (Low_Placebo) and the high dose to the placebo (High_Placebo). If you get stuck then download
ViagraCovariateDummy.sav.

Run a hierarchical regression analysis with Libido as the outcome. In the first block enter partners libido (Partner_Libido) as a predictor, and then in the second block enter both dummy variables (forced entry).

To get to the main regression dialog box, select the linear regression command. Select the outcome variable (Libido) and drag it to the box labelled Dependent (or click on the arrow button). To specify the predictor variable for the first block, select Partner_Libido and drag it to the box labelled Independent(s) (or click on the arrow button). Underneath the Independent(s) box there is a drop-down menu for specifying the Method of regression. The default option is forced entry, and this is the option we want. Having specified the first block in the hierarchy, we need to move on to the second. To tell the computer that you want to specify a new block of predictors you must click on Next. This process clears the Independent(s) box so that you can enter the new predictors (you should also note that above this box it now reads Block 2 of 2, indicating that you are in the second block of the two that you have so far specified). The second block must contain both of the dummy variables, so you should click on Low_Placebo and High_Placebo in the variables list and drag them to the Independent(s) box (or click on the arrow button). We also want to leave the method of regression set to Enter. The dialog boxes for the two stages in the hierarchy are shown below:

We just want to run a basic analysis, so we can leave all of the default options as they are and click on OK.
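If you want to see what these two blocks are doing numerically, here is a hedged Python sketch with tiny made-up data (not the ViagraCovariate.sav file): it builds the two dummy variables by hand and shows the R-squared change when they are added after the covariate, which is the logic of the hierarchical analysis.

```python
import numpy as np

# Made-up illustrative data (NOT the book's file).
# dose: 0 = placebo, 1 = low dose, 2 = high dose (assumed coding)
dose    = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
partner = np.array([3., 2., 4., 3., 5., 4., 6., 4., 5.])  # covariate
libido  = np.array([2., 3., 3., 4., 5., 4., 6., 7., 6.])  # outcome

# Dummy variables: each compares one dose group with the placebo baseline
low_placebo  = (dose == 1).astype(float)
high_placebo = (dose == 2).astype(float)

def r_squared(*predictors):
    """R^2 from an ordinary least squares fit with an intercept."""
    X = np.column_stack((np.ones(len(libido)),) + predictors)
    beta, *_ = np.linalg.lstsq(X, libido, rcond=None)
    resid = libido - X @ beta
    total = libido - libido.mean()
    return 1 - (resid @ resid) / (total @ total)

r2_block1 = r_squared(partner)                             # block 1: covariate only
r2_block2 = r_squared(partner, low_placebo, high_placebo)  # block 2: + dummies

# The R^2 change between blocks is the variance uniquely explained by dose
# over and above partner's libido -- the same logic as the ANCOVA.
print(round(r2_block2 - r2_block1, 3))
```

Because the blocks are nested, the second R-squared can never be smaller than the first; the interesting question is whether the change is significant, which is what the SPSS output's R-squared change statistics test.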

Rerun the ANCOVA but select the option to display estimates of effect size. Do the values of partial eta squared match the ones we have just calculated?

You should get the following output:


This table is the same as the main ANCOVA that we did in the chapter except that there is an extra column at the end containing the values of partial eta squared. For Dose, partial eta squared is .24, and for Partner_Libido it is .16, both of which are the same as the values we calculated by hand in the chapter.
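In case it helps to see the arithmetic, partial eta squared for an effect is its sum of squares divided by that sum of squares plus the residual (error) sum of squares. A quick hedged sketch (the numbers below are illustrative, not the chapter's output):

```python
# Partial eta squared = SS_effect / (SS_effect + SS_error).
# Illustrative values only (not taken from the chapter's ANCOVA table).
def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

print(round(partial_eta_squared(25.0, 75.0), 2))  # 0.25
```

Note that unlike ordinary eta squared (SS_effect divided by the total SS), the partial version ignores variance attributable to the other predictors, which is why the values SPSS prints can sum to more than 1 across effects.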

Additional Material

Labcoat Lenis Real Research: Space Invaders


Muris, P. et al. (2008). Child Psychiatry and Human Development, XX, XXXXXX.

Anxious people tend to interpret ambiguous information in a negative way. For example, being highly anxious myself, if I overheard a student saying 'Andy Field's lectures are really different' I would assume that 'different' meant 'rubbish', but it could also mean 'refreshing' or 'innovative'. One current mystery is how these interpretational biases develop in children. Peter Muris and his colleagues addressed this issue in an ingenious study. Children did a computerized task in which they imagined that they were astronauts who had discovered a new planet. Although the planet was similar to Earth, some things were different. They were given some scenarios about their time on the planet (e.g. 'On the street, you encounter a spaceman. He has a sort of toy handgun and he fires at you ...') and the child had to decide which of two outcomes occurred. One outcome was positive ('You are laughing: it is a water pistol and the weather is fine anyway') and the other negative ('Oops, this hurts! The pistol produces a red beam which burns your skin!'). After each response the child was told whether their choice was correct. Half of the children were always told that the negative interpretation was correct, and the remainder were always told that the positive interpretation was correct. As such, over 30 scenarios children were trained to interpret their experiences on the planet as negative or positive. Muris et al. then gave children a standard measure of interpretational biases in everyday life to see whether the training had created a bias to interpret things negatively. In doing so, they could ascertain whether children learn interpretational biases through feedback (e.g. from parents) about how to disambiguate ambiguous situations. The data from this study are in the file Muris et al (2008).sav. The main independent variable is Training (positive or negative) and the outcome variable was the child's interpretational bias score (Interpretational_Bias); a high score reflects a tendency to interpret situations negatively. In a study such as this, it is important to factor in the Age and Gender of the child and also their natural anxiety level (which the researchers measured with a standard questionnaire of child anxiety called the SCARED). Labcoat Leni wants you to carry out a one-way ANCOVA on these data to see whether Training significantly affected children's Interpretational_Bias using Age, Gender and SCARED as covariates. What can you conclude? Answers are in the additional material for this website (or look at pp. 475-476 in the original article).

To run this analysis we need to access the main ANCOVA dialog box. Select Interpretational_Bias and drag this variable to the box labelled Dependent Variable (or click on the arrow button). Select Training (i.e. the type of training that the child had) and drag it to the box labelled Fixed Factor(s), then select Gender, Age and SCARED (by holding down Ctrl while you click on these variables) and drag these variables to the box labelled Covariate(s). The finished dialog box should look like this:

In the chapter we looked at how to select contrasts, but because our main predictor variable (the type of training) has only two levels (positive or negative) we don't need contrasts: the main effect of this variable can only reflect differences between the two types of training. The main output is:


First, notice that Levene's test is non-significant, F(1, 68) = 1.09, p > .05, which tells us that the variance in bias scores was fairly similar in the two training groups. In other words, the assumption of homogeneity of variance has been met. In the main table, we can see that even after partialling out the effects of age, gender and natural anxiety, the training had a significant effect on the subsequent bias score, F(1, 65) = 13.43. The means in the table tell us that interpretational biases were stronger (higher) after negative training. This result is as expected. It seems, then, that giving children feedback that tells them to interpret ambiguous situations negatively does induce an interpretational bias that persists into everyday situations, which is an important step towards understanding how these biases develop. In terms of the covariates, age did not influence the acquisition of interpretational biases. However, anxiety and gender did. If we look at the parameter estimates table, we can use the beta values to interpret these effects. For anxiety (SCARED), b = 2.01, which reflects a positive relationship. Therefore, as anxiety increases, the interpretational bias increases also (this is what you would expect, because anxious children would be more likely to naturally interpret ambiguous situations in a negative way). If you draw a scatterplot of the relationship between SCARED and Interpretational_Bias you'll see a very nice positive relationship. For Gender, b = 26.12, which again is positive, but to interpret this we need to know how the children were coded in the data editor. Boys were coded as 1 and girls as 2. Therefore, as a child 'changes' (not literally) from a boy to a girl, their interpretational bias increases. In other words, girls show a stronger natural tendency to interpret ambiguous situations negatively. This is consistent with the anxiety literature, which shows that females are more likely to have anxiety disorders. One important thing to remember is that although anxiety and gender naturally affected whether children interpreted ambiguous situations negatively, the training (the experiences on the alien planet) had an effect above and beyond these natural tendencies (in other words, the effects of training cannot be explained by the gender or natural anxiety levels of the children in the sample). Have a look at the original article to see how Muris et al. reported the results of this analysis; this can help you to see how you can report your own data from an ANCOVA. (One bit of good practice that you should note is that they report effect sizes from their analysis; as you will see from the book chapter, this is an excellent thing to do.)
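The gender coefficient is easier to see with a toy calculation. A hedged sketch (the intercept below is arbitrary, and the other predictors are held fixed and so ignored here): because boys are coded 1 and girls 2, the model's predicted bias for girls exceeds that for boys by exactly b.

```python
# Illustrative only: interpreting a regression coefficient for a 1/2-coded
# categorical predictor. The intercept of 100.0 is made up for the example.
def predicted_bias(intercept, b_gender, gender_code):
    return intercept + b_gender * gender_code

boys  = predicted_bias(100.0, 26.12, 1)  # boys coded 1
girls = predicted_bias(100.0, 26.12, 2)  # girls coded 2
print(round(girls - boys, 2))  # 26.12
```

This is why the sign of b tells you the direction of the group difference only once you know which group got the higher code.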

Chapter 12

Self-Test Answers


Use the Chart Builder to plot a line graph (with error bars) of the attractiveness of the date with alcohol consumption on the x-axis and different-coloured lines to represent males and females.

To do a multiple line chart for means that are independent (i.e. have come from different groups) we need to double-click on the multiple line chart icon in the Chart Builder (see the book chapter). All we need to do is to drag our variables into the appropriate drop zones. Select Attractiveness from the variable list and drag it into the y-axis drop zone; select Alcohol from the variable list and drag it into the x-axis drop zone; finally, select the gender variable and drag it into the drop zone that sets the line colour. This will mean that lines representing males and females will be displayed in different colours. Select error bars in the Properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.


The resulting graph can be found in the book chapter.

Plot error bar graphs of the main effects of alcohol and gender.

To do an error bar chart click on the bar chart icon in the Chart Builder (see the book chapter). All we need to do is to drag our variables into the appropriate drop zones. Select Attractiveness from the variable list and drag it into the y-axis drop zone, then select Alcohol from the variable list and drag it into the x-axis drop zone. Select error bars in the Properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.


To do the graph of the gender main effect, just drag gender into the x-axis drop zone to replace alcohol. The respective dialog boxes are shown below. The completed (edited) graphs are in the book.


The file GogglesRegression.sav contains the dummy variables used in this example, and just to prove that all of this works, use this file and run a multiple regression on the data.

To get to the main regression dialog box, select the linear regression command. Select the outcome variable (Attractiveness) and drag it to the box labelled Dependent (or click on the arrow button). We want to specify both predictors and their interaction in the same block. To specify the predictor variables, select Gender, Alcohol and the Interaction variable and drag them to the box labelled Independent(s) (or click on the arrow button). Underneath the Independent(s) box there is a drop-down menu for specifying the Method of regression. The default option is forced entry, and this is the option we want. We just want to run a basic analysis, so we can leave all of the default options as they are and click on OK.

Additional Material

Oliver Twisted: Please Sir, Can I Customize My Model?

'My friend told me that there are different types of sums of squares,' complains Oliver with an air of impressive authority, 'why haven't you told us about them? Is it because you have a microbe for a brain?' No, it's not, Oliver; it's because everyone but you will find this very tedious. If you want to find out more about what the Model button does, and the different types of sums of squares that can be used in ANOVA, then the additional material will tell you.


By default SPSS conducts a full factorial analysis (i.e. it includes all of the main effects and interactions of all independent variables specified in the main dialog box). However, there may be times when you want to customize the model that you use to test for certain things. To access the model dialog box, click on Model in the main dialog box. You will notice that, by default, the full factorial model is selected. Even with this selected, there is an option at the bottom to change the type of sums of squares that is used in the analysis. Although we have learnt about sums of squares and what they represent, I haven't talked about the different ways of calculating them. It isn't necessary to understand the computation of the different forms of sums of squares, but it is important that you know the uses of some of the different types. By default, SPSS uses Type III sums of squares, which have the advantage that they are invariant to the cell frequencies. As such, they can be used with both balanced and unbalanced (i.e. different numbers of participants in different groups) designs, which is why they are the default option. Type IV sums of squares are like Type III except that they can be used with data in which there are missing values. So, if you have any missing data in your design, you should change the sums of squares to Type IV.

To customize a model, select Custom to activate the rest of the dialog box. The variables specified in the main dialog box will be listed on the left-hand side. You can select one, or several, variables from this list and transfer them to the box labelled Model as either main effects or interactions. By default, SPSS transfers variables as interaction terms, but there are several options that allow you to enter main effects, or all two-way, three-way or four-way interactions. These options save you the trouble of having to select lots of combinations of variables (because, for example, you can select three variables, transfer them as all two-way interactions, and it will create all three combinations of variables for you). Hence, you could select Gender and Alcohol (you can select both of them at the same time by holding down Ctrl), click on the drop-down menu and change it to Main effects, and then click on the transfer arrow to move the main effects of Gender and Alcohol to the box labelled Model. Next we could specify the interaction term. To do this, select Gender and Alcohol simultaneously (by holding down the Ctrl key while you click on the two variables), then select Interaction in the drop-down list and click on the transfer arrow. This action moves the interaction of Gender and Alcohol to the box labelled Model. The finished dialog box should look like that below. Having specified our two main effects and the interaction term, click on Continue to return to the main dialog box and then on OK to run the analysis. Although model selection has important uses, it is likely that you'd want to run the full factorial analysis on most occasions and so wouldn't customize your model.
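Although SPSS does all of this for us, the order dependence that Type III sums of squares avoid is easy to demonstrate by hand. This is a hedged sketch with made-up, unbalanced data (nothing from the book): in an unbalanced design, the sequential (Type I) sum of squares for a factor depends on whether it enters the model first or second, which is one reason an order-invariant method is the default.

```python
import numpy as np

# Toy unbalanced two-by-two design, effect coded (-1/+1).
# Data are hypothetical; the point is only that the cells have unequal sizes.
A = np.array([-1., -1., -1., 1., 1., -1., 1., 1., 1., 1.])
B = np.array([-1., -1., 1., -1., 1., 1., -1., 1., 1., 1.])
y = np.array([3., 4., 6., 5., 9., 7., 4., 8., 9., 10.])

def rss(*predictors):
    """Residual sum of squares from an OLS fit with an intercept."""
    X = np.column_stack((np.ones(len(y)),) + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Sequential (Type I) sum of squares for B:
ss_B_entered_first  = rss() - rss(B)      # B enters the model first
ss_B_entered_second = rss(A) - rss(A, B)  # B enters after A

# With unequal cell sizes these two values differ; Type III sums of squares
# evaluate each effect the same way regardless of entry order.
print(round(ss_B_entered_first, 2), round(ss_B_entered_second, 2))
```

If you rebalance the design so the cell sizes are equal, the predictors become orthogonal and the two values coincide, which is why the distinction between the types of sums of squares only bites in unbalanced designs.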

Oliver Twisted: Please Sir, Can I Have Some More Contrasts?


'I don't want to use standard contrasts,' sulks Oliver as he stamps his feet on the floor, 'they smell of rotting cabbage.' I think, actually, Oliver, the stench of rotting cabbage is probably because you stood your Dickensian self under a window when someone emptied their toilet bucket into the street. Nevertheless, I do get asked a fair bit about how to do contrasts with syntax, and I'm a complete masochist, so I've prepared a fairly detailed guide in the additional material for this chapter. If you want to know more then have a look at this additional material.

Defining Contrasts with Syntax

Why Do We Need To Use Syntax?


In Chapters 12, 13 and 14 of the book we used SPSS's built-in contrast functions to compare various groups after conducting ANOVA. These special contrasts (described in Chapter 10, Table 10.6) cover many situations, but in more complex designs there will be times when you want to do contrasts that simply can't be done using SPSS's built-in contrasts. Unlike one-way ANOVA, there is no way in factorial designs to define contrast codes through the Windows dialog boxes. However, SPSS can do these contrasts if you define them using syntax.

An Example
Imagine a clinical psychologist wanted to see the effects of a new antidepressant drug called Cheerup. He took 50 people suffering from clinical depression and randomly assigned them to one of five groups. The first group was a waiting list control group (i.e. they were people assigned to the waiting list who were not treated during the study), the second took a placebo tablet (i.e. they were told they were being given an antidepressant drug but actually the pills contained sugar and no active agents), the third group took a well-established SSRI antidepressant called Seroxat (Paxil to American readers), the fourth group was given a well-established SNRI antidepressant called Effexor, and the final group was given the new drug, Cheerup. Levels of depression were measured before and after two months on the various treatments, and ranged from 0 = 'as happy as a spring lamb' to 20 = 'pass me the noose'. The data are in the file Depression.sav. The design of this study is a two-way mixed design. There are two independent variables: treatment (no treatment, placebo, Seroxat, Effexor or Cheerup) and time (before or after treatment). Treatment is measured with different participants (and so is between-group) and time is, obviously, measured using the same participants (and so is repeated measures). Hence, the ANOVA we want to use is a 5 × 2 two-way ANOVA.

Now, we want to do some contrasts. Imagine we have the following hypotheses:

1. Any treatment will be better than no treatment.
2. Drug treatments will be better than the placebo.
3. Our new drug, Cheerup, will be better than old-style antidepressants.
4. The old-style antidepressants will not differ in their effectiveness.

We have to code these various hypotheses as we did in Chapter 10. The first contrast involves comparing the no-treatment condition to all other groups. Therefore, the first step is to chunk these variables, and then assign a positive weight to one chunk and a negative weight to the other chunk.


Chunk 1: No Treatment (sign of weight: -)
Chunk 2: Placebo, Seroxat, Effexor, Cheerup (sign of weight: +)

Having done that, we need to assign a numeric value to the groups in each chunk. As I mentioned in Chapter 8, the easiest way to do this is just to assign a value equal to the number of groups in the opposite chunk. Therefore, the value for any group in chunk 1 will be the same as the number of groups in chunk 2 (in this case 4). Likewise, the value for any groups in chunk 2 will be the same as the number of groups in chunk 1 (in this case 1). So, we get the following codes:

Chunk 1: No Treatment (sign: -, value of weight: -4)
Chunk 2: Placebo, Seroxat, Effexor, Cheerup (sign: +, value of weight: +1)

The second contrast requires us to compare the placebo group to all of the drug groups. Again, we chunk our groups accordingly, assign one chunk a negative sign and the other a positive, and then assign a weight on the basis of the number of groups in the opposite chunk. We must also remember to give the no-treatment group a weight of 0 because it's not involved in the contrast.


Chunk 1: Placebo (sign: -, value of weight: -3)
Chunk 2: Seroxat, Effexor, Cheerup (sign: +, value of weight: +1)
Not included: No Treatment (weight: 0)

The third contrast requires us to compare the new drug (Cheerup) to the old drugs (Seroxat and Effexor). Again, we chunk our groups accordingly, assign one chunk a negative sign and the other a positive, and then assign a weight on the basis of the number of groups in the opposite chunk. We must also remember to give the no-treatment and placebo groups a weight of 0 because they're not involved in the contrast.

Chunk 1: Cheerup (sign: -, value of weight: -2)
Chunk 2: Seroxat, Effexor (sign: +, value of weight: +1)
Not included: No Treatment, Placebo (weight: 0)

The final contrast requires us to compare the two old drugs. Again, we chunk our groups accordingly, assign one chunk a negative sign and the other a positive, and then assign a weight on the basis of the number of groups in the opposite chunk. We must also give the no-treatment, placebo and Cheerup groups a weight of 0 because they're not involved in the contrast.


Chunk 1: Effexor (sign: -, value of weight: -1)
Chunk 2: Seroxat (sign: +, value of weight: +1)
Not included: No Treatment, Placebo, Cheerup (weight: 0)

We can summarize these codes in the following table:

             No Treatment   Placebo   Seroxat   Effexor   Cheerup
Contrast 1        -4            1         1         1         1
Contrast 2         0           -3         1         1         1
Contrast 3         0            0         1         1        -2
Contrast 4         0            0         1        -1         0

These are the codes that we need to enter into SPSS to do the contrasts that we'd like to do.
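Before typing them into SPSS, it is worth checking that these weights behave like proper planned contrasts. Here is a hedged Python sketch (not part of the companion materials) verifying that each contrast's weights sum to zero and that every pair of contrasts is orthogonal (their cross-products sum to zero):

```python
# Contrast weights in group order: no treatment, placebo, Seroxat, Effexor, Cheerup
contrasts = {
    1: [-4, 1, 1, 1, 1],   # no treatment vs. the rest
    2: [0, -3, 1, 1, 1],   # placebo vs. the drug groups
    3: [0, 0, 1, 1, -2],   # old drugs vs. Cheerup
    4: [0, 0, 1, -1, 0],   # Seroxat vs. Effexor
}

# Each set of weights must sum to zero...
for label, w in contrasts.items():
    assert sum(w) == 0, f"contrast {label} does not sum to zero"

# ...and each pair must be orthogonal (sum of element-wise products is zero),
# so the contrasts carve up the between-group variance independently.
labels = list(contrasts)
for i, a in enumerate(labels):
    for b in labels[i + 1:]:
        cross = sum(x * y for x, y in zip(contrasts[a], contrasts[b]))
        assert cross == 0, f"contrasts {a} and {b} are not orthogonal"

print("all contrasts sum to zero and are mutually orthogonal")
```

If any assertion failed, the weights would need rethinking before being fed to the special() keyword below.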

Entering the Contrasts Using Syntax


To enter these contrasts using syntax we have to first open a syntax window (see Chapter 2 of the book). Having done that we have to type the following commands:

MANOVA before after BY treat(0 4)


This initializes the ANOVA command in SPSS. The second line specifies the variables in the data editor. The first two words before and after are the repeated-measures variables (and these words are the words used in the data editor). Anything after BY is a between-group measure and so needs to be followed by brackets within which the minimum and maximum values of the coding variable are specified. I called the between-group variable treat, and I coded the groups as 0 = no treatment, 1 = placebo, 2 = Seroxat, 3 = Effexor, 4 = Cheerup. Therefore, the minimum and maximum codes were 0 and 4. So these two lines tell SPSS to start the ANOVA procedure, that there are two repeated-measures variables called before and after, and that there is a between-group variable called treat that has a minimum code of 0 and a maximum of 4.

/WSFACTORS time (2)

The /WSFACTORS command allows us to specify any repeated-measures variables. SPSS already knows that there are two variables called before and after, but it doesn't know how to treat these variables. This command tells SPSS to create a repeated-measures variable called time that has two levels (the number in brackets). SPSS then looks to the variables specified before and assigns the first one (before in this case) to be the first level of time, and then assigns the second one (in this case after) to be the second level of time.

/CONTRAST (time)=special(1 1, 1 -1)

This is used to specify the contrasts for the first variable. The /CONTRAST command is used to specify any contrast. It's always followed, in brackets, by the name of the variable that you want to do a contrast on. We have two variables (time and treat) and in this first contrast we want to specify a contrast for time. Time only has two levels and so all we want to do is to tell SPSS to compare these two levels (which actually it will do by default, but I want you to get some practice in!). What we write after the equals sign defines the contrast, so we could write the name of one of the standard contrasts such as Helmert, but because we want to specify our own contrast we use the word special. Special should always be followed by brackets, and inside those brackets are your contrast codes. Codes for different contrasts are separated using a comma and, within a contrast, codes for different groups are separated using a space. The first contrast should always be one that defines a baseline for all other contrasts, and that is one that codes all groups with a 1. Therefore, because we have two levels of time, we just write 1 1, which tells SPSS that the first contrast should be one in which both before and after are given a code of 1. The comma tells SPSS that a new contrast follows, and this second contrast has been defined as 1 -1: this tells SPSS that in this second contrast we want to give before a code of 1 and after a code of -1. Note that the codes you write in the brackets are assigned to variables in the order in which those variables were entered in the SPSS syntax, so because we originally wrote before after BY treat(0 4), SPSS assigns the 1 to before and the -1 to after; if we'd originally written after before BY treat(0 4) then SPSS would have assigned them the opposite way round: the 1 to after and the -1 to before.

/CONTRAST (treat)=special (1 1 1 1 1, -4 1 1 1 1, 0 -3 1 1 1, 0 0 1 1 -2, 0 0 1 -1 0)

This is used to specify the contrasts for the second variable. This time the /CONTRAST command is followed by the name of the second variable (treat). Treat has five levels and we've already worked out four different contrasts that we want to do. Again we use the word special after the equals sign and specify our coding values within the brackets. As before, codes for different contrasts are separated using a comma and, within a contrast, codes for different groups are separated using a space. Also as before, the first contrast should always be one that defines a baseline for all other contrasts, and that is one that codes all groups with a 1. Therefore, because we have five levels of treat, we just write 1 1 1 1 1, which tells SPSS that the first contrast should be one in which all five groups are given a code of 1. The comma tells SPSS that a new contrast follows, and this second contrast has been defined as -4 1 1 1 1: this tells SPSS that in this second contrast we want to give the first group a code of -4 and all subsequent groups codes of 1. How does SPSS decide what the first group is? It uses the coding variable in the data editor and orders the groups in the same order as the coding variable. Therefore, because I coded the groups as 0 = no treatment, 1 = placebo, 2 = Seroxat, 3 = Effexor, 4 = Cheerup, this contrast gives the no-treatment group a code of -4 and all subsequent groups codes of 1. The comma again tells SPSS that, having done this, there is another contrast to follow, and this contrast has been defined as 0 -3 1 1 1: in this contrast we give the first group a code of 0, the second group a code of -3 and all subsequent groups codes of 1. Given the group coding, this gives the no-treatment group a code of 0, the placebo group a code of -3 and all subsequent groups codes of 1. The next contrast has been defined as 0 0 1 1 -2: in this contrast we give the first two groups a code of 0, the third and fourth groups a code of 1 and the final group a code of -2. Given the group coding, this gives the no-treatment and placebo groups a code of 0, the Seroxat and Effexor groups a code of 1 and the Cheerup group a code of -2. Finally, the last contrast has been defined as 0 0 1 -1 0: in this contrast we give the first, second and last groups a code of 0, the third group a code of 1 and the fourth group a code of -1. That is, the no-treatment, placebo and Cheerup groups get a code of 0, the Seroxat group a code of 1 and the Effexor group a code of -1. As such, this one line of text has defined the four contrasts that we want to do.
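The order-matching rule above is easy to get wrong, so here is a hedged Python sketch (purely illustrative, not part of SPSS) that pairs each weight in a special() list with the group it lands on, given the 0-4 coding of treat:

```python
# Groups listed in the order of their codes on the treat variable (0..4)
groups = ["no treatment", "placebo", "Seroxat", "Effexor", "Cheerup"]

# The four contrast weight lists from the /CONTRAST (treat) subcommand
# (the all-1s baseline contrast is omitted here)
special = [
    [-4, 1, 1, 1, 1],
    [0, -3, 1, 1, 1],
    [0, 0, 1, 1, -2],
    [0, 0, 1, -1, 0],
]

# SPSS assigns weights by position, so zipping recovers the mapping:
for weights in special:
    print(dict(zip(groups, weights)))
```

If the coding of treat in the data editor ever changes, the same weight lists would land on different groups, which is why it is worth writing out this mapping before running the syntax.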

/CINTERVAL JOINT(.95) MULTIVARIATE(BONFER)

This line defines the type of confidence intervals that you want to do for your contrasts. I recommend the Bonferroni option, but if you delve into the SPSS syntax guide you can find others.

/METHOD UNIQUE
/ERROR WITHIN+RESIDUAL
/PRINT TRANSFORM HOMOGENEITY(BARTLETT COCHRAN BOXM) SIGNIF( UNIV MULT AVERF HF GG ) PARAM( ESTIM EFSIZE).

These lines of syntax specify various things (that may or may not be useful) such as a transformation matrix (TRANSFORM), which isn't at all necessary here but is useful if you've used SPSS's built-in contrasts; homogeneity tests (HOMOGENEITY(BARTLETT COCHRAN BOXM)); the main ANOVA and Huynh-Feldt and Greenhouse-Geisser corrections, which we don't actually need in this example (SIGNIF( UNIV MULT AVERF HF GG )); and parameter estimates and effect size estimates for the contrasts we've specified (PARAM( ESTIM EFSIZE)). So, the whole syntax will look like this:

MANOVA before after BY treat(0 4)
 /WSFACTORS time (2)
 /CONTRAST (time)=special(1 1, 1 -1)
 /CONTRAST (treat)=special (1 1 1 1 1, -4 1 1 1 1, 0 -3 1 1 1, 0 0 1 1 -2, 0 0 1 -1 0)
 /CINTERVAL JOINT(.95) MULTIVARIATE(BONFER)
 /METHOD UNIQUE
 /ERROR WITHIN+RESIDUAL
 /PRINT TRANSFORM HOMOGENEITY(BARTLETT COCHRAN BOXM) SIGNIF( UNIV MULT AVERF HF GG ) PARAM( ESTIM EFSIZE).

It's very important to remember the full stop at the end! This syntax is in the file DepressionSyntax.sps as well, in case your typing goes wrong!
Output From The Contrasts


[Figure: bar chart of mean depression levels for each treatment group (No Treatment, Placebo, Seroxat (Paxil), Effexor, Cheerup); error bars show the 95% CI of the mean.]
The output you get is in the form of text (no nice pretty tables) and to interpret it you have to remember the contrasts you specified! I'll run you through the main highlights of this example. The first bit of the output will show the homogeneity tests (which should all be non-significant, but beware of Box's test because it tends to be inaccurate). The first important part is the main effect of the variable treat. First there's an ANOVA summary table like those you've come across before (if you've read Chapters 8-11). This tells us that there's no significant main effect of the type of treatment, F(4, 45) = 2.01, p > .05. This tells us that if you ignore the time at which depression was measured then the levels of depression were about the same across the treatment groups. Of course, levels of depression should be the same before treatment, and so this isn't a surprising result (because it averages across scores before and after treatment). The graph shows that in fact levels of depression are relatively similar across groups.

****** Analysis of Variance -- design 1 ******

Tests of Between-Subjects Effects.

Tests of Significance for T1 using UNIQUE sums of squares
Source of Variation          SS      DF       MS       F   Sig of F
WITHIN+RESIDUAL          359.95      45     8.00
TREAT                     64.30       4    16.08    2.01       .109

-------------------------------------
Estimates for T1 --- Joint univariate .9500 BONFERRONI confidence intervals

TREAT

Parameter       Coeff.   Std. Err.   t-Value    Sig. t   Lower -95%  CL- Upper
    2      -7.7781746     3.99972   -1.94468    .05808    -18.18578    2.62944
    3      3.53553391     3.09817    1.14117    .25984     -4.52617   11.59723
    4      3.74766594     2.19074    1.71069    .09403     -1.95282    9.44815
    5      -.21213203     1.26482    -.16772    .86756     -3.50331    3.07904

Parameter   ETA Sq.
    2        .07752
    3        .02813
    4        .06106
    5        .00062

[Figure: bar chart of mean depression levels before vs. after treatment; error bars show the 95% CI of the mean.]

-------------------------------------
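The F-ratios in these summary tables are nothing mysterious: each one is just the effect's mean square divided by the residual mean square. A quick hedged check in Python using the TREAT values printed above:

```python
# Sums of squares and degrees of freedom from the TREAT summary table
ss_treat, df_treat = 64.30, 4
ss_resid, df_resid = 359.95, 45

ms_treat = ss_treat / df_treat  # 16.08 in the output (rounded)
ms_resid = ss_resid / df_resid  # 8.00 in the output (rounded)
F = ms_treat / ms_resid

print(round(F, 2))  # 2.01, matching the table
```

The same arithmetic applied to the within-subject table below reproduces its F-ratios too (e.g. 306.25 divided by 7.12 gives the 43.02 for TIME).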


This main effect is followed by some contrasts, but we don't need to look at these because the main effect was non-significant. However, just to tell you what they are: parameter 2 is our first contrast (no treatment vs. the rest) and as you can see this is almost significant (p is just above .05); parameter 3 is our second contrast (placebo vs. the rest) and this is non-significant; parameter 4 is our third contrast (Cheerup vs. Effexor and Seroxat), and again this is almost significant; parameter 5 is our last contrast (Seroxat vs. Effexor) and this is very non-significant. However, these contrasts all ignore the effect of time and so aren't really what we're interested in. The next part that we're interested in is the within-subject effects, and this involves the main effect of time and the interaction of time and treatment. First there's an ANOVA summary table as before. This tells us that there's a significant main effect of time, F(1, 45) = 43.02, p < .001. This tells us that if you ignore the type of treatment, there was a significant difference between depression levels before and after treatment. A quick look at the means reveals that depression levels were significantly lower after treatment. Below the ANOVA table is a parameter estimate for the effect of time. As there are only two levels of time, this represents the difference in depression levels before and after treatment. No other contrasts are possible.

****** Analysis of Variance -- design 1 ******

Tests involving 'TIME' Within-Subject Effect.

Tests of Significance for T2 using UNIQUE sums of squares
Source of Variation          SS      DF       MS       F   Sig of F
WITHIN+RESIDUAL          320.35      45     7.12
TIME                     306.25       1   306.25   43.02       .000
TREAT BY TIME            125.90       4    31.47    4.42       .004

-------------------------------------
Estimates for T2 --- Joint univariate .9500 BONFERRONI confidence intervals

TIME

Parameter       Coeff.   Std. Err.   t-Value    Sig. t   Lower -95%  CL- Upper
           2.47487373      .37733    6.55891    .00000      1.71489    3.23485

Parameter   ETA Sq.
             .48875

TREAT BY TIME

Parameter       Coeff.   Std. Err.   t-Value    Sig. t   Lower -95%  CL- Upper
    2      11.3137085     3.77330    2.99836    .00441      1.49527   21.13214
    3      -.56568542     2.92278    -.19354    .84740     -8.17101    7.03964
    4      -5.8689863     2.06672   -2.83976    .00675    -11.24676    -.49121
    5      .919238816     1.19322     .77038    .44510     -2.18562    4.02410

Parameter   ETA Sq.
    2        .16651
    3        .00083
    4        .15197
    5        .01302

[Figure: bar chart of mean depression levels for each type of treatment, split by time (before vs. after treatment); error bars show the 95% CI of the mean.]

-------------------------------------

The interaction term is also significant, F(4, 45) = 4.42, p < .01. This indicates that the change in depression over time is different in some treatments to others. We can make sense of this through an interaction graph, but we can also look at our contrasts. The key contrasts for this whole analysis are the parameter estimates for the interaction term (the


bit in the output underneath the heading TREAT BY TIME) because they take into account the effects of both time and treatment:

Parameter 2 is our first contrast (no treatment vs. the rest) and as you can see this is significant (p is below 0.05). This tells us that the change in depression levels in the no-treatment group was significantly different from the average change in all other groups, t = 2.998, p < .01. As you can see in the graph, there is no change in depression in the no-treatment group, but in all other groups there is a fall in depression. Therefore, this contrast reflects the fact that there is no change in the no-treatment group, but there is a decrease in depression levels in all other groups.

Parameter 3 is our second contrast (placebo vs. Seroxat, Effexor and Cheerup) and this is very non-significant, t = .194, p = .85. This shows that the decrease in depression levels seen in the placebo group is comparable to the average decrease in depression levels seen in the Seroxat, Effexor and Cheerup conditions. In other words, the combined effect of the drugs on depression is no better than a placebo.

Parameter 4 is our third contrast (Cheerup vs. Effexor and Seroxat) and this is highly significant, t = 2.84, p < .01. This shows that the decrease in depression levels seen in the Cheerup group is significantly bigger than the decrease seen in the Effexor and Seroxat groups combined. Put another way, Cheerup has a significantly bigger effect than other established antidepressants.

Parameter 5 is our last contrast (Seroxat vs. Effexor) and this is very non-significant, t = .77, p = .45. This tells us that the decrease in depression levels seen in the Seroxat group is comparable to the decrease in depression levels seen in the Effexor group. Put another way, Effexor and Seroxat seem to have similar effects on depression.
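The four contrasts just described form an orthogonal family. As a quick illustration, the sketch below encodes one plausible set of weights for them (my assumption: the groups are ordered no treatment, placebo, Seroxat, Effexor, Cheerup, and SPSS may scale the weights differently) and verifies that they are legitimate, mutually orthogonal contrasts:

```python
from itertools import combinations

# Hypothetical contrast weights; group order assumed to be:
# no treatment, placebo, Seroxat, Effexor, Cheerup
contrasts = {
    "no treatment vs. rest":       [ 4, -1, -1, -1, -1],
    "placebo vs. the drugs":       [ 0,  3, -1, -1, -1],
    "Cheerup vs. Effexor/Seroxat": [ 0,  0, -1, -1,  2],
    "Seroxat vs. Effexor":         [ 0,  0,  1, -1,  0],
}

# A valid contrast's weights sum to zero; an orthogonal family has
# zero dot products between every pair of contrasts
for name, w in contrasts.items():
    assert sum(w) == 0, name
for (n1, w1), (n2, w2) in combinations(contrasts.items(), 2):
    assert sum(a * b for a, b in zip(w1, w2)) == 0, (n1, n2)
print("all contrasts sum to zero and are pairwise orthogonal")
```

Only the pattern of signs matters here: each contrast compares the groups with positive weights against those with negative weights, and the later contrasts ignore groups already singled out.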


I hope to have shown in this example how to specify contrasts using syntax and how looking at these contrasts (especially for an interaction term) can be a very useful way to break down an interaction effect.

Oliver Twisted: Please Sir, can I have some more Simple Effects?

'I want to impress my friends by doing a simple effects analysis by hand', boasts Oliver. You don't really need to know how simple effects analyses are calculated to run them, Oliver, but seeing as you asked, it is explained in the additional material available from the companion website.
Another useful way to follow up an interaction effect is to run contrasts on the interaction term. Like simple effects, this can be done only using syntax, and it's a fairly involved process. However, if this sounds like something you might want to do, then the additional material for this chapter contains an example that I've prepared that walks you through specifying contrasts across an interaction.

Calculating Simple Effects


A simple main effect (usually called a simple effect) is just the effect of one variable at levels of another variable. In Chapter 12 we had an example in which we'd measured the attractiveness of dates after no alcohol, 2 pints and 4 pints, in both men and women.


Therefore, we have two independent variables: alcohol (none, 2 pints, 4 pints) and gender (male and female). One simple effects analysis we could do would be to look at the effect of gender (i.e. compare male and female scores) at the three levels of alcohol. Let's look at how we'd do this. We're partitioning the model sum of squares, and we saw in Chapter 10 that we calculate model sums of squares using this equation:

$SS_M = \sum n_k(\bar{x}_k - \bar{x}_{\text{grand}})^2$

For simple effects, we calculate the model sum of squares for the effect of gender at each level of alcohol. So, we'd begin with the data for when there was no alcohol, and calculate the model sum of squares. Thus, the grand mean becomes the mean for when there was no alcohol, and the group means are the means for men (when there was no alcohol) and women (when there was no alcohol). So, we group the data by the amount of alcohol drunk. Within each of these three groups, we calculate the overall mean and also the means of the male and female scores separately. These mean scores are all we really need. You can think of the data as displayed in the table below. We can then apply the same equation for the model sum of squares that we used for the overall model sum of squares, but we use the grand mean of the no-alcohol data (63.75) and the means of males (66.875) and females (60.625) within this group:

            No Alcohol           2 Pints              4 Pints
         Female    Male      Female    Male       Female    Male
           65       50         70       45          55       30
           70       55         65       60          65       30
           60       80         60       85          70       30
           60       65         70       65          55       55
           60       70         65       70          55       35
           55       75         60       70          60       20
           60       75         60       80          50       45
           55       65         50       60          50       40
Mean     60.625   66.875     62.50    66.875      57.500   35.625

Mean None = 63.75        Mean 2 Pints = 64.6875        Mean 4 Pints = 46.5625

$SS_{\text{Gender(No Alcohol)}} = \sum n_k(\bar{x}_k - \bar{x}_{\text{grand}})^2 = 8(60.625 - 63.75)^2 + 8(66.875 - 63.75)^2 = 156.25$

The degrees of freedom for this effect are calculated the same way as for any model sum of squares; that is, they are one less than the number of conditions being compared (k 1), which in this case when were comparing only two conditions will be 1. The next step is to do the same but for the 2 pint data. Now we use the grand mean of the 2 pint data (64.6875) and the means of males (66.875) and females (62.50) within this group. The equation, however, stays the same:

$SS_{\text{Gender(2 Pints)}} = \sum n_k(\bar{x}_k - \bar{x}_{\text{grand}})^2 = 8(62.50 - 64.6875)^2 + 8(66.875 - 64.6875)^2 = 76.56$


The degrees of freedom are the same as in the previous simple effect, namely k − 1, which is 1 for these data. The next step is to do the same but for the


grand mean of the 4 pint data (46.5625) and the means of females (57.500) and males (35.625) within this group. The equation, however, stays the same:

$SS_{\text{Gender(4 Pints)}} = \sum n_k(\bar{x}_k - \bar{x}_{\text{grand}})^2 = 8(57.500 - 46.5625)^2 + 8(35.625 - 46.5625)^2 = 1914.06$

Again, the degrees of freedom are 1 (because we've compared two groups). As with any ANOVA, we need to convert these sums of squares to mean squares by dividing by the degrees of freedom. However, because all of these sums of squares have 1 degree of freedom, the mean squares will be the same as the sums of squares because we're dividing by 1. So, the final stage is to calculate an F-ratio for each simple effect. As ever, the F-ratio is just the mean squares for the model divided by the residual mean squares. So, you might well ask, what do we use for the residual mean squares? When conducting simple effects we use the residual mean squares from the original ANOVA (the residual mean squares for the entire model). In doing so we are merely partitioning the model sums of squares and so keep control of the Type I error rate. For these data, the residual mean square was 83.036 (see section 10.2.6). Therefore, we get:

$F_{\text{Gender(No Alcohol)}} = \frac{MS_{\text{Gender(No Alcohol)}}}{MS_R} = \frac{156.25}{83.036} = 1.88$

$F_{\text{Gender(2 Pints)}} = \frac{MS_{\text{Gender(2 Pints)}}}{MS_R} = \frac{76.56}{83.036} = 0.92$

$F_{\text{Gender(4 Pints)}} = \frac{MS_{\text{Gender(4 Pints)}}}{MS_R} = \frac{1914.06}{83.036} = 23.05$

We can evaluate these F-values in the usual way (they will have 1 and 42 degrees of freedom for these data). However, for the 2 pint data we can be sure there is not a significant effect of gender because the F-ratio is less than 1.
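As a quick sanity check, the whole calculation above can be reproduced in a few lines of Python (a sketch using the cell means and the residual mean square of 83.036 from the worked example; the dictionary layout is just for illustration):

```python
# Cell means from the worked example (8 scores per cell)
means = {
    "No Alcohol": {"female": 60.625, "male": 66.875},
    "2 Pints":    {"female": 62.500, "male": 66.875},
    "4 Pints":    {"female": 57.500, "male": 35.625},
}
n = 8            # scores per cell
ms_r = 83.036    # residual mean square from the full ANOVA

for dose, m in means.items():
    grand = (m["female"] + m["male"]) / 2          # grand mean within this dose
    ss = n * (m["female"] - grand) ** 2 + n * (m["male"] - grand) ** 2
    f = ss / ms_r                                  # df = 1, so MS equals SS
    print(f"Gender at {dose}: SS = {ss:.2f}, F(1, 42) = {f:.2f}")
```

This prints SS values of 156.25, 76.56 and 1914.06 and F-ratios of 1.88, 0.92 and 23.05, matching the hand calculation.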


Labcoat Leni's Real Research 12: Don't Forget Your Toothbrush?

Davey, G. C. L. et al. (2003). Journal of Behavior Therapy & Experimental Psychiatry, 34, 141–160.

We have all experienced that feeling after we have left the house of wondering whether we locked the door, or if we remembered to close the window, or if we remembered to remove the bodies from the fridge in case the police turn up. This behaviour is normal; however, people with obsessive compulsive disorder (OCD) tend to check things excessively. They might, for example, check whether they have locked the door so often that it takes them an hour to leave their house. It is a very debilitating problem. One theory of this checking behaviour in OCD suggests that it is caused by a combination of the mood you are in (positive or negative) interacting with the rules you use to decide when to stop a task (do you continue until you feel like stopping, or until you have done the task as best as you can?). Davey, Startup, Zara, MacDonald and Field (2003) tested this hypothesis by inducing a negative, positive or no mood in different people and then asking them to imagine that they were going on holiday and to generate as many things as they could that they should check before they went away. Within each mood group, half of the participants were instructed to generate as many items as they could (known as an 'as many as can' stop rule), whereas the remainder were asked to generate items for as long as they felt like continuing the task (known as a 'feel like continuing' stop rule). The data are in the file Davey2003.sav. Davey et al. hypothesized that people in negative moods, using an 'as many as can' stop


rule would generate more items than those using a 'feel like continuing' stop rule. Conversely, people in a positive mood would generate more items when using a 'feel like continuing' stop rule compared to an 'as many as can' stop rule. Finally, in neutral moods, the stop rule used shouldn't affect the number of items generated. Draw an error bar chart of the data and then conduct the appropriate analysis to test Davey et al.'s hypotheses. Answers are in the additional material for this website (or look at pages 148–149 in the original article). To do an error bar chart for means that are independent (i.e. have come from different groups) we need to double-click on the clustered bar chart icon in the Chart Builder (see the book chapter). All we need to do is to drag our variables into the appropriate drop zones. Select Checks from the variable list and drag it into the y-axis drop zone; select Mood and drag it into the x-axis drop zone; finally, select the Stop_Rule variable and drag it into the drop zone for clustering the bars. This will mean that bars representing the two stop rules will be displayed in different colours. Select error bars in the properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.


The resulting graph should look like this: [Figure: a clustered error bar chart showing the mean number of checks for each mood induction (negative, positive, neutral), with bars for the two stop rules in different colours and error bars showing the 95% CI of the mean.]


To access the main dialog box for a general factorial ANOVA, select Analyze > General Linear Model > Univariate…. First, select the dependent variable Checks from the variables list on the left-hand side of the dialog box and drag it to the space labelled Dependent Variable. In the space labelled Fixed Factor(s) we need to place any independent variables relevant to the analysis. Select Mood and Stop_Rule in the variables list (these variables can be selected simultaneously by holding down Ctrl while clicking on them) and drag them to the Fixed Factor(s) box.


The resulting output can be interpreted as follows. First, Levene's test is significant, indicating a problem with homogeneity of variance. If we compare the largest and smallest variances (smallest = 2.35² = 5.52; largest = 7.86² = 61.78) we find a ratio of 61.78/5.52 = 11. We have six variances and n − 1 = 9, so the critical value from Hartley's table (which you can find in this document) is 7.80. Our observed value of 11 is bigger than this, so we definitely have a problem.
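The variance-ratio arithmetic is easy to verify (a sketch; the two standard deviations are the smallest and largest of the six group SDs reported in the output):

```python
# Hartley's F_max: ratio of the largest to the smallest group variance
sd_small, sd_large = 2.35, 7.86
f_max = sd_large ** 2 / sd_small ** 2
print(round(f_max, 2))  # just over 11, which exceeds the critical value of 7.80
```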


The main effect of mood was not significant, F(2, 54) = 0.68, p > .05, indicating that the number of checks (when we ignore the stop rule adopted) was roughly the same regardless of whether the person was in a positive, negative or neutral mood. Similarly, the main effect of stop rule was not significant, F(1, 54) = 2.09, p > .05, indicating that the number of


checks (when we ignore the mood induced) was roughly the same regardless of whether the person used an 'as many as can' or a 'feel like continuing' stop rule. The mood × stop rule interaction was significant, F(2, 54) = 6.35, p < .01, indicating that the mood combined with the stop rule to affect checking behaviour. Looking at the graph, a negative mood in combination with an 'as many as can' stop rule increased checking, as did the combination of a 'feel like continuing' stop rule and a positive mood, just as Davey et al. predicted.

Chapter 13

Self-Test Answers

What is a repeated-measures design?

Repeated-measures is a term used when the same participants participate in all conditions of an experiment.

What does contrast 3 (Level 3 vs. Level 4) compare?

Contrast 3 compares the fish eyeball with the witchetty grub.

Try rerunning these post hoc tests but select the uncorrected values (LSD) in the options dialog box (see section 13.8.5). You should find that the difference between beer and water is now significant (p = .02).
The dialog boxes should look like this:


The output from the post hoc tests for drink looks like this:

The difference between beer and water is now significant (p = .02).

Why do you think that this contradiction has occurred?

It's because the contrasts have more power to detect differences than post hoc tests.

Additional Material


Oliver Twisted: Please Sir, Can I Have Some More Sphericity?

'Balls', says Oliver, 'are spherical, and I like balls. Maybe I'll like sphericity too if only you could explain it to me in more detail.' Be careful what you wish for, Oliver. In my youth I wrote an article called 'A bluffer's guide to sphericity', which I used to cite in this book, roughly on this page. A few people ask me for it, so I thought I might as well reproduce it in the additional material for this chapter.

Below is a reproduction of: Field, A. P. (1998). A bluffer's guide to sphericity. Newsletter of the Mathematical, Statistical and Computing Section of the British Psychological Society, 6(1), 13–22.
A bluffer's guide to sphericity

The use of repeated measures, where the same subjects are tested under a number of conditions, has numerous practical and statistical benefits. For one thing it reduces the error variance caused by between-group individual differences; however, this reduction of error comes at a price, because repeated measures designs potentially introduce covariation between experimental conditions (the same people are used in each condition, so there is likely to be some consistency in their behaviour across conditions). In between-group ANOVA we have to assume that the groups we test are independent for the test to be accurate (Scariano and Davenport, 1987, have documented some of the consequences of violating this


assumption). As such, the relationship between treatments in a repeated measures design creates problems with the accuracy of the test statistic. The purpose of this article is to explain, as simply as possible, the issues that arise in analysing repeated measures data with ANOVA: specifically, what is sphericity and why is it important?

What is Sphericity?

Most of us are taught during our degrees that it is crucial to have homogeneity of variance between conditions when analysing data from different subjects, but often we are left to assume that this problem goes away in repeated measures designs. This is not so, and the assumption of sphericity can be likened to the assumption of homogeneity of variance in between-group ANOVA. Sphericity (denoted by ε and sometimes referred to as circularity) is a more general condition of compound symmetry. Imagine you had a population covariance matrix Σ, where:

$$\Sigma = \begin{pmatrix} s_{11}^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\ \sigma_{21} & s_{22}^2 & \sigma_{23} & \cdots & \sigma_{2n} \\ \sigma_{31} & \sigma_{32} & s_{33}^2 & \cdots & \sigma_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \cdots & s_{nn}^2 \end{pmatrix} \qquad \text{(Equation 1)}$$

This matrix represents two things: (1) the off-diagonal elements represent the covariances between the treatments 1 ... n (you can think of this as the unstandardised correlation between

each of the repeated measures conditions); and (2) the diagonal elements signify the variances within each treatment. As such, the assumption of homogeneity of variance between treatments will hold when:
$s_{11}^2 \approx s_{22}^2 \approx s_{33}^2 \approx \cdots \approx s_{nn}^2 \qquad \text{(Equation 2)}$

(i.e. when the diagonal components of the matrix are approximately equal). This is comparable to the situation we would expect in a between-group design. However, in repeated measures designs there is the added complication that the experimental conditions covary with each other. The end result is that we have to consider the effect of these covariances when we analyse the data; specifically, we need to assume that all of the covariances are approximately equal (i.e. all of the conditions are related to each other to the same degree, and so the effect of participating in one treatment level after another is also equal). Compound symmetry holds when there is a pattern of constant variances along the diagonal (i.e. homogeneity of variance; see Equation 2) and constant covariances off the diagonal (i.e. the covariances between treatments are equal; see Equation 3). While compound symmetry has been shown to be a sufficient condition for conducting ANOVA on repeated measures data, it is not a necessary condition.

$\sigma_{12} \approx \sigma_{13} \approx \sigma_{23} \approx \cdots \approx \sigma_{1n} \approx \sigma_{2n} \approx \sigma_{3n} \approx \cdots \qquad \text{(Equation 3)}$

Sphericity is a less restrictive form of compound symmetry (in fact much of the early research into repeated measures ANOVA confused compound symmetry with sphericity). Sphericity

refers to the equality of variances of the differences between treatment levels. Whereas compound symmetry concerns the covariation between treatments, sphericity is related to the variance of the differences between treatments. So, if you were to take each pair of treatment levels, and calculate the differences between each pair of scores, then it is necessary that these differences have equal variances. Imagine a situation where there are 4 levels of a repeated measures treatment (A, B, C, D). For sphericity to hold, one condition must be satisfied:
$s_{A-B}^2 \approx s_{A-C}^2 \approx s_{A-D}^2 \approx s_{B-C}^2 \approx s_{B-D}^2 \approx s_{C-D}^2 \qquad \text{(Equation 4)}$

Sphericity is violated when the condition in Equation 4 is not met (i.e. the differences between pairs of conditions have unequal variances).

How is Sphericity Measured?

The simplest way to see whether or not the assumption of sphericity has been met is to calculate the differences between pairs of scores for all combinations of the treatment levels. Once this has been done, you can simply calculate the variance of these differences. For example, Table 1 shows data from an experiment with three conditions (for simplicity there are only 5 scores per condition). The differences between pairs of conditions can be calculated for each subject, and the variance of each set of differences can then be calculated. We saw above that sphericity is met when these variances are roughly equal. For these data, sphericity will hold when:
$s_{A-B}^2 \approx s_{A-C}^2 \approx s_{B-C}^2$
Where:


$s_{A-B}^2 = 15.7, \quad s_{A-C}^2 = 10.3, \quad s_{B-C}^2 = 10.3$

As such,

$s_{A-B}^2 \neq s_{A-C}^2 = s_{B-C}^2$

Condition A   Condition B   Condition C       A-B     A-C     B-C
     10            12             8            -2       2       5
     15            15            12             0       3       3
     25            30            20            -5       5      10
     35            30            28             5       7       2
     30            27            20             3      10       7
                              Variance:       15.7    10.3    10.3

Table 1: Hypothetical data to illustrate the calculation of the variance of the differences between conditions.

So there is at least some deviation from sphericity, because the variance of the differences between conditions A and B is greater than the variance of the differences between conditions A and C, and between B and C. However, we can say that these data have local circularity (or local sphericity) because two of the variances are identical. This means that for any multiple comparisons involving these differences, the sphericity assumption has been met (for a discussion of local circularity see Rouanet and Lépine, 1970). The deviation from sphericity in the data in Table 1 does not seem too severe (all variances are roughly equal). This raises the issue of how we assess whether violations of sphericity are severe enough to warrant action.
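The Table 1 check is easy to reproduce. The sketch below feeds the difference scores from the table into Python's statistics module to recover the three variances:

```python
from statistics import variance

# Difference scores between pairs of conditions, taken from Table 1
diffs = {
    "A-B": [-2, 0, -5, 5, 3],
    "A-C": [2, 3, 5, 7, 10],
    "B-C": [5, 3, 10, 2, 7],
}

# Sphericity requires these (sample) variances to be roughly equal
for pair, scores in diffs.items():
    print(pair, variance(scores))  # 15.7, 10.3, 10.3
```

Note that `statistics.variance` computes the sample variance (dividing by n − 1), which is what the hand calculation above uses.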

Assessing the Severity of Departures from Sphericity

Luckily the advancement of computer packages makes it needless to ponder the details of how to assess departures from sphericity. SPSS produces a test known as Mauchly's test, which tests the hypothesis that the variances of the differences between conditions are equal. Therefore, if Mauchly's test statistic is significant (i.e. has a probability value less than 0.05) we must conclude that there are significant differences between the variances of differences; ergo, the condition of sphericity has not been met. If, however, Mauchly's test statistic is non-significant (i.e. p > .05) then it is reasonable to conclude that the variances of differences are not significantly different (i.e. they are roughly equal). So, in short, if Mauchly's test is significant then we must be wary of the F-ratios produced by the computer.

Figure 1: Output of Mauchly's test from SPSS version 7.0


Figure 1 shows the result of Mauchly's test on some fictitious data with three conditions (A, B and C). The result of the test is highly significant, indicating that the variances of the differences between conditions were significantly different. The table also displays the degrees of freedom (the df are simply N − 1, where N is the number of variances compared) and three estimates of sphericity (see the section on correcting for sphericity).

What is the Effect of Violating the Assumption of Sphericity?

Rouanet and Lépine (1970) provided a detailed account of the validity of the F-ratio when the sphericity assumption does not hold. They argued that there are two different F-ratios that can be used to assess treatment comparisons, labelled F′ and F″ respectively. F′ refers to an F-ratio derived from the mean squares of the comparison in question and the interaction of the subjects with that comparison (i.e. the specific error term for each comparison is used; this is the F-ratio normally used). F″ is derived not from the specific error mean square but from the total error mean squares for all of the repeated measures comparisons. Rouanet and Lépine (1970) argued that F′ is less powerful than F″ and so it may be the case that this test statistic misses genuine effects. In addition, they showed that for F′ to be valid the covariation matrix, Σ, must obey local circularity (i.e. sphericity must hold for the specific comparison in question), and Mendoza, Toothaker and Crain (1976) supported this by demonstrating that the F′ ratios of an L × J × K factorial design with two repeated measures are valid only if local circularity holds. F″ requires only overall circularity (i.e. the whole data set must be circular), but because of the non-reciprocal nature of circularity and compound symmetry, F″ does not require compound symmetry whilst F′ does. So, given that F′ is the statistic generally used, the effect of violating sphericity is a loss of power (compared to when F″ is used) and a test statistic (F-ratio) that simply cannot be validly compared to tabulated values of the F-distribution.

Correcting for Violations of Sphericity

If data violate the sphericity assumption, a number of corrections can be applied to produce a valid F-ratio. SPSS produces three corrections based upon the estimates of sphericity advocated by Greenhouse and Geisser (1959) and Huynh and Feldt (1976). Both of these estimates give rise to a correction factor that is applied to the degrees of freedom used to assess the observed value of F. How each estimate is calculated is beyond the scope of this article; for our purposes all we need know is that each estimate differs slightly from the others. The Greenhouse–Geisser estimate (usually denoted as ε̂) varies between 1/(k − 1) (where k is the number of repeated measures conditions) and 1. The closer that ε̂ is to 1.00, the more homogeneous are the variances of differences, and hence the closer the data are to being spherical. Figure 1 shows a situation with three conditions, and hence the lower limit of ε̂ is 0.5. The calculated value of ε̂ is 0.503, which is very close to 0.5 and so represents a substantial deviation from sphericity. Huynh and Feldt (1976) reported that when ε̂ > 0.75 too many false null hypotheses fail to be rejected (i.e. the test is too conservative), and Collier, Baker, Mandeville and Hayes (1967) showed that this was also true with ε̂ as high as 0.90. Huynh and Feldt therefore proposed a correction to ε̂ to make it less conservative (usually denoted as ε̃). However, Maxwell and Delaney (1990) report that ε̃ actually overestimates sphericity. Stevens (1992) therefore recommends taking an average of the two and adjusting the df by this averaged value. Girden (1992) recommends that when ε̂ > 0.75 the df should be corrected using ε̃; if ε̂ < 0.75, or nothing is known about sphericity at all, then the conservative ε̂ should be used to adjust the df.
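For the curious, ε̂ can be computed directly from the double-centred covariance matrix of the conditions. The sketch below is my own pure-Python illustration of that calculation (not SPSS's code), run on the raw scores from Table 1:

```python
def gg_epsilon(data):
    """Greenhouse-Geisser estimate of sphericity (epsilon-hat) for an
    n-subjects x k-conditions data matrix:
    epsilon = trace(D)^2 / ((k - 1) * sum of squared elements of D),
    where D is the double-centred sample covariance matrix of the conditions."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    S = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
          for j in range(k)] for i in range(k)]
    row_mean = [sum(S[i]) / k for i in range(k)]   # S is symmetric, so the
    grand_mean = sum(row_mean) / k                 # column means are the same
    D = [[S[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(k)]
         for i in range(k)]
    trace = sum(D[i][i] for i in range(k))
    sum_sq = sum(D[i][j] ** 2 for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * sum_sq)

# Raw scores from Table 1 (rows = subjects; columns = conditions A, B, C)
table1 = [[10, 12, 8], [15, 15, 12], [25, 30, 20], [35, 30, 28], [30, 27, 20]]
print(round(gg_epsilon(table1), 3))  # close to 1: only a mild departure
```

For these data the estimate comes out close to 1 (well above the lower bound of 0.5 for three conditions), consistent with the earlier observation that the variances of differences are roughly equal.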

Figure 2: Output of epsilon corrected F values from SPSS version 7.0.

Figure 2 shows a typical ANOVA table for a set of data that violated sphericity (the same data used to generate Figure 1). The table in Figure 2 shows the F-ratio and associated degrees of freedom when sphericity is assumed; as can be seen, this results in a significant F-statistic, indicating some difference(s) between the means of the three conditions. Underneath are the corrected values (for each of the three estimates of sphericity). Notice that in all cases the F-ratios remain the same; it is the degrees of freedom that change (and hence the critical value of F). The degrees of freedom are corrected by the estimate of sphericity. How this is done can be seen in Table 2. The new degrees of freedom are then used to ascertain the critical value of F. For these data this results in the observed F being non-significant at the .05 level. This particular data set illustrates how important it is to use a valid critical value of F: it can mean the difference between a statistically significant result and a non-significant one. More importantly, it can mean the difference between making a Type I error and not.


Estimate of Sphericity Used   Value of Estimate   Term     df   Correction    New df
None                                              Effect    2                    2
                                                  Error     8                    8
Greenhouse–Geisser                  0.503         Effect    2    0.503 × 2     1.006
                                                  Error     8    0.503 × 8     4.024
Huynh–Feldt                         0.506         Effect    2    0.506 × 2     1.012
                                                  Error     8    0.506 × 8     4.048

Table 2: How the sphericity corrections are applied to the degrees of freedom.
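Applying the corrections in Table 2 is just a multiplication; this little sketch reproduces the corrected df:

```python
# The F-ratio itself never changes; each df is simply multiplied by the
# sphericity estimate before the critical value of F is looked up.
df_effect, df_error = 2, 8
estimates = {"Greenhouse-Geisser": 0.503, "Huynh-Feldt": 0.506}

for name, eps in estimates.items():
    print(name, round(eps * df_effect, 3), round(eps * df_error, 3))
```

This reproduces the new df of 1.006/4.024 and 1.012/4.048 shown in Table 2.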

Multivariate vs. Univariate Tests

A final option, when you have data that violate sphericity, is to use multivariate test statistics (MANOVA), because they are not dependent upon the assumption of sphericity (see O'Brien & Kaiser, 1985). There is a trade-off of test power between univariate and multivariate approaches, although some authors argue that this can be overcome with suitable mastery of the techniques (O'Brien & Kaiser, 1985). MANOVA avoids the assumption of sphericity (and all the corresponding considerations about appropriate F-ratios and corrections) by using a specific error term for contrasts with 1 df, and hence each contrast is only ever associated with its specific error term (rather than the pooled error terms used in ANOVA). Davidson (1972) compared the power of adjusted univariate techniques with those of Hotelling's T² (a MANOVA test statistic) and found that the univariate technique was relatively powerless to detect small


reliable changes between highly correlated conditions when other less correlated conditions were also present. Mendoza, Toothaker and Nicewander (1974) conducted a Monte Carlo study comparing univariate and multivariate techniques under violations of compound symmetry and normality and found that 'as the degree of violation of compound symmetry increased, the empirical power for the multivariate tests also increased. In contrast, the power for the univariate tests generally decreased' (p. 174). Maxwell and Delaney (1990) noted that the univariate test is relatively more powerful than the multivariate test as n decreases and proposed that 'the multivariate approach should probably not be used if n is less than a + 10 (a is the number of levels for repeated measures)' (p. 602). As a general rule it seems that when you have a large violation of sphericity (ε < 0.7) and your sample size is greater than (a + 10) then multivariate procedures are more powerful, whilst with small sample sizes or when sphericity holds (ε > 0.7) the univariate approach is preferred (Stevens, 1992). It is also worth noting that the power of MANOVA increases and decreases as a function of the correlations between dependent variables (Cole et al., 1994), and so the relationship between treatment conditions must also be considered.

Multiple Comparisons

So far, I have discussed the effects of sphericity on the omnibus ANOVA. As a final flurry, some discussion of the effects on multiple comparison procedures is warranted. Boik (1981) provided an estimable account of the effects of nonsphericity on a priori tests in repeated measures designs, and concluded that even very small departures from sphericity produce large biases in the F-test, recommending against using these tests for repeated measures contrasts. When experimental error terms are small, the power to detect relatively strong effects can be as low as .05 (when ε = .80). He argues that the situation for a priori comparisons cannot be improved and concludes by recommending a multivariate analogue. Mitzel and Games (1981) found that when sphericity does not hold (ε < 1) the pooled error term conventionally employed in pairwise comparisons resulted in non-significant differences between two means being declared significant (i.e. a lenient Type I error rate) or undetected differences (a conservative Type I error rate). They therefore recommended the use of separate error terms for each comparison. Maxwell (1980) systematically tested the power and alpha levels for five a priori tests under repeated measures conditions. The tests assessed were Tukey's Wholly Significant Difference (WSD) test, which uses a pooled error term; Tukey's procedure but with a separate error term with either (n − 1) df [labelled SEP1] or (n − 1)(k − 1) df [labelled SEP2]; Bonferroni's procedure (BON); and a multivariate approach, the Roy–Bose Simultaneous Confidence Interval (SCI). Maxwell tested these a priori procedures varying the sample size, number of levels of the repeated factor, and departure from sphericity. He found that the multivariate approach was always 'too conservative for practical use' (p. 277) and this was most extreme when n (the number of subjects) was small relative to k (the number of conditions). Tukey's test inflated the alpha rate as the covariance matrix departed from sphericity and, even when a separate error term was used (SEP1), alpha was slightly inflated as k increased, whilst SEP2 also led to unacceptably high alpha levels. The Bonferroni method, however, was extremely robust (although slightly conservative) and controlled alpha levels regardless of the manipulation. Therefore, in terms of Type I error rates the Bonferroni method was best. In terms of test power (the Type II error rate) for a small sample (n = 8), WSD was the most powerful under conditions of nonsphericity. This advantage was severely reduced when n = 15. Keselman and Keselman (1988) extended Maxwell's work and also investigated unbalanced designs. They

too used Tukey's WSD, a modified WSD (with non-pooled error variance), Bonferroni t-statistics, and a multivariate approach, and looked at the same factors as Maxwell (with the addition of unequal samples). They found that when unweighted means were used (with unbalanced designs) none of the four tests could control the Type I error rate. When weighted means were used only the multivariate tests could limit alpha rates, although Bonferroni t-statistics were considerably better than the two Tukey methods. In terms of power they concluded that 'as the number of repeated treatment levels increases, BON is substantially more powerful than SCI' (p. 223). So, in terms of these studies, the Bonferroni method seems to be generally the most robust of the univariate techniques, especially in terms of power and control of the Type I error rate.

Conclusion

It is more often the rule than the exception that sphericity is violated in repeated-measures designs. For this reason, all repeated-measures designs should be tested for violations of sphericity. If sphericity is violated then the researcher must decide whether a multivariate or univariate analysis is preferred (with due consideration of the trade-off between test validity on the one hand and power on the other). If univariate methods are chosen then the omnibus ANOVA must be corrected appropriately, depending on the degree of departure from sphericity. If pairwise comparisons are required, the Bonferroni method should probably be used to control the Type I error rate. Finally, try to ensure that the group sizes are equal; otherwise even the Bonferroni technique is subject to inflated alpha levels.
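As a side note, the Bonferroni logic recommended above is easy to express in code. The following Python sketch is purely illustrative (it is not part of the original material, and the p-values are made up): each pairwise p-value is judged against the nominal alpha divided by the number of comparisons.

```python
# Illustrative sketch of the Bonferroni correction: with k comparisons,
# each test is judged at alpha / k to keep the familywise Type I error
# rate at (no more than) alpha.
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted alpha and a significance decision per test."""
    adjusted_alpha = alpha / len(p_values)
    return adjusted_alpha, [p < adjusted_alpha for p in p_values]

# Three made-up pairwise p-values, nominal alpha = .05
alpha_adj, decisions = bonferroni([0.001, 0.020, 0.300])
print(round(alpha_adj, 4))  # 0.0167
print(decisions)            # [True, False, False]
```

Note how the middle comparison (p = .020) is significant at the nominal .05 level but not after correction; this is exactly the slight conservatism of the method that the simulation studies above describe.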


References

Boik, R. J. (1981). A priori tests in repeated measures designs: effects of nonsphericity. Psychometrika, 46(3), 241–255.
Cole, D. A., Maxwell, S. E., Arvey, R., & Salas, E. (1994). How the power of MANOVA can both increase and decrease as a function of the intercorrelations among the dependent variables. Psychological Bulletin, 115(3), 465–474.
Davidson, M. L. (1972). Univariate versus multivariate tests in repeated-measures experiments. Psychological Bulletin, 77, 446–452.
Girden, E. R. (1992). ANOVA: repeated measures (Sage university paper series on quantitative applications in the social sciences, 84). Newbury Park, CA: Sage.
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95–112.
Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in randomised block and split-plot designs. Journal of Educational Statistics, 1(1), 69–82.
Keselman, H. J., & Keselman, J. C. (1988). Repeated measures multiple comparison procedures: effects of violating multisample sphericity in unbalanced designs. Journal of Educational Statistics, 13(3), 215–226.
Maxwell, S. E. (1980). Pairwise multiple comparisons in repeated measures designs. Journal of Educational Statistics, 5(3), 269–287.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth.



Mendoza, J. L., Toothaker, L. E., & Nicewander, W. A. (1974). A Monte Carlo comparison of the univariate and multivariate methods for the groups by trials repeated measures design. Multivariate Behavioral Research, 9, 165–177.
Mendoza, J. L., Toothaker, L. E., & Crain, B. R. (1976). Necessary and sufficient conditions for F ratios in the L × J × K factorial design with two repeated factors. Journal of the American Statistical Association, 71, 992–993.
Mitzel, H. C., & Games, P. A. (1981). Circularity and multiple comparisons in repeated measures designs. British Journal of Mathematical and Statistical Psychology, 34, 253–259.
O'Brien, R. G., & Kaiser, M. K. (1985). MANOVA method for analyzing repeated measures designs: an extensive primer. Psychological Bulletin, 97(2), 316–333.
Rouanet, H., & Lépine, D. (1970). Comparison between treatments in a repeated-measurement design: ANOVA and multivariate methods. British Journal of Mathematical and Statistical Psychology, 23, 147–163.
Scariano, S. M., & Davenport, J. M. (1987). The effects of violations of independence in the one-way ANOVA. The American Statistician, 41(2), 123–129.
Stevens, J. (1992). Applied multivariate statistics for the social sciences (2nd edition). Hillsdale, NJ: LEA.
Labcoat Leni's Real Research: Who's Afraid of the Big Bad Wolf?

Field, A. P. (2006). Journal of Abnormal Psychology, 115(4), 742–752.


I'm going to let my ego get the better of me and talk about some of my own research. When I'm not scaring my students with statistics, I scare small children with Australian marsupials. There is a good reason for doing this, which is to try to discover how children develop fears (which will help us to prevent them). Most of my research looks at the effect of giving children information about animals or situations that are novel to them (rather like a parent, teacher or TV show would do). In one particular study (Field, 2006), we used three novel animals (the quoll, quokka and cuscus): the children were told negative things about one of the animals, positive things about another, and given no information about the third (our control). I then asked the children to place their hands in three wooden boxes, each of which they believed contained one of the aforementioned animals. My hypothesis was that they would take longer to place their hand in the box containing the animal about which they had heard negative information. The data from this part of the study are in the file Field(2006).sav. Labcoat Leni wants you to carry out a one-way repeated-measures ANOVA on the times taken to place their hand in the three boxes (negative information, positive information, no information). First, draw an error bar graph of the means, then do some normality tests on the data, then do a log transformation on the scores, and do the ANOVA on these log-transformed scores (if you read the paper you'll notice that I found that the data were not normal, so I log-transformed them before doing the ANOVA). Do children take longer to put their hand in a box that they believe contains an animal about which they have been told nasty things?

You really ought to know how to do an error bar graph by now, so all I will say is that it should look something like this:


To get the normality tests use the Explore procedure:


The resulting K–S tests show that the data are very heavily non-normal. If you look at the Q–Q and P–P plots (not reproduced here, but they'll be in your output) you will see that the data are very heavily skewed. This will be, in part, because if a child didn't put their hand in the box after 15 seconds we gave them a score of 15 and asked them to move on to the next box (this was for ethical reasons: if a child hadn't put their hand in the box after 15 s we assumed that they did not want to do the task). To log-transform the scores we need to use the compute function:

We need to do this three times (once for each variable). Alternatively we could use the following syntax:


COMPUTE LogNegative=ln(bhvneg).
COMPUTE LogPositive=ln(bhvpos).
COMPUTE LogNoInformation=ln(bhvnone).
EXECUTE.

To do the ANOVA we have to define a variable called Information_Type and then specify the three logged variables:
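Incidentally, SPSS's ln() is just the natural logarithm. If you want to check what the COMPUTE commands above do, here is a tiny illustrative Python sketch (the scores are made up; bhvneg mirrors the variable name in the file):

```python
import math

# Made-up latency scores standing in for the bhvneg variable;
# math.log is the natural logarithm, matching SPSS's ln().
bhvneg = [3.2, 15.0, 7.4]
log_negative = [math.log(x) for x in bhvneg]
print([round(v, 3) for v in log_negative])  # [1.163, 2.708, 2.001]
```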

You can specify some simple contrasts (comparing everything to the last category, no information) or post hoc tests. I actually did something slightly different because I wanted to get precise Bonferroni-corrected confidence intervals for my post hoc comparisons, but if you ask for some post hoc tests you will get the same profile of results that I did.


Note first of all that the sphericity test is significant. Therefore, in the paper I reported Greenhouse–Geisser corrected degrees of freedom and significance. The main ANOVA shows that the type of information significantly affected how long the children took to place their hands in the boxes. The post hoc tests and the graph tell us that a child took longer to place their hand in the box that they believed contained an animal about which they had heard bad things, compared to the boxes that they believed contained animals about which they had heard positive information or no information. There was not a significant difference between the approach times for the positive information and no information boxes. You could report these results as follows:


The latencies to approach the boxes were positively skewed (Kolmogorov–Smirnov zs = 1.89, 2.82, and 3.09 for the threat, positive and no information boxes respectively) and so were transformed using the natural log of the score. The resulting distributions were not significantly different from normal (Kolmogorov–Smirnov zs = 0.77, 1.04 and 1.17 for the threat, positive and no information boxes respectively). A one-way repeated-measures ANOVA revealed a significant main effect of the type of box, F(1.90, 239.52) = 104.69, p < .001. Bonferroni-corrected post hoc tests revealed a significant difference between the threat information box and the positive information box, p < .001, and between the threat information box and the no information box, p < .001, but not between the positive information box and the no information box, p > .05.

Chapter 15

Self-Test Answers

Carry out some analyses to test for normality and homogeneity of variance in these data. To get the outputs in the book use the following dialog boxes:


Split the file by Drug.

To split the file by drug you need to select the Split File dialog box and complete it as follows:


See whether you can enter the data in Table 15.3 into SPSS (you dont need to enter the ranks). Then conduct some exploratory analysis on the data. To get the outputs in the book use the following dialog boxes:

Use the Chart Builder to draw a boxplot of these data

The completed Chart Builder window should look like this:


Carry out the three Mann–Whitney tests suggested above.

The simplest way to run these tests is to use the Mann–Whitney dialog box, each time selecting a different comparison. (The three tests we want to do compare each group against the control, so they all include group 1 as the first group; all that changes is the value in Group 2, which reflects which group is being compared to the controls.)
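For the curious, the U statistic that each of these dialog boxes produces can be computed by brute force. Here is a hedged Python sketch with made-up scores (these are not the data from the book's file):

```python
# Mann-Whitney U for 'group' relative to 'control': count the pairs in
# which the group score exceeds the control score (ties count 0.5).
def mann_whitney_u(group, control):
    u = 0.0
    for x in group:
        for y in control:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

control = [4, 5, 6, 7]
low_dose = [1, 2, 3, 8]   # hypothetical scores for one experimental group
print(mann_whitney_u(low_dose, control))  # 4.0
```

Repeating the call with each experimental group against the same control list mirrors the three comparisons described above.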



Using what you know about inputting data, try to enter these data into SPSS and run some exploratory analyses. To get the outputs in the book use the following dialog boxes:

Carry out the three Wilcoxon tests suggested above.

You can do the Wilcoxon tests by selecting the pairs of variables for each comparison in turn and transferring them across to the box labelled Test Pair(s) List. To run the analysis, select the Wilcoxon test dialog box. Once the dialog box is activated, select the first two variables from the list (click on Start with the mouse and then, while holding down the Ctrl key, click on Month1), and transfer this pair to the box labelled Test Pairs. Then select Start and Month2 and transfer them in the same way. Finally, select Month1 and Month2 and transfer them too. To run the analysis, return to the main dialog box.

Additional Material

Oliver Twisted: Please Sir, can I have some more Jonck?

'I want to know how the Jonckheere–Terpstra test actually works,' complains Oliver. Of course you do, Oliver; sleep is hard to come by these days. I am only too happy to oblige, my little syphilitic friend. The additional material for this chapter on the companion website has a complete explanation of the test and how it works. I bet you're glad you asked.


Jonckheere's test is based on the simple, but elegant, idea of taking a score in a particular condition and counting how many scores in subsequent conditions are smaller than that score. So, the first step is to order your groups in the way that you expect your medians to change. If we take the soya example from Chapter 13, then we expect sperm counts to be highest in the no soya meals condition, and then to decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week. So, we start with the no meals per week condition, take the first score and ask 'how many scores in the next condition are bigger than this score?' You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 1 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order within each group and the groups have been ordered in the way that we expect our medians to decrease (but see the note below). So, starting with the first score in the no soya meals group (this score is 0.35), we look at the scores in the next condition (1 soya meal) and count how many are greater than 0.35. It turns out that 19 of the 20 scores are greater than this value, so we place the value of 19 in the appropriate column and move on to the next score (0.58) and do the same. When we've done this for all of the scores in the no meals group, we go back to the first score (0.35) again, but this time count how many scores are bigger in the next but one condition (the 4 soya meals condition). It turns out that all 20 scores are bigger, so we register this in the appropriate column, move on to the next score (0.58) and do the same; then we repeat the whole procedure comparing the no meals scores with the final condition (the 7 soya meals group). We basically repeat this process until we've compared the first group to all subsequent groups. At this stage we move on to the next group (the 1 soya meal).
Again, we start with the first score (0.33) and count how many scores are bigger than this value in the subsequent group (the 4 meals

(Note: in fact, we can order the groups the opposite way around if we want to, so we can start with the group we predict to have the lowest median, and then order them in the order we expect the medians to increase. All this will do is reverse the sign of the resulting z-score; if you're keen to know why, there's a section at the end of this document that shows what happens when we reverse the order of the groups!)


group). In this case all 20 scores are bigger than 0.33, so we register this in the table and move on to the next score (0.36). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.33) and repeat the process until this category has been compared with all subsequent categories. When all categories have been compared with all subsequent categories in this way, we simply add up the counts, as I have done in Table 1. These sums of counts are denoted by Uij.


Table 1: Data to show Jonckheere's test for the soya example

Within each group the sperm counts are in ascending order; each count column gives, for the score on its left, how many scores are bigger in the named later condition.

No Soya Meals                        1 Soya Meal                4 Soya Meals       7 Soya Meals
Sperm   >1 Meal >4 Meals >7 Meals    Sperm   >4 Meals >7 Meals   Sperm   >7 Meals   Sperm
 0.35     19      20       18         0.33     20       18        0.40     18        0.31
 0.58     18      19       16         0.36     20       18        0.60     16        0.32
 0.88     18      18       13         0.63     18       16        0.96     13        0.56
 0.92     15      18       13         0.64     18       16        1.20     12        0.57
 1.22     15      15       12         0.77     18       15        1.31     11        0.71
 1.51     15      14        7         1.53     14        7        1.35      9        0.81
 1.52     15      14        7         1.62     14        7        1.68      7        0.87
 1.57     14      14        7         1.71     13        7        1.83      7        1.18
 2.43     10      11        6         1.94     12        7        2.10      6        1.25
 2.79      9      11        4         2.48     11        6        2.93      3        1.33
 3.40      9       6        1         2.71     11        5        2.96      3        1.34
 4.52      8       5        0         4.12      6        0        3.00      3        1.49
 4.72      8       5        0         5.65      4        0        3.09      2        1.50
 6.90      6       3        0         6.76      3        0        3.36      1        2.09
 7.58      4       3        0         7.08      3        0        4.34      0        2.70
 7.78      4       3        0         7.26      3        0        5.81      0        2.75
 9.62      2       3        0         7.92      3        0        5.94      0        2.83
10.05      2       3        0         8.04      3        0       10.16      0        3.07
10.32      2       2        0        12.10      1        0       10.98      0        3.28
21.08      0       0        0        18.47      0        0       18.21      0        4.11
Total    193     187      104                 195      122               111
(Uij)

The test statistic, J, is simply the sum of these counts:

    J = Σ(i<j) Uij

which for these data is simply:

    J = 193 + 187 + 104 + 195 + 122 + 111 = 912
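The counting procedure just described is easy to automate. The Python sketch below (not part of the original text) uses the raw sperm counts from Table 1 and counts, for every pair of groups in the predicted order, how many scores in the later group exceed each score in the earlier group; there are no tied scores across groups here, so strict inequality is all we need. It reproduces J = 912.

```python
# Sperm counts for the four soya conditions, in the predicted (descending
# median) order: no soya meals, then 1, 4 and 7 soya meals per week.
no_soya = [0.35, 0.58, 0.88, 0.92, 1.22, 1.51, 1.52, 1.57, 2.43, 2.79,
           3.40, 4.52, 4.72, 6.90, 7.58, 7.78, 9.62, 10.05, 10.32, 21.08]
one_meal = [0.33, 0.36, 0.63, 0.64, 0.77, 1.53, 1.62, 1.71, 1.94, 2.48,
            2.71, 4.12, 5.65, 6.76, 7.08, 7.26, 7.92, 8.04, 12.10, 18.47]
four_meals = [0.40, 0.60, 0.96, 1.20, 1.31, 1.35, 1.68, 1.83, 2.10, 2.93,
              2.96, 3.00, 3.09, 3.36, 4.34, 5.81, 5.94, 10.16, 10.98, 18.21]
seven_meals = [0.31, 0.32, 0.56, 0.57, 0.71, 0.81, 0.87, 1.18, 1.25, 1.33,
               1.34, 1.49, 1.50, 2.09, 2.70, 2.75, 2.83, 3.07, 3.28, 4.11]
groups = [no_soya, one_meal, four_meals, seven_meals]

def jonckheere_J(groups):
    """Sum of U_ij over all ordered pairs of groups i < j."""
    J = 0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            J += sum(1 for x in groups[i] for y in groups[j] if y > x)
    return J

print(jonckheere_J(groups))  # 912
```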

For small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:
    J̄ = (N² − Σ nk²) / 4

    σJ = √{ [N²(2N + 3) − Σ nk²(2nk + 3)] / 72 }

in which N is simply the total sample size (in this case 80) and nk is simply the sample size of group k (each will be 20 in this example because we have equal sample sizes). So, for the mean, we just square each group's sample size and add the squares up, then subtract this value from the total sample size squared, and divide the result by 4. Therefore, we can calculate the mean for these data as:
    J̄ = [80² − (20² + 20² + 20² + 20²)] / 4 = 4800 / 4 = 1200

The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size:


    σJ = √{ (1/72) [80²((2 × 80) + 3) − (20²((2 × 20) + 3) + 20²((2 × 20) + 3) + 20²((2 × 20) + 3) + 20²((2 × 20) + 3))] }
       = √{ (1/72) [6400(163) − (400(43) + 400(43) + 400(43) + 400(43))] }
       = √{ (1/72) (1043200 − 68800) }
       = √13533.33
       = 116.33

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formulae:
    z = (J − J̄) / σJ = (912 − 1200) / 116.33 = −2.476
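These calculations are easy to verify numerically; here is a small Python check of the normal approximation (a sketch for checking the arithmetic, not part of the original text):

```python
import math

# Normal approximation to Jonckheere's J: N is the total sample size,
# n holds the group sizes, J is the observed statistic from above.
N = 80
n = [20, 20, 20, 20]
J = 912

mean_J = (N**2 - sum(nk**2 for nk in n)) / 4
sd_J = math.sqrt((N**2 * (2 * N + 3)
                  - sum(nk**2 * (2 * nk + 3) for nk in n)) / 72)
z = (J - mean_J) / sd_J

print(mean_J)          # 1200.0
print(round(sd_J, 2))  # 116.33
print(round(z, 3))     # -2.476
```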

This z can then be evaluated using the critical values in the Appendix of the book. This test is always one-tailed because to use it we must have predicted a trend. So we're looking for z to be bigger than 1.65 (when we ignore the sign) for significance. In fact, the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z), as they do in this example!

Does it Matter how I Order My Groups?

I have just shown how to use the test when the groups are ordered by descending medians (i.e. we expect sperm counts to be highest in the no soya meals condition, and then to decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week; so we ordered the groups: no soya, 1 meal, 4 meals and 7 meals). Certain books will tell you to order the groups in ascending order (i.e. start with the group that you expect to have the lowest median). For the soya data this would mean arranging the groups in the opposite order to how I did in the Appendix; that is, 7 meals, 4 meals, 1 meal and no meals. The purpose of this section is to show you what happens if we order the groups the opposite way around!


The process is similar to that used in the Appendix, only now we start with the 7 meals per week condition, take the first score and ask 'how many scores in the next condition are bigger than this score?' You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 2 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order within each group and the groups have been ordered in the way that we expect our medians to increase. So, starting with the first score in the 7 soya meals group (this score is 0.31), we look at the scores in the next condition (4 soya meals) and count how many are greater than 0.31. It turns out that all 20 scores are greater than this value, so we place the value of 20 in the appropriate column and move on to the next score (0.32) and do the same. When we've done this for all of the scores in the 7 meals group, we go back to the first score (0.31) again, but this time count how many scores are bigger in the next but one condition (the 1 soya meal condition). It turns out that all 20 scores are bigger, so we register this in the appropriate column and move on to the next score (0.32) and do the same until we've done all of the scores in the 7 meals group. We basically repeat this process until we've compared the first group to all subsequent groups.
Table 2: Data to show Jonckheere's test for the soya example in Chapter 13

Within each group the sperm counts are in ascending order; each count column gives, for the score on its left, how many scores are bigger in the named later condition.

7 Soya Meals                           4 Soya Meals                1 Soya Meal         No Soya Meals
Sperm   >4 Meals >1 Meal >No Meals     Sperm   >1 Meal >No Meals   Sperm   >No Meals    Sperm
 0.31     20      20       20           0.40     18       19       0.33       20         0.35
 0.32     20      20       20           0.60     18       18       0.36       19         0.58
 0.56     19      18       19           0.96     15       16       0.63       18         0.88
 0.57     19      18       19           1.20     15       16       0.64       18         0.92
 0.71     18      16       18           1.31     15       15       0.77       18         1.22
 0.81     18      15       18           1.35     15       15       1.53       13         1.51
 0.87     18      15       18           1.68     13       12       1.62       12         1.52
 1.18     17      15       16           1.83     12       12       1.71       12         1.57
 1.25     16      15       15           2.10     11       12       1.94       12         2.43
 1.33     15      15       15           2.93      9       10       2.48       11         2.79
 1.34     15      15       15           2.96      9       10       2.71       11         3.40
 1.49     14      15       15           3.00      9       10       4.12        9         4.52
 1.50     14      15       15           3.09      9       10       5.65        7         4.72
 2.09     12      11       12           3.36      9       10       6.76        7         6.90
 2.70     11      10       11           4.34      8        9       7.08        6         7.58
 2.75     11       9       11           5.81      7        7       7.26        6         7.78
 2.83     11       9       10           5.94      7        7       7.92        4         9.62
 3.07      8       9       10          10.16      2        2       8.04        4        10.05
 3.28      7       9       10          10.98      2        1      12.10        1        10.32
 4.11      6       9        9          18.21      1        1      18.47        1        21.08
Total    289     278      296                   204      212                209
(Uij)

At this stage we move on to the next group (the 4 soya meals). Again, we start with the first score (0.40) and count how many scores are bigger than this value in the subsequent group (the 1 meal group). In this case there are 18 scores bigger than 0.40, so we register this in the table and move on to the next score (0.60). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.40) and repeat the process until this category has been compared with all subsequent categories. When all categories have been compared with all subsequent categories in this way, we simply add up the counts, as I have done in the table. These sums of counts are denoted by Uij. As before, the test statistic J is simply the sum of these counts:

    J = Σ(i<j) Uij

which for these data is simply:

    J = 289 + 278 + 296 + 204 + 212 + 209 = 1488

As I said earlier, for small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:


    J̄ = (N² − Σ nk²) / 4

    σJ = √{ [N²(2N + 3) − Σ nk²(2nk + 3)] / 72 }

in which N is simply the total sample size (in this case 80) and nk is simply the sample size of group k (each will be 20 in this example because we have equal sample sizes). So, we just square each group's sample size and add the squares up, then subtract this value from the total sample size squared. We then divide the result by 4. Therefore, we can calculate the mean for these data as we did earlier:
    J̄ = [80² − (20² + 20² + 20² + 20²)] / 4 = 4800 / 4 = 1200

The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size (again this is the same as earlier):
    σJ = √{ (1/72) [80²((2 × 80) + 3) − (20²((2 × 20) + 3) + 20²((2 × 20) + 3) + 20²((2 × 20) + 3) + 20²((2 × 20) + 3))] }
       = √{ (1/72) [6400(163) − (400(43) + 400(43) + 400(43) + 400(43))] }
       = √{ (1/72) (1043200 − 68800) }
       = √13533.33
       = 116.33

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formula. The mean and standard deviation are the same as above, but we now have a different test statistic (it is 1488 rather than 912). So, let's see what happens when we plug this new test statistic into the equation:


    z = (J − J̄) / σJ = (1488 − 1200) / 116.33 = 2.476
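There is a quick way to see why reversing the group order simply flips the sign. Assuming no tied scores across groups (true for these data), every cross-group pair of scores is counted in exactly one of the two orderings, so the two J values must sum to the total number of cross-group pairs. A minimal Python sketch (not part of the original text):

```python
# 4 groups of 20 scores: 6 group pairs, each contributing 20 * 20 score pairs.
total_pairs = 6 * 20 * 20      # 2400
J_descending = 912             # groups ordered by descending predicted medians
J_ascending = total_pairs - J_descending
print(J_ascending)             # 1488

mean_J, sd_J = 1200, 116.33    # from the calculations above
print(round((J_ascending - mean_J) / sd_J, 3))  # 2.476
```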

Note that the z-score is the same value as when we ordered the groups in descending order, except that it now has a positive value rather than a negative one! This goes to prove what I wrote earlier: the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z)! Earlier we ordered the groups in descending order and so got a negative z; now we ordered them in ascending order and so got a positive z.

Labcoat Leni's Real Research: Having a Quail of a Time?

Matthews, N. et al. (2007). Psychological Science, 18(9), 758–762.

We encountered some research in Chapter 2 in which we discovered that you can influence aspects of male quails' sperm production through conditioning. The basic idea is that the male is granted access to a female for copulation in a certain chamber (e.g. one that is coloured green) but gains no access to a female in a different context (e.g. a chamber with a tilted floor). The male, therefore, learns that when he is in the green chamber his luck is in, but if the floor is tilted then frustration awaits. For other males the chambers will be reversed (i.e. they get sex only when in the chamber with the tilted floor). The human equivalent (well, sort of) would be if you always managed to pull in the Zap Club but never in the Honey Club. During the test phase, males get to mate in both chambers. The question is: after the males have learnt that they will get a mating opportunity in a certain context, do they produce more sperm, or better-quality sperm, when mating in that context compared to the control context? (Are you more of a stud in the Zap Club? OK, I'm going to


stop this analogy now.) Mike Domjan and his colleagues predicted that if conditioning evolved because it increases reproductive fitness, then males that mated in the context that had previously signalled a mating opportunity would fertilize a significantly greater number of eggs than quails that mated in their control context (Matthews, Domjan, Ramsey, & Crews, 2007). They put this hypothesis to the test in an experiment that is utter genius. After training, they allowed 14 females to copulate with two males (counterbalanced): one male copulated with the female in the chamber that had previously signalled a reproductive opportunity (Signalled), whereas the second male copulated with the same female but in the chamber that had not previously signalled a mating opportunity (Control). Eggs were collected from the females for 10 days after the mating and a genetic analysis was used to determine the father of any fertilized eggs. The data from this study are in the file Mathews et al. (2007).sav. Labcoat Leni wants you to carry out a Wilcoxon signed-rank test to see whether more eggs were fertilized by males mating in their signalled context compared to males in their control context.

To run the analysis, select the Wilcoxon test dialog box. Once the dialog box is activated, select two variables from the list (click on the first variable with the mouse and then, while holding down the Ctrl key, the second). You can also select the variables one at a time and transfer them: for example, you could select Signalled and drag it to the column labelled Variable 1 in the box labelled Test Pairs, and then select Control and drag it to the column labelled Variable 2. Each pair appears as a new row in the box labelled Test Pairs. To run the analysis, return to the main dialog box.


The first table provides information about the ranked scores. It tells us the number of negative ranks (these are females that produced more eggs fertilized by the male in his signalled chamber than by the male in his control chamber) and the number of positive ranks (these are females that produced fewer eggs fertilized by the male in his signalled chamber than by the male in his control chamber). The table shows that for 10 of the 14 quails, the number of eggs fertilized by the male in his signalled chamber was greater than for the male in his control chamber, indicating an adaptive benefit to


learning that a chamber signalled reproductive opportunity. There was one tied rank (i.e. one female that produced an equal number of fertilized eggs for both males). The table also shows the average number of negative and positive ranks and the sum of positive and negative ranks. Below the table are footnotes, which tell us to what the positive and negative ranks relate (so they provide the same kind of explanation as I've just made; see, I'm not clever, I just read the footnotes!). The test statistic, T, is the lower of the two sums of ranks, so our test value here is the sum of positive ranks (i.e. 13.50). This value can be converted to a z-score, and this is what SPSS does. The second table tells us that the test statistic is based on the positive ranks, that the z-score is −2.30 and that this value is significant at p = .022. Therefore, we should conclude that there was a greater number of fertilized eggs from males mating in their signalled context, z = −2.30, p < .05. In other words, conditioning (as a learning mechanism) provides some adaptive benefit in that it makes it more likely that you will pass on your genes. The authors concluded as follows: Of the 78 eggs laid by the test females, 39 eggs were fertilized. Genetic analysis indicated that 28 of these (72%) were fertilized by the signalled males, and 11 were fertilized by the control males. Ten of the 14 females in the experiment produced more eggs fertilized by the signalled male than by the control male (see Fig. 1; Wilcoxon signed-ranks test, T = 13.5, p < .05). These effects were independent of the order in which the 2 males copulated with the female. Of the 39 fertilized eggs, 20 were sired by the 1st male and 19 were sired by the 2nd male. The present findings show that when 2 males copulated with the same female in succession, the male that received a Pavlovian CS signalling copulatory opportunity fertilized more of the female's eggs.
Thus, Pavlovian conditioning increased reproductive fitness in the context of sperm competition. (p. 760).
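The T statistic described above (the smaller of the two rank sums) can be computed by hand; here is an illustrative Python sketch with made-up paired scores (these are not the quail data):

```python
# Wilcoxon signed-rank statistic T: rank the absolute nonzero differences
# (averaging ranks for tied magnitudes), then T is the smaller of the
# positive-rank and negative-rank sums.
def wilcoxon_T(x, y):
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ordered = sorted(diffs, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2        # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_of[abs(ordered[k])] = avg_rank
        i = j
    pos = sum(rank_of[abs(d)] for d in diffs if d > 0)
    neg = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return min(pos, neg)

signalled = [6, 5, 4, 7, 3, 5]   # made-up fertilized-egg counts
control = [3, 2, 4, 1, 4, 2]
print(wilcoxon_T(signalled, control))  # 1.0
```

Here one pair is tied (4, 4) and is dropped, exactly as described in the output tables above.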


Labcoat Leni's Real Research: Eggs-traordinary!

Çetinkaya, H., & Domjan, M. (2006). Journal of Comparative Psychology, 120(4), 427–432.

There seems to be a lot of sperm in this book (not literally, I hope); it's possible that I have a mild obsession. We saw that male quail fertilized more eggs if they had been able to predict when a mating opportunity would arise. However, some quail develop fetishes. Really. In the previous example the type of compartment acted as a predictor of an opportunity to mate, but in studies where a terrycloth object acts as a sign that a mate will shortly become available, some quails start to direct their sexual behaviour towards the terrycloth object. (I may regret this analogy, but in human terms imagine that every time you were going to have sex with your boyfriend you gave him a green towel a few moments before seducing him; after enough seductions he would start rubbing his crotch against any green towel he saw. If you've ever wondered why your boyfriend does this, then hopefully this is an enlightening explanation.) In evolutionary terms, this fetishistic behaviour seems counterproductive because sexual behaviour becomes directed towards something that cannot provide reproductive success. However, perhaps this behaviour serves to prepare the organism for the real mating behaviour. Hakan Çetinkaya and Mike Domjan conducted a brilliant study in which they sexually conditioned male quail (Çetinkaya & Domjan, 2006). All quail experienced the terrycloth stimulus and an opportunity to mate, but for some the terrycloth stimulus immediately preceded the mating opportunity (paired group), whereas others experienced it 2 hours after the mating opportunity (this was the control group, because the terrycloth stimulus did not predict a mating


opportunity). In the paired group, quail were classified as fetishistic or not depending on whether they engaged in sexual behaviour with the terrycloth object. During a test trial the quails mated with a female and the researchers measured the percentage of eggs fertilized, the time spent near the terrycloth object, the latency to initiate copulation, and copulatory efficiency. If this fetishistic behaviour provides an evolutionary advantage then we would expect the fetishistic quails to fertilize more eggs, initiate copulation faster, and be more efficient in their copulations. The data from this study are in the file Çetinkaya & Domjan (2006).sav. Labcoat Leni wants you to carry out a Kruskal–Wallis test to see whether fetishistic quails produced a higher percentage of fertilized eggs and initiated sex more quickly.

Let's begin by using the Chart Builder to do some boxplots:


First, access the main dialog box. Once the dialog box is activated, select the two dependent variables from the list (click on Egg_Percent and, while holding down the Ctrl key, Latency) and drag them to the box labelled Test Variable List. Next, select the independent variable (the grouping variable), in this case Group, and drag it to the box labelled Grouping Variable. When the grouping variable has been selected, the Define Range button becomes active; click on it to activate the define range dialog box. SPSS needs to know the range of numeric codes you assigned to your groups, and there is a space for you to type the minimum and maximum codes. The minimum code we used was 1, and the maximum was 3, so type these numbers into the appropriate spaces. When you have defined the groups, click on Continue to return to the main dialog box. To run the analysis, click on OK in the main dialog box.


The output should look like this:

For both variables there is a significant effect. So there are differences between the groups, but we don't know where these differences lie. To find out we can conduct several Mann-Whitney tests, which we access from the Mann-Whitney test dialog box.
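If you want to see what the Kruskal-Wallis statistic is actually computing, here is a minimal sketch in Python (the function name and the tiny data set are my own invention, chosen so the arithmetic is easy to check by hand; SPSS does all of this for you):

```python
def kruskal_wallis_H(groups):
    """Kruskal-Wallis H: rank all scores together (tied scores share the
    mean rank), then H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), where
    R_i is the sum of ranks in group i and n_i its sample size."""
    scores = sorted(s for g in groups for s in g)
    # mean rank (1-based) for each distinct value, handling ties
    rank_of = {}
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and scores[j + 1] == scores[i]:
            j += 1
        rank_of[scores[i]] = (i + j) / 2 + 1
        i = j + 1
    N = len(scores)
    rank_term = sum(sum(rank_of[s] for s in g) ** 2 / len(g) for g in groups)
    return 12 / (N * (N + 1)) * rank_term - 3 * (N + 1)

# three completely separated groups give a large H
print(round(kruskal_wallis_H([[1, 2], [3, 4], [5, 6]]), 3))  # 4.571
```

Under the null hypothesis H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom.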


The output you should get is:
Fetishistic vs. Nonfetishistic:
Fetishistic vs. Control:

Nonfetishistic vs. Control:


The authors reported as follows: "Kruskal-Wallis analysis of variance (ANOVA) confirmed that female quail partnered with the different types of male quail produced different percentages of fertilized eggs, χ²(2, N = 59) = 11.95, p < .05, η² = 0.20. Subsequent pairwise comparisons with the Mann-Whitney U test (with the Bonferroni correction) indicated that fetishistic male quail yielded higher rates of fertilization than both the nonfetishistic male quail (U = 56.00, N1 = 17, N2 = 15, effect size = 8.98, p < .05) and the control male quail (U = 100.00, N1 = 17, N2 = 27, effect size = 12.42, p < .05). However, the nonfetishistic group was not significantly different from the control group (U = 176.50, N1 = 15, N2 = 27, effect size = 2.69, p > .05)." (page 249)

For the latency data they reported as follows: "A Kruskal-Wallis analysis indicated significant group differences, χ²(2, N = 59) = 32.24, p < .05, η² = 0.56. Pairwise comparisons with the Mann-Whitney U test (with the Bonferroni correction) showed that the nonfetishistic males had significantly shorter copulatory latencies than both the fetishistic male quail (U = 0.00, N1 = 17, N2 = 15, effect size = 16.00, p < .05) and the control male quail (U = 12.00, N1 = 15, N2 = 27, effect size = 19.76, p < .05). However, the fetishistic group was not significantly different from the control group (U = 161.00, N1 = 17, N2 = 27, effect size = 6.57, p > .05)." (page 430)


These results support the authors' theory that fetishistic behaviour may have evolved because it offers some adaptive function (such as preparing for the 'real thing').
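The pairwise Mann-Whitney tests above rest on a simple counting idea, which can be sketched in Python (the function name and the scores are illustrative inventions, not the real quail data):

```python
def mann_whitney_U(x, y):
    """Mann-Whitney U: over all pairs of scores, count how often a score
    in one group beats a score in the other (ties count 0.5); the test
    statistic is the smaller of the two directional counts."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in x for b in y)
    u_y = len(x) * len(y) - u_x   # the counts must sum to n_x * n_y
    return min(u_x, u_y)

# illustrative scores only
fetishistic    = [12, 15, 17, 19]
nonfetishistic = [8, 9, 14, 16]
print(mann_whitney_U(fetishistic, nonfetishistic))  # 3.0
```

With three pairwise comparisons, a Bonferroni correction means judging each test against α = .05/3 ≈ .0167 rather than .05.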

Chapter 14

Self-Test Answers

What is the difference between a main effect and an interaction?

A main effect is the unique effect of a predictor variable (or independent variable) on an outcome variable. In this context it can be the effect of gender, charisma or looks on their own. So, in the case of gender, the main effect is the difference between the average rating from all men (irrespective of the type of date they were rating) and the average rating from all women (irrespective of the type of date they were rating). The main effect of looks would be the mean rating given to all attractive dates (irrespective of their charisma, or whether they were rated by a man or a woman), compared to the average rating given to all average-looking dates (irrespective of their charisma, or whether they were rated by a man or a woman) and the average rating of all ugly dates (irrespective of their charisma, or whether they were rated by a man or a woman). An interaction, on the other hand, looks at the combined effect of two or more variables: for example, were the average ratings of attractive, ugly and average-looking dates different in men and women?
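The distinction can be made concrete with a toy table of cell means (all numbers invented for illustration):

```python
# invented mean ratings for each (rater gender, looks of date) cell
means = {
    ("men",   "attractive"): 8, ("men",   "average"): 5, ("men",   "ugly"): 2,
    ("women", "attractive"): 6, ("women", "average"): 5, ("women", "ugly"): 4,
}

def main_effect(factor_level, position):
    """Mean over every cell at one level of one factor, collapsing
    (i.e. ignoring) the other factor entirely."""
    cells = [v for k, v in means.items() if k[position] == factor_level]
    return sum(cells) / len(cells)

print(main_effect("men", 0))         # 5.0: average rating given by men
print(main_effect("attractive", 1))  # 7.0: average rating of attractive dates
# The interaction is visible in the cells themselves: the looks effect is
# 8 - 2 = 6 points for men but only 6 - 4 = 2 points for women.
```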

Additional Material


Labcoat Leni's Real Research: Keep the Faith(ful)?

Schützwohl, A. (2008). Personality and Individual Differences, 44, 633-644.

People can be jealous. People can be especially jealous when they think that their partner is being unfaithful. An evolutionary view of jealousy suggests that men and women have evolved distinctive types of jealousy because male and female reproductive success is threatened by different types of infidelity. Specifically, a woman's sexual infidelity deprives her mate of a reproductive opportunity and in some cases burdens him with years of investing in a child that is not his. Conversely, a man's sexual infidelity does not burden his mate with unrelated children, but may divert his resources from his mate's progeny. This diversion of resources is signalled by emotional attachment to another female. Consequently, men's jealousy mechanism should have evolved to prevent a mate's sexual infidelity, whereas in women it has evolved to prevent emotional infidelity. If this is the case then men and women should divert their attentional resources towards different cues to infidelity: women should be on the lookout for emotional infidelity, whereas men should be watching out for sexual infidelity. Achim Schützwohl put this theory to the test in a unique study in which men and women saw sentences presented on a computer screen (Schützwohl, 2008). On each trial, participants saw a target sentence that was always affectively neutral (e.g. 'The gas station is at the other side of the street'). However, the trick was that before each of these targets, a distractor sentence was presented that could also be affectively neutral, or could indicate sexual infidelity (e.g. 'Your partner suddenly has difficulty becoming sexually aroused when he and you want to have sex') or emotional


infidelity (e.g. 'Your partner doesn't say I love you to you anymore'). The idea was that if these distractor sentences grabbed a person's attention then (1) they would remember them, and (2) they would not remember the target sentence that came afterwards (because their attentional resources were still focused on the distractor). These effects should show up only in people currently in a relationship. The outcome was the number of sentences that a participant could remember (out of 6), and the predictors were whether the person had a partner or not (Relationship), whether the trial used a neutral distractor, an emotional infidelity distractor or a sexual infidelity distractor, and whether the sentence was a distractor or the target following the distractor. They analysed men's and women's data separately. The predictions are that women should remember more emotional infidelity sentences (distractors) but fewer of the targets that followed those sentences (targets). For men, the same effect should be found but for sexual infidelity sentences.
The data from this study are in the file Schützwohl (2008).sav. Labcoat Leni wants you to carry out two three-way mixed ANOVAs (one for men and the other for women) to test these hypotheses. Answers are in the additional material on the companion website (or look at pages 638-642 in the original article).

We want to run these analyses on men and women separately; therefore, we could (to be efficient) split the file by the variable Gender (see Chapter 5):


To run the ANOVA, select the repeated-measures ANOVA dialog box. We have two repeated-measures variables: whether the sentence was a distracter or a target (let's call this Sentence_Type) and whether the distracter used on a trial was neutral, indicated sexual infidelity or indicated emotional infidelity (let's call this variable Distracter_Type). The resulting ANOVA will be a 2 (relationship: with partner or not) × 2 (sentence type: distracter or target) × 3 (distracter type: neutral, emotional infidelity or sexual infidelity) three-way mixed ANOVA with repeated measures on the last two variables. First we must define our two repeated-measures variables:


Next we need to define these variables by specifying the columns in the data editor that relate to the different combinations of the type of sentence and the type of trial. As you can see, we specified Sentence_Type first; therefore, we have all of the variables relating to distracters specified before those for targets. For each type of sentence there are three different variants depending on whether the distracter used was neutral, emotional or sexual. Note that we have used the same order for both types of sentence (neutral, emotional, sexual) and that we have put neutral distracters as the first category so that we can look at some contrasts (neutral distracters are the control).


To do some contrasts, select simple contrasts comparing everything to the first category:

You could also ask for an interaction graph for the three-way interaction:


You can set other options as in the book chapter. Let's look at the men's output first. Sphericity tests are fine (all non-significant), so I've simplified the main ANOVA table to show only the sphericity-assumed tests:


We could report these effects as follows: A three-way ANOVA with current relationship status as the between-subjects factor and men's recall of sentence type (targets vs. distracters) and distracter type (neutral, emotional infidelity and sexual infidelity) as the within-subjects factors yielded a significant main effect of sentence type, F(1, 37) = 53.97, p < .001, and a significant interaction between current relationship status and distracter content, F(2, 74) = 3.92, p = .024. More important, the three-way interaction was also significant, F(2, 74) = 3.79, p = .027. The remaining main effects and interactions were not significant, Fs < 2, ps > .17.

To pick apart the three-way interaction we can look at the table of contrasts:

This table tells us that the effect of whether or not you are in a relationship and whether you were remembering a distracter or target was similar in trials in which an emotional-infidelity distracter was used compared to when a neutral distracter was used, F(1, 37) < 1, p = .95 (level 2 vs. level 1 in the table). However, as predicted, there is a difference in trials in which a sexual-infidelity distracter was used compared to those in which a neutral distracter was used, F(1, 37) = 5.39, p < .05 (level 3 vs. level 1).


To see what these contrasts tell us, look at the graphs (I've edited these a bit so that they are clearer). First off, those without partners remember many more targets than they do distracters, and this is true for all types of trials. In other words, it doesn't matter whether the distracter is neutral, emotional or sexual; these people remember more targets than distracters. The same pattern is seen in those with partners except for distracters that indicate sexual infidelity (the red line). For these, the number of targets remembered is reduced. Put another way, the slope of the green and blue lines is more or less the same for those in and out of relationships (compare graphs) and also to each other (compare green with blue). The only difference is for the red line, which is comparable to the green and blue lines for those not in relationships, but is much shallower for those in relationships. They remember fewer targets that were preceded by a sexual-infidelity distracter. This supports the predictions of the author: men in relationships have an attentional bias such that their attention is consumed by cues indicative of sexual infidelity. Let's now look at the women's output. Sphericity tests are fine (all non-significant), so I've simplified the main ANOVA table to show only the sphericity-assumed tests:


We could report these effects as follows: A three-way ANOVA with current relationship status as the between-subjects factor and women's recall of sentence type (targets vs. distracters) and distracter type (neutral, emotional infidelity and sexual infidelity) as the within-subjects factors yielded a significant main effect of sentence type, F(1, 39) = 39.68, p < .001, and of distracter type, F(2, 78) = 4.24, p = .018. Additionally, significant interactions were found between sentence type and distracter type, F(2, 78) = 4.63, p = .013, and, most important, sentence type × distracter type × relationship, F(2, 78) = 5.33, p = .007. The remaining main effect and interactions were not significant, Fs < 1.2, ps > .29. To pick apart the three-way interaction we can look at the table of contrasts:


This table tells us that the effect of whether or not you are in a relationship and whether you were remembering a distracter or target was significantly different in trials in which an emotional-infidelity distracter was used compared to when a neutral distracter was used, F(1, 39) = 7.56, p = .009 (level 2 vs. level 1 in the table). However, there was not a significant difference in trials in which a sexual-infidelity distracter was used compared to those in which a neutral distracter was used, F(1, 39) = 0.31, p = .58 (level 3 vs. level 1).

To see what these contrasts tell us, look at the graphs (I've edited these a bit so that they are clearer). As for the men, women without partners remember many more targets than they do distracters, and this is true for all types of trials (although it's less true for the sexual-infidelity


trials because this line has a shallower slope). The same pattern is seen in those with partners except for distracters that indicate emotional infidelity (the green line). For these, the number of targets remembered is reduced. Put another way, the slope of the red and blue lines is more or less the same for those in and out of relationships (compare graphs). The only difference is for the green line, which is much shallower for those in relationships. They remember fewer targets that were preceded by an emotional-infidelity distracter. This supports the predictions of the author: women in relationships have an attentional bias such that their attention is consumed by cues indicative of emotional infidelity.

Chapter 15

Self-Test Answers

Carry out some analyses to test for normality and homogeneity of variance in these data. To get the outputs in the book use the following dialog boxes:


Split the file by Drug.

To split the file by drug you need to select

and complete the dialog box as follows:

See whether you can enter the data in Table 15.3 into SPSS (you don't need to enter the ranks). Then conduct some exploratory analyses on the data. To get the outputs in the book use the following dialog boxes:


Use the Chart Builder to draw a boxplot of these data

The completed Chart Builder window should look like this:


Carry out the three MannWhitney Tests suggested above.

The simplest way to run these tests is to use the Mann-Whitney dialog box, but each time select a different comparison in the Define Groups dialog box (the three tests we want to do compare each group against the control, so they all include group 1 as the first group; all that changes is the value in Group 2, which reflects which group is being compared to the controls).



Using what you know about inputting data, try to enter these data into SPSS and run some exploratory analyses. To get the outputs in the book use the following dialog boxes:

Carry out the three Wilcoxon tests suggested above.

You can do the Wilcoxon tests by selecting the pairs of variables for each comparison in turn and transferring them across to the box labelled Test Pair(s) List. To run the analysis, select the Wilcoxon test dialog box. Once the dialog box is activated, select the first two variables from the list (click on Start with the mouse and then, while holding down the Ctrl key, click on Month1) and transfer this pair to the box labelled Test Pairs. Then select Start and Month2 and transfer them. Finally, select Month1 and Month2 and transfer them. To run the analysis, return to the main dialog box and click on OK.

Additional Material Oliver Twisted: Please Sir, can I have some more Jonck?

'I want to know how the Jonckheere-Terpstra test actually works,' complains Oliver. Of course you do, Oliver; sleep is hard to come by these days. I am only too happy to oblige my little syphilitic friend. The additional material for this chapter on the companion website has a complete explanation of the test and how it works. I bet you're glad you asked.


Jonckheere's test is based on the simple, but elegant, idea of taking a score in a particular condition and counting how many scores in subsequent conditions are bigger than that score. So, the first step is to order your groups in the way that you expect your medians to change. If we take the soya example from Chapter 13, then we expect sperm counts to be highest in the no soya meals condition, and then decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week. So, we start with the no meals per week condition, take the first score and ask: how many scores in the next condition are bigger than this score? You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 1 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order and the groups have been ordered in the way that we expect our medians to decrease (see the footnote below). So, starting with the first score in the no soya meals group (this score is 0.35), we look at the scores in the next condition (1 soya meal) and count how many are greater than 0.35. It turns out that 19 of the 20 scores are greater than this value, so we place the value of 19 in the appropriate column and move on to the next score (0.58) and do the same. When we've done this for all of the scores in the no meals group, we go back to the first score (0.35) again, but this time count how many scores are bigger in the next but one condition (the 4 soya meals condition). It turns out that 18 scores are bigger, so we register this in the appropriate column and move on to the next score (0.58) and do the same until we've done all of the scores in the no meals group. We basically repeat this process until we've compared the first group to all subsequent groups. At this stage we move on to the next group (the 1 soya meal).
Again, we start with the first score (0.33) and count how many scores are bigger than this value in the subsequent group (the 4 meals

Footnote: In fact, we can order the groups the opposite way around if we want to, so we can start with the group we predict to have the lowest median, and then order them in the order we expect the medians to increase. All this will do is reverse the sign of the resulting z-score, and if you're keen to know why, there's a section at the end of this document that shows what happens when we reverse the order of groups!


group). In this case all 20 scores are bigger than 0.33, so we register this in the table and move on to the next score (0.36). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.33) and repeat the process until this category has been compared to all subsequent categories. When all categories have been compared with all subsequent categories in this way, we simply add up the counts as I have done in Table 1. These sums of counts are denoted by Uij.
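The counting-and-summing procedure just described can be sketched in a few lines of Python (the function name is mine, and the three tiny groups are invented so the count is easy to verify by hand; tied scores conventionally add 0.5):

```python
def jonckheere_J(groups):
    """Sum, over every pair of groups i < j, of the number of times a
    score in the later group j exceeds a score in the earlier group i
    (ties add 0.5). Groups must be listed in the predicted order."""
    J = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for x in groups[i]:
                for y in groups[j]:
                    if y > x:
                        J += 1
                    elif y == x:
                        J += 0.5
    return J

# toy data with medians 2 < 5 < 8: every later score beats every
# earlier one, so J = 3 * 3 + 3 * 3 + 3 * 3
print(jonckheere_J([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

Reversing the group order on this toy data would give J = 0, which is the mirror-image logic the footnote above alludes to.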


Table 1: Data to show Jonckheere's test for the soya example

No Soya Meals group. For each sperm score, how many scores are bigger in the 1 meal, 4 meals and 7 meals groups:

Sperm | 1 Meal | 4 Meals | 7 Meals
0.35  |   19   |   20    |   18
0.58  |   18   |   19    |   16
0.88  |   18   |   18    |   13
0.92  |   15   |   18    |   13
1.22  |   15   |   15    |   12
1.51  |   15   |   14    |    7
1.52  |   15   |   14    |    7
1.57  |   14   |   14    |    7
2.43  |   10   |   11    |    6
2.79  |    9   |   11    |    4
3.40  |    9   |    6    |    1
4.52  |    8   |    5    |    0
4.72  |    8   |    5    |    0
6.90  |    6   |    3    |    0
7.58  |    4   |    3    |    0
7.78  |    4   |    3    |    0
9.62  |    2   |    3    |    0
10.05 |    2   |    3    |    0
10.32 |    2   |    2    |    0
21.08 |    0   |    0    |    0
Total (Uij) | 193 | 187 | 104

1 Soya Meal group. How many scores are bigger in the 4 meals and 7 meals groups:

Sperm | 4 Meals | 7 Meals
0.33  |   20    |   18
0.36  |   20    |   18
0.63  |   18    |   16
0.64  |   18    |   16
0.77  |   18    |   15
1.53  |   14    |    7
1.62  |   14    |    7
1.71  |   13    |    7
1.94  |   12    |    7
2.48  |   11    |    6
2.71  |   11    |    5
4.12  |    6    |    0
5.65  |    4    |    0
6.76  |    3    |    0
7.08  |    3    |    0
7.26  |    3    |    0
7.92  |    3    |    0
8.04  |    3    |    0
12.10 |    1    |    0
18.47 |    0    |    0
Total (Uij) | 195 | 122

4 Soya Meals group. How many scores are bigger in the 7 meals group:

Sperm | 7 Meals
0.40  |   18
0.60  |   16
0.96  |   13
1.20  |   12
1.31  |   11
1.35  |    9
1.68  |    7
1.83  |    7
2.10  |    6
2.93  |    3
2.96  |    3
3.00  |    3
3.09  |    2
3.36  |    1
4.34  |    0
5.81  |    0
5.94  |    0
10.16 |    0
10.98 |    0
18.21 |    0
Total (Uij) | 111

7 Soya Meals group (the last group, so there are no subsequent groups to compare): 0.31, 0.32, 0.56, 0.57, 0.71, 0.81, 0.87, 1.18, 1.25, 1.33, 1.34, 1.49, 1.50, 2.09, 2.70, 2.75, 2.83, 3.07, 3.28, 4.11.

The test statistic, J, is simply the sum of these counts:

$$J = \sum_{i<j} U_{ij}$$

which for these data is simply:

$$J = \sum_{i<j} U_{ij} = 193 + 187 + 104 + 195 + 122 + 111 = 912$$

For small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:
$$\bar{J} = \frac{N^2 - \sum n_k^2}{4}$$

$$\sigma_J = \sqrt{\frac{1}{72}\left[N^2(2N + 3) - \sum n_k^2(2n_k + 3)\right]}$$

in which N is simply the total sample size (in this case 80) and nk is simply the sample size of group k (in each case in this example this will be 20 because we have equal sample sizes). So, we just square each groups sample size and add them up, then subtract this value from the total sample size squared. We then divide the result by 4. Therefore, we can calculate the mean for these data as:
$$\bar{J} = \frac{80^2 - (20^2 + 20^2 + 20^2 + 20^2)}{4} = \frac{4800}{4} = 1200$$

The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size:


$$\sigma_J = \sqrt{\frac{1}{72}\left\{6400(163) - \left[400(43) + 400(43) + 400(43) + 400(43)\right]\right\}} = \sqrt{\frac{1}{72}(1043200 - 68800)} = \sqrt{13533.33} = 116.33$$

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formula:

$$z = \frac{X - \bar{X}}{s} = \frac{J - \bar{J}}{\sigma_J} = \frac{912 - 1200}{116.33} = -2.476$$
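The normal approximation is easy to script; here is a minimal Python sketch using the formulas just given (the function name is my own):

```python
import math

def jonckheere_z(J, group_sizes):
    """Normal approximation to Jonckheere's test: mean and variance of J
    under the null hypothesis, then z = (J - mean) / sd."""
    N = sum(group_sizes)
    mean_J = (N**2 - sum(n**2 for n in group_sizes)) / 4
    var_J = (N**2 * (2 * N + 3)
             - sum(n**2 * (2 * n + 3) for n in group_sizes)) / 72
    return (J - mean_J) / math.sqrt(var_J)

# the soya example: J = 912, four groups of 20
print(round(jonckheere_z(912, [20, 20, 20, 20]), 3))  # -2.476
```

Feeding in the J obtained with the groups in the opposite order gives the same value with the sign flipped, which is the point made in the section below on group ordering.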

This z can then be evaluated using the critical values in the Appendix of the book. This test is always one-tailed because we have predicted a trend in order to use the test, so we're looking for z to be above 1.65 (when ignoring the sign) to be significant. In fact, the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z), as they do in this example!

Does it Matter how I order My Groups?

I have just showed how to use the test when the groups are ordered by descending medians (i.e. we expect sperm counts to be highest in the no soya meals condition, and then decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week; so we ordered the groups: no soya, 1 meal, 4 meals and 7 meals). Certain books will tell you to order the groups in ascending order (i.e. start with the group that you expect to have the lowest median). For the soya data this would mean arranging the groups in the opposite order to how I did in the Appendix; that is, 7 meals, 4 meals, 1 meal and no meals. The purpose of this section is to show you what happens if we order the groups the opposite way around!


The process is similar to that used in the Appendix, only now we start with the 7 meals per week condition, take the first score and ask: how many scores in the next condition are bigger than this score? You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 2 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order and the groups have been ordered in the way that we expect our medians to increase. So, starting with the first score in the 7 soya meals group (this score is 0.31), we look at the scores in the next condition (4 soya meals) and count how many are greater than 0.31. It turns out that all 20 scores are greater than this value, so we place the value of 20 in the appropriate column and move on to the next score (0.32) and do the same. When we've done this for all of the scores in the 7 meals group, we go back to the first score (0.31) again, but this time count how many scores are bigger in the next but one condition (the 1 soya meal condition). It turns out that all 20 scores are bigger, so we register this in the appropriate column and move on to the next score (0.32) and do the same until we've done all of the scores in the 7 meals group. We basically repeat this process until we've compared the first group to all subsequent groups.
Table 2: Data to show Jonckheere's test for the soya example in Chapter 13

7 Soya Meals group. For each sperm score, how many scores are bigger in the 4 meals, 1 meal and no meals groups:

Sperm | 4 Meals | 1 Meal | No Meals
0.31  |   20    |   20   |   20
0.32  |   20    |   20   |   20
0.56  |   19    |   18   |   19
0.57  |   19    |   18   |   19
0.71  |   18    |   16   |   18
0.81  |   18    |   15   |   18
0.87  |   18    |   15   |   18
1.18  |   17    |   15   |   16
1.25  |   16    |   15   |   15
1.33  |   15    |   15   |   15
1.34  |   15    |   15   |   15
1.49  |   14    |   15   |   15
1.50  |   14    |   15   |   15
2.09  |   12    |   11   |   12
2.70  |   11    |   10   |   11
2.75  |   11    |    9   |   11
2.83  |   11    |    9   |   10
3.07  |    8    |    9   |   10
3.28  |    7    |    9   |   10
4.11  |    6    |    9   |    9
Total (Uij) | 289 | 278 | 296

4 Soya Meals group. How many scores are bigger in the 1 meal and no meals groups:

Sperm | 1 Meal | No Meals
0.40  |   18   |   19
0.60  |   18   |   18
0.96  |   15   |   16
1.20  |   15   |   16
1.31  |   15   |   15
1.35  |   15   |   15
1.68  |   13   |   12
1.83  |   12   |   12
2.10  |   11   |   12
2.93  |    9   |   10
2.96  |    9   |   10
3.00  |    9   |   10
3.09  |    9   |   10
3.36  |    9   |   10
4.34  |    8   |    9
5.81  |    7   |    7
5.94  |    7   |    7
10.16 |    2   |    2
10.98 |    2   |    1
18.21 |    1   |    1
Total (Uij) | 204 | 212

1 Soya Meal group. How many scores are bigger in the no meals group:

Sperm | No Meals
0.33  |   20
0.36  |   19
0.63  |   18
0.64  |   18
0.77  |   18
1.53  |   13
1.62  |   12
1.71  |   12
1.94  |   12
2.48  |   11
2.71  |   11
4.12  |    9
5.65  |    7
6.76  |    7
7.08  |    6
7.26  |    6
7.92  |    4
8.04  |    4
12.10 |    1
18.47 |    1
Total (Uij) | 209

No Soya Meals group (now the last group, so there are no subsequent groups to compare): 0.35, 0.58, 0.88, 0.92, 1.22, 1.51, 1.52, 1.57, 2.43, 2.79, 3.40, 4.52, 4.72, 6.90, 7.58, 7.78, 9.62, 10.05, 10.32, 21.08.

At this stage we move on to the next group (the 4 soya meals). Again, we start with the first score (0.40) and count how many scores are bigger than this value in the subsequent group (the 1 meal group). In this case there are 18 scores bigger than 0.40, so we register this in the table and move on to the next score (0.60). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.40) and repeat the process until this category has been compared to all subsequent categories. When all categories have been compared to all subsequent categories in this way, we simply add up the counts as I have done in the table. These sums of counts are denoted by Uij. As before, the test statistic J is simply the sum of these counts:

$$J = \sum_{i<j} U_{ij}$$

which for these data is simply:

$$J = \sum_{i<j} U_{ij} = 289 + 278 + 296 + 204 + 212 + 209 = 1488$$

As I said earlier, for small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:


$$\bar{J} = \frac{N^2 - \sum n_k^2}{4}$$

$$\sigma_J = \sqrt{\frac{1}{72}\left[N^2(2N + 3) - \sum n_k^2(2n_k + 3)\right]}$$

in which N is simply the total sample size (in this case 80) and nk is simply the sample size of group k (in each case in this example this will be 20 because we have equal sample sizes). So, we just square each groups sample size and add them up, then subtract this value from the total sample size squared. We then divide the result by 4. Therefore, we can calculate the mean for these data as we did earlier:
$$\bar{J} = \frac{80^2 - (20^2 + 20^2 + 20^2 + 20^2)}{4} = \frac{4800}{4} = 1200$$

The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size (again this is the same as earlier):
$$\sigma_J = \sqrt{\frac{1}{72}\left\{6400(163) - \left[400(43) + 400(43) + 400(43) + 400(43)\right]\right\}} = \sqrt{\frac{1}{72}(1043200 - 68800)} = \sqrt{13533.33} = 116.33$$

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formula. The mean and standard deviation are the same as above, but we now have a different test statistic (it is 1488 rather than 912). So, let's see what happens when we plug this new test statistic into the equation:


$$z = \frac{X - \bar{X}}{s} = \frac{J - \bar{J}}{\sigma_J} = \frac{1488 - 1200}{116.33} = +2.476$$

Note that the z-score is the same value as when we ordered the groups in descending order, except that it now has a positive value rather than a negative one! This goes to prove what I wrote earlier: the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z)! Earlier we ordered the groups in descending order and so got a negative z; now we ordered them in ascending order and so got a positive z.

Labcoat Leni's Real Research: Having a Quail of a Time?

Matthews, N. et al. (2007). Psychological Science, 18(9), 758-762.

We encountered some research in Chapter 2 in which we discovered that you can influence aspects of male quails' sperm production through conditioning. The basic idea is that the male is granted access to a female for copulation in a certain chamber (e.g. one that is coloured green) but gains no access to a female in a different context (e.g. a chamber with a tilted floor). The male, therefore, learns that when he is in the green chamber his luck is in, but if the floor is tilted then frustration awaits. For other males the chambers will be reversed (i.e. they get sex only when in the chamber with the tilted floor). The human equivalent (well, sort of) would be if you always managed to pull in the Zap Club but never in the Honey Club. During the test phase, males get to mate in both chambers. The question is: after the males have learnt that they will get a mating opportunity in a certain context, do they produce more sperm, or better-quality sperm, when mating in that context compared to the control context? (Are you more of a stud in the Zap Club? OK, I'm going to


stop this analogy now.) Mike Domjan and his colleagues predicted that if conditioning evolved because it increases reproductive fitness, then males that mated in the context that had previously signalled a mating opportunity would fertilize a significantly greater number of eggs than quails that mated in their control context (Matthews, Domjan, Ramsey, & Crews, 2007). They put this hypothesis to the test in an experiment that is utter genius. After training, they allowed 14 females to copulate with two males (counterbalanced): one male copulated with the female in the chamber that had previously signalled a reproductive opportunity (Signalled), whereas the second male copulated with the same female but in the chamber that had not previously signalled a mating opportunity (Control). Eggs were collected from the females for 10 days after the mating and a genetic analysis was used to determine the father of any fertilized eggs. The data from this study are in the file Matthews et al. (2007).sav. Labcoat Leni wants you to carry out a Wilcoxon signed-rank test to see whether more eggs were fertilized by males mating in their signalled context compared to males in their control context.

To run the analysis, select the Wilcoxon test dialog box. Once the dialog box is activated, select two variables from the list (click on the first variable with the mouse and then, while holding down the Ctrl key, the second). You can also select the variables one at a time and transfer them: for example, you could select Signalled and drag it to the column labelled Variable 1 in the box labelled Test Pairs, and then select Control and drag it to the column labelled Variable 2. Each pair appears as a new row in the box labelled Test Pairs. To run the analysis, return to the main dialog box and click on OK.


The first table provides information about the ranked scores. It tells us the number of negative ranks (these are females that produced more eggs fertilized by the male in his signalled chamber than by the male in his control chamber) and the number of positive ranks (these are females that produced fewer eggs fertilized by the male in his signalled chamber than by the male in his control chamber). The table shows that for 10 of the 14 quails, the number of eggs fertilized by the male in his signalled chamber was greater than for the male in his control chamber, indicating an adaptive benefit to


learning that a chamber signalled reproductive opportunity. There was one tied rank (i.e. one female that produced an equal number of fertilized eggs for both males). The table also shows the average number of negative and positive ranks and the sum of positive and negative ranks. Below the table are footnotes, which tell us to what the positive and negative ranks relate (so provide the same kind of explanation as I've just made; see, I'm not clever, I just read the footnotes!). The test statistic, T, is the lowest value of the two types of ranks, so our test value here is the sum of positive ranks (i.e. 13.50). This value can be converted to a z-score and this is what SPSS does. The second table tells us that the test statistic is based on the positive ranks, that the z-score is −2.30 and that this value is significant at p = .022. Therefore, we should conclude that there were a greater number of fertilized eggs from males mating in their signalled context, z = −2.30, p < .05. In other words, conditioning (as a learning mechanism) provides some adaptive benefit in that it makes it more likely that you will pass on your genes. The authors concluded as follows: 'Of the 78 eggs laid by the test females, 39 eggs were fertilized. Genetic analysis indicated that 28 of these (72%) were fertilized by the signalled males, and 11 were fertilized by the control males. Ten of the 14 females in the experiment produced more eggs fertilized by the signalled male than by the control male (see Fig. 1; Wilcoxon signed-ranks test, T = 13.5, p < .05). These effects were independent of the order in which the 2 males copulated with the female. Of the 39 fertilized eggs, 20 were sired by the 1st male and 19 were sired by the 2nd male. The present findings show that when 2 males copulated with the same female in succession, the male that received a Pavlovian CS signalling copulatory opportunity fertilized more of the female's eggs. Thus, Pavlovian conditioning increased reproductive fitness in the context of sperm competition.' (p. 760)
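If you want to reproduce this kind of test outside SPSS, scipy's `wilcoxon` function runs the same signed-rank test. Note that the paired egg counts below are made-up illustrative values, not the data in Mathews et al. (2007).sav:

```python
from scipy.stats import wilcoxon

# Hypothetical counts of eggs fertilized by each female's two mates
# (one row per female; the real data live in Mathews et al. (2007).sav)
signalled = [4, 3, 5, 2, 6, 3, 4, 5, 2, 3, 1, 0, 1, 2]
control   = [1, 2, 2, 2, 1, 0, 3, 1, 3, 1, 2, 1, 0, 2]

# Two-sided Wilcoxon signed-rank test on the paired differences;
# zero differences (ties between the two males) are dropped by default
stat, p = wilcoxon(signalled, control)
print(stat, p)
```

The statistic returned is the smaller of the two rank sums, matching the T described above.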


Labcoat Lenis Real Research: Eggs-traordinary!

Çetinkaya, H., & Domjan, M. (2006). Journal of Comparative Psychology, 120(4), 427–432.

There seems to be a lot of sperm in this book (not literally I hope); it's possible that I have a mild obsession. We saw that male quail fertilized more eggs if they had been able to predict when a mating opportunity would arise. However, some quail develop fetishes. Really. In the previous example the type of compartment acted as a predictor of an opportunity to mate, but in studies where a terrycloth object acts as a sign that a mate will shortly become available, some quail start to direct their sexual behaviour towards the terrycloth object. (I may regret this analogy, but in human terms imagine that every time you were going to have sex with your boyfriend you gave him a green towel a few moments before seducing him; after enough seductions he would start rubbing his crotch against any green towel he saw. If you've ever wondered why your boyfriend does this, then hopefully this is an enlightening explanation.) In evolutionary terms, this fetishistic behaviour seems counterproductive because sexual behaviour becomes directed towards something that cannot provide reproductive success. However, perhaps this behaviour serves to prepare the organism for the 'real' mating behaviour. Hakan Çetinkaya and Mike Domjan conducted a brilliant study in which they sexually conditioned male quail (Çetinkaya & Domjan, 2006). All quail experienced the terrycloth stimulus and an opportunity to mate, but for some the terrycloth stimulus immediately preceded the mating opportunity (paired group) whereas others experienced it 2 hours after the mating opportunity (this was the control group because the terrycloth stimulus did not predict a mating


opportunity). In the paired group, quail were classified as fetishistic or not depending on whether they engaged in sexual behaviour with the terrycloth object. During a test trial the quail mated with a female and the researchers measured the percentage of eggs fertilized, the time spent near the terrycloth object, the latency to initiate copulation, and copulatory efficiency. If this fetishistic behaviour provides an evolutionary advantage then we would expect the fetishistic quail to fertilize more eggs, initiate copulation faster, and be more efficient in their copulations. The data from this study are in the file Çetinkaya & Domjan (2006).sav. Labcoat Leni wants you to carry out a Kruskal–Wallis test to see whether fetishistic quail produced a higher percentage of fertilized eggs and initiated sex more quickly.

Let's begin by using the Chart Builder ( ) to do some boxplots:


First, access the main dialog box by selecting . Once the dialog box is activated, select the two dependent variables from the list (click on Egg_Percent and, while holding down the Ctrl key, Latency) and drag them to the box labelled Test Variable List (or click on ). Next, select the independent variable (the grouping variable), in this case Group, and drag it to the box labelled Grouping Variable. When the grouping variable has been selected the  button becomes active and you should click on it to activate the define range dialog box. SPSS needs to know the range of numeric codes you assigned to your groups, and there is a space for you to type the minimum and maximum codes. The minimum code we used was 1, and the maximum was 3, so type these numbers into the appropriate spaces. When you have defined the groups, click on  to return to the main dialog box. To run the analyses, return to the main dialog box and click on .


The output should look like this:

For both variables there is a significant effect. So there are differences between the groups, but we don't know where these differences lie. To find out we can conduct several Mann–Whitney tests. To access these select .


The output you should get is:

Fetishistic vs. Nonfetishistic:

Fetishistic vs. Control:

Nonfetishistic vs. Control:


The authors reported as follows: 'Kruskal–Wallis analysis of variance (ANOVA) confirmed that female quail partnered with the different types of male quail produced different percentages of fertilized eggs, χ²(2, N = 59) = 11.95, p < .05, η² = 0.20. Subsequent pairwise comparisons with the Mann–Whitney U test (with the Bonferroni correction) indicated that fetishistic male quail yielded higher rates of fertilization than both the nonfetishistic male quail (U = 56.00, N1 = 17, N2 = 15, effect size = 8.98, p < .05) and the control male quail (U = 100.00, N1 = 17, N2 = 27, effect size = 12.42, p < .05). However, the nonfetishistic group was not significantly different from the control group (U = 176.50, N1 = 15, N2 = 27, effect size = 2.69, p > .05).' (page 429)

For the latency data they reported as follows: 'A Kruskal–Wallis analysis indicated significant group differences, χ²(2, N = 59) = 32.24, p < .05, η² = 0.56. Pairwise comparisons with the Mann–Whitney U test (with the Bonferroni correction) showed that the nonfetishistic males had significantly shorter copulatory latencies than both the fetishistic male quail (U = 0.00, N1 = 17, N2 = 15, effect size = 16.00, p < .05) and the control male quail (U = 12.00, N1 = 15, N2 = 27, effect size = 19.76, p < .05). However, the fetishistic group was not significantly different from the control group (U = 161.00, N1 = 17, N2 = 27, effect size = 6.57, p > .05).' (page 430)


These results support the authors' theory that fetishistic behaviour may have evolved because it offers some adaptive function (such as preparing for the real thing).
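The same omnibus-test-plus-follow-ups routine can be sketched with scipy. The fertilization percentages below are invented for illustration, not the values in Çetinkaya & Domjan (2006).sav:

```python
from scipy.stats import kruskal, mannwhitneyu

# Invented percentage-of-eggs-fertilized scores for the three groups
fetishistic    = [62, 75, 80, 55, 90, 70, 85]
nonfetishistic = [40, 35, 52, 45, 30, 48]
control        = [38, 42, 33, 50, 29, 44, 36]

# Omnibus Kruskal-Wallis test across all three groups
h, p = kruskal(fetishistic, nonfetishistic, control)

# Pairwise Mann-Whitney follow-ups; the Bonferroni correction multiplies
# each p-value by the number of comparisons (3), capping at 1
pairs = {
    "fetishistic vs nonfetishistic": (fetishistic, nonfetishistic),
    "fetishistic vs control":        (fetishistic, control),
    "nonfetishistic vs control":     (nonfetishistic, control),
}
corrected = {name: min(mannwhitneyu(a, b, alternative="two-sided")[1] * 3, 1.0)
             for name, (a, b) in pairs.items()}
```

A significant omnibus test followed by corrected pairwise comparisons mirrors the logic of the SPSS analysis described above.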
Chapter 16

Additional Material Oliver Twisted: Please Sir, can I have some more Maths?

'You are a bit stupid. I think it would be fun to check your maths so that we can see exactly how much of a village idiot you are,' mocks Oliver. Luckily you can. Never one to shy away from public humiliation on a mass scale, I have provided the matrix calculations for this example on the companion website. Find a mistake, go on, you know that you can…

Calculation of E⁻¹


$$E = \begin{pmatrix} 51 & 13 \\ 13 & 122 \end{pmatrix}$$

determinant of $E$: $|E| = (51 \times 122) - (13 \times 13) = 6053$

$$\text{matrix of minors for } E = \begin{pmatrix} 122 & 13 \\ 13 & 51 \end{pmatrix}, \qquad \text{pattern of signs for a } 2 \times 2 \text{ matrix} = \begin{pmatrix} + & - \\ - & + \end{pmatrix}$$

$$\text{matrix of cofactors} = \begin{pmatrix} 122 & -13 \\ -13 & 51 \end{pmatrix}$$

The inverse of a matrix is obtained by dividing the matrix of cofactors for E by |E|, the determinant of E:

$$E^{-1} = \begin{pmatrix} \frac{122}{6053} & \frac{-13}{6053} \\ \frac{-13}{6053} & \frac{51}{6053} \end{pmatrix} = \begin{pmatrix} 0.0202 & -0.0021 \\ -0.0021 & 0.0084 \end{pmatrix}$$

Calculation of HE⁻¹

$$HE^{-1} = \begin{pmatrix} 10.47 & -7.53 \\ -7.53 & 19.47 \end{pmatrix} \begin{pmatrix} 0.0202 & -0.0021 \\ -0.0021 & 0.0084 \end{pmatrix}$$

$$= \begin{pmatrix} (10.47 \times 0.0202) + (-7.53 \times -0.0021) & (10.47 \times -0.0021) + (-7.53 \times 0.0084) \\ (-7.53 \times 0.0202) + (19.47 \times -0.0021) & (-7.53 \times -0.0021) + (19.47 \times 0.0084) \end{pmatrix} = \begin{pmatrix} 0.2273 & -0.0852 \\ -0.1930 & 0.1794 \end{pmatrix}$$

Calculation of Eigenvalues

The eigenvalues or roots of any square matrix are the solutions to the determinantal equation $|A - \lambda I| = 0$, in which A is the square matrix in question and I is an identity matrix of the same size as A. The number of eigenvalues will equal the number of rows (or columns) of the square matrix. In this case the square matrix of interest is $HE^{-1}$.


$$HE^{-1} - \lambda I = \begin{pmatrix} 0.2273 - \lambda & -0.0852 \\ -0.1930 & 0.1794 - \lambda \end{pmatrix}$$

$$\begin{aligned} \left| HE^{-1} - \lambda I \right| &= (0.2273 - \lambda)(0.1794 - \lambda) - (-0.1930 \times -0.0852) \\ &= \lambda^2 - 0.2273\lambda - 0.1794\lambda + 0.0407 - 0.0164 \\ &= \lambda^2 - 0.4067\lambda + 0.0243 \end{aligned}$$

Therefore the equation $|HE^{-1} - \lambda I| = 0$ can be expressed as:

$$\lambda^2 - 0.4067\lambda + 0.0243 = 0$$

To solve the roots of any quadratic equation of the general form $a\lambda^2 + b\lambda + c = 0$ we can apply the following formula:

$$\lambda_i = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

For the quadratic equation obtained, a = 1, b = −0.4067, c = 0.0243. If we replace these values into the formula for discovering roots, we get:

$$\lambda_i = \frac{0.4067 \pm \sqrt{(-0.4067)^2 - 0.0972}}{2} = \frac{0.4067 \pm 0.2612}{2} = \frac{0.6679}{2} \text{ or } \frac{0.1455}{2} = 0.334 \text{ or } 0.073$$

Hence, the eigenvalues are 0.334 and 0.073.
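The same eigenvalues can be checked numerically; this sketch feeds the rounded HE⁻¹ values from above into numpy, which solves the same determinantal equation:

```python
import numpy as np

# The (rounded) HE^-1 matrix derived above
HE_inv = np.array([[ 0.2273, -0.0852],
                   [-0.1930,  0.1794]])

# numpy solves |HE^-1 - lambda*I| = 0 for us; sort largest first
eigenvalues = np.sort(np.linalg.eigvals(HE_inv).real)[::-1]
print(eigenvalues)  # approximately [0.334, 0.073]
```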

Labcoat Lenis Real Research: A Lot of Hot Air!


Marzillier, S. L., & Davey, G. C. L. (2005). Cognition and Emotion, 19, 729–750.

Have you ever wondered what researchers do in their spare time? Well, some of them spend it tracking down the sounds of people burping and farting! It has long been established that anxiety and disgust are linked. Anxious people are, typically, easily disgusted. Throughout this book I have talked about how you cannot infer causality from relationships between variables. This has been a bit of a conundrum for anxiety researchers: does anxiety cause feelings of disgust or does a low threshold for being disgusted cause anxiety? Two colleagues of mine at Sussex addressed this in an unusual study in which they induced feelings of anxiety, feelings of disgust, or a neutral mood, and they looked at the effect that these induced moods had on feelings of anxiety, sadness, happiness, anger, disgust and contempt. To induce these moods, they used four different types of manipulation: vignettes (e.g. 'you're swimming in a dark lake and something brushes your leg' for anxiety, and 'you go into a public toilet and find it has not been flushed. The bowl of the toilet is full of diarrhoea' for disgust), music (e.g. some scary music for anxiety, and a tape of burps, farts and vomiting for disgust), videos (e.g. a clip from Silence of the Lambs for anxiety and a scene from Pink Flamingos in which Divine eats dog faeces for disgust), and memory (remembering events from the past that had made the person anxious, disgusted or neutral). Different people underwent anxious, disgust and neutral mood inductions. Within these groups, the induction was done using vignettes and music, videos, or memory recall and music for different people. The outcome variables were the change (from before to after the induction) in six moods: anxiety, sadness, happiness, anger, disgust and contempt.


The data are in the file Marzillier and Davey (2005).sav. Draw an error bar graph of the changes in moods in the different conditions, then conduct a 3 (Mood: anxiety, disgust, neutral) × 3 (Induction: vignettes + music, videos, memory recall + music) MANOVA on these data. Whatever you do, don't imagine what their fart tape sounded like while you do the analysis! Answers are in the additional material on the companion website (or look at page 738 of the original article).

To do the graph we have to access the Chart Builder and select a clustered bar chart. First, let's set Mood induction as the x-axis by selecting it and dragging it to the drop zone:


Next, select all of the DVs (click on Change in Anxiety, then hold Shift down and click on Change in Contempt and all six should become highlighted). Then drag these into the y-axis drop zone. This will have the effect that different moods will be displayed by different-coloured bars.

We have another variable, the type of induction, and we can display this too. First, click on the Groups/Point ID tab and then select Row Panel. When this is selected a new drop zone appears (called panel), and you can drag the Type of Induction into that zone. Remember to select and the finished dialog box will look as follows. Click on to produce the graph.


The completed graph will look like that below. This shows that the neutral mood induction (regardless of the way in which it was induced) didn't really affect mood too much (the changes are all quite small). For the disgust mood induction, disgust always increased quite a lot (the yellow bars) regardless of how disgust was induced. Similarly, the anxiety induction raised anxiety (predominantly). Happiness decreased for both anxiety and disgust mood inductions.


To run the MANOVA, the main dialog box should look like this:


You can set whatever options you like based on the chapter. The main multivariate statistics are shown below. A main effect of mood was found, F(12, 334) = 21.91, p < .001, showing that the changes for some mood inductions were bigger than for others overall (looking at the graph, this finding probably reflects that the disgust mood induction had the greatest effect overall, mainly because it produced such huge changes in disgust). There was no significant main effect of the type of mood induction, F(24, 334) = 1.12, p > .05, showing that whether videos, memory, tapes, etc., were used did not affect the changes in mood. Also, the mood × type of induction interaction, F(24, 676) = 1.22, p > .05, showed that the type of induction did not influence the main effect of mood. In other words, the fact that the disgust induction seemed to have the biggest effect on mood (overall) was not influenced by how disgust was induced.

The univariate effects for type of mood (which was the only significant multivariate effect) show that the effect of the type of mood induction was significant for all six moods (in other words, for all six moods there were significant differences across the anxiety, disgust and neutral conditions). Below is a graph that collapses across the way that mood was induced (video, music, etc.) because


this effect was not significant (you can create this by going back to the Chart Builder and deselecting Rows Panel). We should do more tests, but just looking at the graph shows that changes in anxiety (blue bars) vary over the three mood conditions (they go up after the anxiety induction, stay the same for the disgust induction, and go down for the neutral induction). Similarly, for disgust, the change is biggest after the disgust induction, it increases a little after the anxiety induction and doesn't really change after the neutral one (yellow bars). Finally, happiness goes down after both anxiety and disgust inductions, but doesn't change for neutral.



Chapter 17

Self-Test Answers

Use the Case Summaries command to list the factor scores for these data (given that there are over 2500 cases, you might like to restrict the output to the first 10 or 20).

To list the factor scores you need to use the Case Summaries command, which can be found by selecting . Simply select the variables that you want to list (in this case the four columns of factor scores) and transfer them to the box labelled Variables by dragging them or clicking on . By default, SPSS will limit the output to the first 100 cases, but let's set this to 10 so we just look at the first few cases (as in the book chapter).

Self-Test Answers

Using what you learnt in Chapter 5, use the compute command to reverse score item 3. (Clue: remember that you are simply changing the variable to 6 minus its original value.)

To access the compute dialog box, select . We came across this command in Chapter 5, and what we do is enter the name of the variable we want to change in the space labelled Target Variable (in this case the variable is called Question_03). You can use a different name if you like, but if you do, SPSS will create a new variable and you must remember that it's this new variable that you need to use in the reliability analysis. Then, where it says Numeric Expression, you need to tell SPSS how to compute the new variable. In this case, we want to take each person's original score on item 3 and subtract that value from 6. Therefore, we simply type 6-Question_03 (which means 6 minus the value found in the column labelled Question_03). If you've used the same name then when you click on  you'll get a dialog box asking if you want to change the existing variable; just click on  if you're happy for the new values to replace the old ones.
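The same transformation is trivial to express outside SPSS; a sketch with hypothetical responses on the 1-to-5 scale:

```python
# Reverse-score item 3: on a 1-5 scale, each response becomes 6 minus
# its original value, so 1 <-> 5 and 2 <-> 4 while 3 stays put.
question_03 = [1, 2, 3, 4, 5, 2]            # hypothetical responses
reversed_03 = [6 - score for score in question_03]
print(reversed_03)  # [5, 4, 3, 2, 1, 4]
```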


Additional Material Oliver Twisted: Please Sir, can I have some more Matrix Algebra?

'The matrix,' enthuses Oliver, 'that was a good film. I want to dress in black and glide through the air as though time has stood still. Maybe the matrix of factor scores is as cool as the film.' I think you might be disappointed, Oliver, but we'll give it a shot. The matrix calculations of factor scores are detailed in the additional material for this chapter on the companion website. Be afraid, be very afraid…

Calculation of Factor Score Coefficients


The matrix of factor score coefficients is obtained as B = R⁻¹A, where R⁻¹ is the inverse of the correlation matrix and A is the matrix of factor loadings (values rounded to two decimal places; the calculations below use the unrounded values):

$$B = R^{-1}A = \begin{pmatrix} 4.76 & -7.46 & 3.91 & -2.35 & 2.42 & -0.49 \\ -7.46 & 18.49 & -12.42 & 5.45 & -5.54 & 1.22 \\ 3.91 & -12.42 & 10.07 & -3.65 & 3.79 & -0.96 \\ -2.35 & 5.45 & -3.65 & 2.97 & -2.16 & 0.02 \\ 2.42 & -5.54 & 3.79 & -2.16 & 2.98 & -0.56 \\ -0.49 & 1.22 & -0.96 & 0.02 & -0.56 & 1.27 \end{pmatrix} \begin{pmatrix} 0.87 & 0.01 \\ 0.96 & -0.04 \\ 0.92 & 0.03 \\ 0.00 & 0.82 \\ -0.10 & 0.75 \\ 0.10 & 0.70 \end{pmatrix}$$

Column 1 of matrix B


To get the first element of the first column of matrix B, you multiply each element in the first column of matrix A by the correspondingly placed element in the first row of matrix R⁻¹, then add the six products together. To get the second element of the first column of matrix B, you multiply each element in the first column of matrix A by the correspondingly placed element in the second row of matrix R⁻¹, then add the six products together, and so on:

$$\begin{aligned}
B_{11} &= (4.75924 \times 0.87407) + (-7.46190 \times 0.95768) + (3.90949 \times 0.92138) \\
&\quad + (-2.35093 \times -0.00237) + (2.42104 \times -0.09575) + (-0.48607 \times 0.09600) = 0.343 \\
B_{21} &= (-7.46190 \times 0.87407) + (18.48556 \times 0.95768) + (-12.41679 \times 0.92138) \\
&\quad + (5.44500 \times -0.00237) + (-5.54427 \times -0.09575) + (1.22155 \times 0.09600) = 0.376 \\
B_{31} &= (3.90949 \times 0.87407) + (-12.41679 \times 0.95768) + (10.07382 \times 0.92138) \\
&\quad + (-3.64853 \times -0.00237) + (3.78869 \times -0.09575) + (-0.95731 \times 0.09600) = 0.362 \\
B_{41} &= (-2.35093 \times 0.87407) + (5.44500 \times 0.95768) + (-3.64853 \times 0.92138) \\
&\quad + (2.96922 \times -0.00237) + (-2.16094 \times -0.09575) + (0.02255 \times 0.09600) = 0.000 \\
B_{51} &= (2.42104 \times 0.87407) + (-5.54427 \times 0.95768) + (3.78869 \times 0.92138) \\
&\quad + (-2.16094 \times -0.00237) + (2.97983 \times -0.09575) + (-0.56017 \times 0.09600) = -0.037 \\
B_{61} &= (-0.48607 \times 0.87407) + (1.22155 \times 0.95768) + (-0.95731 \times 0.92138) \\
&\quad + (0.02255 \times -0.00237) + (-0.56017 \times -0.09575) + (1.27072 \times 0.09600) = 0.039
\end{aligned}$$

Column 2 of matrix B

To get the elements of the second column of matrix B, you do the same using the second column of matrix A: multiply each element in the second column of matrix A by the correspondingly placed element in the relevant row of matrix R⁻¹, then add the six products together:

$$\begin{aligned}
B_{12} &= (4.75924 \times 0.00842) + (-7.46190 \times -0.03653) + (3.90949 \times 0.03178) \\
&\quad + (-2.35093 \times 0.81556) + (2.42104 \times 0.75435) + (-0.48607 \times 0.69936) = 0.006 \\
B_{22} &= (-7.46190 \times 0.00842) + (18.48556 \times -0.03653) + (-12.41679 \times 0.03178) \\
&\quad + (5.44500 \times 0.81556) + (-5.54427 \times 0.75435) + (1.22155 \times 0.69936) = -0.020 \\
B_{32} &= (3.90949 \times 0.00842) + (-12.41679 \times -0.03653) + (10.07382 \times 0.03178) \\
&\quad + (-3.64853 \times 0.81556) + (3.78869 \times 0.75435) + (-0.95731 \times 0.69936) = 0.020 \\
B_{42} &= (-2.35093 \times 0.00842) + (5.44500 \times -0.03653) + (-3.64853 \times 0.03178) \\
&\quad + (2.96922 \times 0.81556) + (-2.16094 \times 0.75435) + (0.02255 \times 0.69936) = 0.473 \\
B_{52} &= (2.42104 \times 0.00842) + (-5.54427 \times -0.03653) + (3.78869 \times 0.03178) \\
&\quad + (-2.16094 \times 0.81556) + (2.97983 \times 0.75435) + (-0.56017 \times 0.69936) = 0.437 \\
B_{62} &= (-0.48607 \times 0.00842) + (1.22155 \times -0.03653) + (-0.95731 \times 0.03178) \\
&\quad + (0.02255 \times 0.81556) + (-0.56017 \times 0.75435) + (1.27072 \times 0.69936) = 0.405
\end{aligned}$$
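These row-by-column sums are exactly what a linear algebra library computes in one call. The sketch below uses a small made-up 3-item, one-factor R and A (not the 6 × 6 matrices above):

```python
import numpy as np

# Factor score coefficients via B = R^-1 A. These matrices are a small
# hypothetical 3-item, one-factor example for illustration only.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])   # item correlation matrix
A = np.array([[0.8],
              [0.7],
              [0.6]])             # factor loadings

# Solving R @ B = A is numerically safer than forming inv(R) explicitly
B = np.linalg.solve(R, A)
```

Using `solve` rather than `inv` avoids explicitly inverting R, which is the standard numerical practice even though the algebra is written as B = R⁻¹A.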

Oliver Twisted: Please Sir, can I have some more Questionnaires?

'I'm going to design a questionnaire to measure one's propensity to pick a pocket or two,' says Oliver, 'but how would I go about doing it?' You'd read the useful information about the dos and don'ts of questionnaire design in the additional material for this chapter on the companion website, that's how. Rate how useful it is on a Likert scale from 1 = not useful at all, to 5 = very useful.


What Makes a Good Questionnaire?

As a rule of thumb, never attempt to design a questionnaire! A questionnaire is very easy to design, but a good questionnaire is virtually impossible to design. The point is that it takes a long time to construct a questionnaire, with no guarantees that the end result will be of any use to anyone. A good questionnaire must have three things:

Discrimination
Validity
Reliability

Discrimination

Before talking about validity and reliability, we should talk about discrimination, which is really an issue of item selection. Discrimination simply means that people with different scores on a questionnaire should differ in the construct of interest to you. For example, a questionnaire measuring social phobia should discriminate between people with social phobia and people without it (i.e. people in the different groups should score differently). There are three corollaries to consider:

1. People with the same score should be equal to each other along the measured construct.
2. People with different scores should be different to each other along the measured construct.
3. The degree of difference between people is proportional to the difference in scores.


This is all pretty self-evident really, so what's the fuss about? Well, let's take a really simple example of a three-item questionnaire measuring sociability. Imagine we administered this questionnaire to two people: Jane and Katie. Their responses are shown in Figure 1.
Item                                Jane    Katie
1. I like going to parties          Yes     Yes
2. I often go to the pub            No      Yes
3. I really enjoy meeting people    Yes     No

Figure 1

Jane responded 'yes' to items 1 and 3 but 'no' to item 2. If we score a 'yes' with the value 1 and a 'no' with a 0, then we can calculate a total score of 2. Katie, on the other hand, answers 'yes' to items 1 and 2 but 'no' to item 3. Using the same scoring system her score is also 2. Therefore, numerically you have identical answers (i.e. both Jane and Katie score 2 on this questionnaire); therefore, these two people should be comparable in their sociability: are they? The answer is: not necessarily. It seems that Katie likes to go to parties and the pub but doesn't enjoy meeting people in general, whereas Jane enjoys parties and meeting people but doesn't enjoy the pub. It seems that Katie likes social situations involving alcohol (e.g. the pub and parties) but Jane likes socializing in general, but can't tolerate cigarette smoke. In many ways, therefore, these people are very different because our questions are contaminated by other factors (i.e. attitudes to alcohol or smoky environments). A good questionnaire should be designed such that people with


identical numerical scores are identical in the construct being measured, and that's not as easy to achieve as you might think!

A second related point is score differences. Imagine you take scores on the Spider Phobia Questionnaire (SPQ). Imagine you have three participants who do the questionnaire and get the following scores:

Andy: 30
    Difference = 15
Graham: 15
    Difference = 5
Dan: 10

Andy scores 30 on the SPQ (very spider phobic), Graham scores 15 (moderately phobic) and Dan scores 10 (not very phobic at all). Does this mean that Dan and Graham are more similar in their spider phobia than Graham and Andy? In theory this should be the case because Graham's score is more similar to Dan's (difference = 5) than it is to Andy's (difference = 15). In addition, is it the case that Andy is three times more phobic of spiders than Dan is? Is he twice as phobic as Graham? Again, his scores suggest that he should be. The point is that you can't guarantee in advance that differences in score are going to be comparable, yet a questionnaire needs to be constructed such that the difference in score is proportional to the difference between people.

Validity

Items on your questionnaire must measure something and a good questionnaire measures what you designed it to measure (this is called validity). Validity basically means measuring what you think you're measuring. So, an anxiety measure that actually measures assertiveness is not valid;


however, a materialism scale that does actually measure materialism is valid. Validity is a difficult thing to assess and it can take several forms:

1. Content validity: Items on a questionnaire must relate to the construct being measured. For example, a questionnaire measuring intrusive thoughts is pretty useless if it contains items relating to statistical ability. Content validity is really how representative your questions are: the sampling adequacy of items. This is achieved when items are first selected: don't include items that are blatantly very similar to other items, and ensure that questions cover the full range of the construct.

2. Criterion validity: This is basically whether the questionnaire is measuring what it claims to measure. In an ideal world, you could assess this by relating scores on each item to real-world observations (e.g. comparing scores on sociability items with the number of times a person actually goes out to socialize). This is often impractical and so there are other techniques such as (a) using the questionnaire in a variety of situations and seeing how predictive it is; (b) seeing how well it correlates with other known measures of your construct (i.e. sociable people might be expected to score highly on extroversion scales); and (c) using statistical techniques such as the Item Validity Index (IVI).

3. Factorial validity: This validity basically refers to whether the factor structure of the questionnaire makes intuitive sense. As such, factorial validity is assessed through factor analysis. When you have your final set of items you can conduct a factor analysis on the data (see the book). Factor analysis takes your correlated questions and recodes them into uncorrelated, underlying variables called factors (an example might be recoding the variables height, chest size, shoulder width and weight into an underlying variable called 'build').
As another example, to assess success in a course we might measure attentiveness in seminars, the amount of notes taken in seminars and the number of questions asked


during seminars: all of these variables may relate to an underlying trait such as motivation to succeed. Factor analysis produces a table of items and their correlation, or loading, with each factor. A factor is composed of items that correlate highly with it. Factorial validity can be seen from whether the items that load onto factors make intuitive sense or not. Basically, if your items cluster into meaningful groups then you can infer factorial validity.

Validity is a necessary but not sufficient condition of a questionnaire.

Reliability

A questionnaire must be not only valid, but also reliable. Reliability is basically the ability of the questionnaire to produce the same results under the same conditions. To be valid the questionnaire must first be reliable. Clearly the easiest way to assess reliability is to test the same group of people twice: if the questionnaire is reliable you'd expect each person's scores to be the same at both points in time. So, scores on the questionnaire should correlate perfectly (or very nearly!). However, in reality, if we did test the same people twice then we'd expect some practice effects and confounding effects (people might remember their responses from last time). Also, this method is not very useful for questionnaires purporting to measure something that we would expect to change (such as depressed mood or anxiety). These problems can be overcome using the alternate form method in which two comparable questionnaires are devised and compared. Needless to say, this is a rather time-consuming way to ensure reliability, and fortunately there are statistical methods to make life much easier. The simplest statistical technique is the split-half method. This method randomly splits the questionnaire items into two groups. A score for each subject is then calculated based on each half of the scale. If a scale is very reliable we'd expect a person's score to be the same on one half of the scale as the other, and so the two halves should correlate perfectly. The correlation between the two halves is the statistic computed in the split-half method, large correlations being a sign of


reliability.⁴ The problem with this method is that there are a number of ways in which a set of data can be split into two, and so the results might be a product of the way in which the data were split. To overcome this problem, Cronbach suggested splitting the data in two in every conceivable way and computing the correlation coefficient for each split. The average of these values is known as Cronbach's alpha, which is the most common measure of scale reliability. As a rough guide, a value of 0.8 is seen as an acceptable value for Cronbach's alpha; values substantially lower indicate an unreliable scale (see the book for more detail).
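In practice, Cronbach's alpha is usually computed directly from the item and total-score variances rather than by literally averaging every split-half correlation; a minimal sketch (the scores below are made up):

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)     # rows = people, columns = items
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Perfectly consistent responses across three items give alpha of 1
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(scores)
```

With real questionnaire data you would compare the result against the rough 0.8 benchmark mentioned above.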
How to Design your Questionnaire

Step 1: Choose a Construct

First you need to decide on what you would like to measure. Once you have done this, use PsychLit and the Web of Knowledge to do a basic search for some information on this topic. I don't expect you to search through reams of material, but just get some basic background on the construct you're testing and how it might relate to psychologically important things. For example, if you looked at empathy, this is seen as an important component of Carl Rogers' client-centred therapy; therefore, having the personality trait of empathy might be useful if you were to become a Rogerian therapist. It follows then that having a questionnaire to measure this trait might be useful for selection purposes on Rogerian therapy training courses. So, basically you need to set some kind of context for why the construct is important: this information will form the basis of your introduction.

Step 2: Decide on a Response Scale

A fundamental issue is how you want respondents to answer questions. You could choose to have:

⁴ In fact the correlation coefficient is adjusted to account for the smaller sample on which scores from the scale are based (remember that these scores are based on half of the items on the scale).


Yes/No or Yes/No/Don't Know scales: This forces people to give one answer or another even though they might feel that they are neither a 'yes' nor a 'no'. Also, imagine you were measuring intrusive thoughts and you had an item 'I think about killing children'. Chances are everyone would respond 'no' to that statement (even if they did have those thoughts) because it is a very undesirable thing to admit. Therefore, all this item is doing is subtracting a value from everybody's score; it tells you nothing meaningful, it is just noise in the data. A similar scenario can occur when you have a rating scale with a 'don't know' response (because people just cannot make up their minds and opt for the neutral response), which is why it is sometimes nice to have questionnaires with a neutral point to help you identify which things people really have no feeling about. Without this midpoint you are simply making people go one way or the other, which is comparable to balancing a coin on its edge and seeing which side up it lands when it falls. Basically, when forced, 50% will choose one option while 50% will choose the opposite; this is just noise in your data.

Likert scale: This is the standard agree/disagree ordinal categories response. It comes in many forms:

o 3-point: Agree / Neither Agree nor Disagree / Disagree
o 5-point: Agree / Midpoint / Neither Agree nor Disagree / Midpoint / Disagree
o 7-point: Agree / 2 points / Neither Agree nor Disagree / 2 points / Disagree
Questions should encourage respondents to use all points of the scale. So, ideally the statistical distribution of responses to a single item should be normal with a mean that lies at the centre of the scale (so on a 5-point Likert scale the mean on a given question should be 3). The range of scores should also cover all possible responses.
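These distributional ideals are easy to check. Here is a minimal sketch in Python (rather than SPSS) of inspecting a single item's mean and range; the responses are made-up illustration data, not from a real questionnaire:

```python
import numpy as np

# Made-up responses from ten people to one 5-point Likert item
responses = np.array([1, 2, 2, 3, 3, 3, 3, 4, 4, 5])

print(np.mean(responses))                # ideally near 3, the scale midpoint
print(responses.min(), responses.max())  # ideally 1 and 5 (full range used)
```

An item whose mean sits far from the midpoint, or whose responses never reach the ends of the scale, is a candidate for removal in the screening described later.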

Step 3: Generate Your Items


Once you've found a construct to measure and decided on the type of response scale you're going to use, the next task is to generate items. I want you to restrict your questionnaire to around 30 items (20 minimum). The best way to generate items is to brainstorm a small sample of people. This involves getting people to list as many facets of your construct as possible. For example, if you devised a questionnaire on exam anxiety, you might ask a number of students (20 or so) from a variety of courses (arts and science), years (first, second and final) and even institutions (friends at other universities) to list (on a piece of paper) as many things about exams as possible that make them anxious. It is good if you can include people within this sample that you think might be at the extremes of your construct (e.g. select a few people who get very anxious about exams and some who are very calm). This enables you to get items that span the entire spectrum of the construct that you want to measure, and will give you a pool of items to inspire questions. Rephrase your sample's suggestions in a way that fits the rating scale you've chosen and then eliminate any questions that are basically the same. You should hopefully begin with a pool of, say, 50-60 questions that you can reduce to about 30 by eliminating obviously similar questions.

Things to Consider:

1. Wording of questions: The way in which questions are phrased can bias the answers that people give. For example, Gaskell, Wright, and O'Muircheartaigh (1993) report several studies in which subtle changes in the wording of survey questions radically affected people's responses. Gaskell et al.'s article is a very readable and useful summary of this work and their conclusions might be useful to you when thinking about how to phrase your questions.


2. Response bias: This is the tendency of respondents to give the same answer to every question. Try to reverse-phrase a few items to avoid response bias (and remember to score these items in reverse when you enter the data into SPSS).

Step 4: Collect the Data

Once you've written your questions, randomize their order and produce your questionnaire. This is the questionnaire that you're going to test. Photocopy the questionnaire and administer it to as many people as possible (one benefit of making these questionnaires short is that it minimizes the time taken to complete them!). You should aim for 50-100 respondents, but the more you get, the better your analysis (which is why I suggest working in slightly bigger groups to make data collection easier).

Step 5: Analysis

Enter the data into SPSS by having each question represented by a column in SPSS. Translate your response scale into numbers (i.e. a 5-point Likert scale might be 1 = completely disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = completely agree). Reverse-phrased items should be scored in reverse too! What we're trying to do with this analysis is first eliminate any items on the questionnaire that aren't useful. So, we're trying to reduce our 30 items further before we run our factor analysis. We can do this by looking at descriptive statistics and also correlations between questions.

Descriptive statistics: The first thing to look at is the statistical distribution of item scores. This alone will enable you to throw out many redundant items. Therefore, the first thing to do when piloting a questionnaire is to look at descriptive statistics on the questionnaire items. This is easily done in SPSS (see the book chapter). We're on the lookout for:

1. Range: Any item that has a limited range (all the points of the scale have not been used).


2. Skew: I mentioned above that ideally each question should elicit a normally distributed set of responses across subjects (each item's mean should be at the centre of the scale and there should be no skew). To check for items that produce skewed data, look for the skewness and its standard error (SE skew) in your SPSS output. We have also discovered in this book that you can divide the skewness by its standard error (SE skew) to form a z-score (see Chapter 5).

3. Standard deviation: Related to the range and skew of the distribution, items with unusually high or low standard deviations may cause problems, so be wary of extreme values for the SD.

These are your first steps. Basically, if any of these rules are violated then your items become non-comparable (in terms of the factor analysis), which makes the questionnaire pretty meaningless!

Correlations: All of your items should intercorrelate at a significant level if they are measuring aspects of the same thing. If any items do not correlate at the 5% or 1% level of significance then exclude them. You can get a table of intercorrelations from SPSS. The book gives more detail on screening correlation coefficients for items that correlate with few others or correlate too highly with other items (multicollinearity and singularity).

Factor analysis: When you've eliminated any items that have distributional problems or do not correlate with each other, run your factor analysis on the remaining items and try to interpret the resulting factor structure. The book chapter details the process of factor analysis. What you should do is examine the factor structure and decide:

1. Which factors to retain.
2. Which items load onto those factors.
3. What your factors represent.


4. If there are any items that don't load highly onto any factors, they should be eliminated from future versions of the questionnaire (for our purposes you need only state that they are not useful items, as you won't have time to revise and retest your questionnaires!).

Step 6: Assess the Questionnaire

Having looked at the factor structure, you need to check the reliability of your items and the questionnaire as a whole. We should run a reliability analysis on the questionnaire. This is explained in Chapter 17 of the book. There are two things to look at: (1) the Item Reliability Index (IRI), which is the correlation between the score on the item and the score on the test as a whole, multiplied by the standard deviation of that item (called the corrected item-total correlation in SPSS). SPSS will compute this corrected item-total correlation and we'd hope that these values would be significant for all items. Although we don't get significance values as such, we can look for correlations greater than about 0.3 (although the exact value depends on the sample size, this is a good cut-off for the size of sample you'll probably have). Any items having correlations less than 0.3 should be excluded from the questionnaire. (2) Cronbach's alpha, as we've seen, should be 0.8 or more and the deletion of an item should not affect this value too much (see the reliability analysis handout for more detail).
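To make these two reliability quantities concrete, here is a small sketch in Python (not SPSS output) computing Cronbach's alpha and corrected item-total correlations; the tiny data set is entirely made up for illustration:

```python
import numpy as np

# Made-up item scores: rows = respondents, columns = items on the scale
data = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
], dtype=float)

k = data.shape[1]
item_vars = data.var(axis=0, ddof=1)
total_var = data.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Corrected item-total correlation: each item vs. the sum of the OTHER items
for i in range(k):
    rest = data.sum(axis=1) - data[:, i]
    r = np.corrcoef(data[:, i], rest)[0, 1]
    print(f"item {i + 1}: corrected item-total r = {r:.2f}")  # want r > .3

print(f"alpha = {alpha:.2f}")  # want alpha of .8 or more
```

With these invented scores, all items correlate strongly with the rest of the scale and alpha comes out around .93, comfortably above the .8 benchmark.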
The End?

You should conclude by describing your factor structure and the reliability of the scale. Also say whether there are items that you would drop in a future questionnaire. In an ideal world we'd then generate new items to add to the retained items and start the whole process again!
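Looking back at Step 5, two bits of item-scoring arithmetic can be sketched in a few lines of Python; the numbers below are purely illustrative, not real SPSS output:

```python
# (1) Reverse-scoring a reverse-phrased item on a 5-point Likert scale:
# the reversed score is (max + min) - original score.
scores = [1, 2, 3, 4, 5]
reversed_scores = [(5 + 1) - s for s in scores]
print(reversed_scores)  # [5, 4, 3, 2, 1]

# (2) Turning a skewness statistic into a z-score (skewness / SE skew):
skewness = 1.00  # illustrative skewness from SPSS descriptives
se_skew = 0.25   # illustrative standard error of skewness
z = skewness / se_skew
print(z)         # 4.0; |z| > 1.96 suggests significant skew at p < .05
```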


Labcoat Leni's Real Research: World Wide Addiction?

Nichols, L. A., & Nicki, R. (2004). Psychology of Addictive Behaviors, 18(4), 381-384.

The Internet is now a household tool. In 2007 it was estimated that around 179 million people worldwide used the Internet (over 100 million of those were in the USA and Canada). From the increasing popularity (and usefulness) of the Internet has emerged a new phenomenon: Internet addiction. This is now a serious and recognized problem, but until very recently it was very difficult to research this topic because there was not a psychometrically sound measure of Internet addiction. That is, until Laura Nichols and Richard Nicki developed the Internet Addiction Scale, IAS (Nichols & Nicki, 2004). (Incidentally, while doing some research on this topic I encountered an Internet addiction recovery website that I won't name but which offered a whole host of resources that would keep you online for ages, such as questionnaires, an online support group, videos, articles, a recovery blog and podcasts. It struck me that this was a bit like having a recovery centre for heroin addiction where the addict arrives to be greeted by a nice-looking counsellor who says 'there's a huge pile of heroin in the corner over there, just help yourself'.) Anyway, Nichols and Nicki developed a 36-item questionnaire to measure Internet addiction. It contained items such as 'I have stayed on the Internet longer than I intended to' and 'My grades/work have suffered because of my Internet use', which could be responded to on a 5-point scale (Never, Rarely, Sometimes, Frequently, Always). They collected data from 207 people to validate this measure. The data from this study are in the file Nichols & Nicki (2004).sav. The authors dropped two


items because they had low means and variances, and dropped three others because of relatively low correlations with other items. They performed a principal components analysis on the remaining 31 items. Labcoat Leni wants you to run some descriptive statistics to work out which two items were dropped for having low means/variances, and then inspect a correlation matrix to find the three items that were dropped for having low correlations. Finally, he wants you to run a principal component analysis on the data.

To get the descriptive statistics, use the descriptives command in SPSS: select all of the questionnaire items, but just ask for means and standard deviations at this stage.


The table of means and standard deviations shows that the items with the lowest values are IAS-23 ('I see my friends less often because of the time that I spend on the Internet') and IAS-34 ('When I use the Internet, I experience a buzz or a high').

To get a table of correlations, use the bivariate correlations command: select all of the variables and leave the default options as they are.


[The full table of intercorrelations between the 36 items (IAS01 to IAS36), together with each item's average correlation, is too large to reproduce legibly here; it appears in the SPSS correlations output.]


We know that the authors eliminated three items for having low correlations. My table of correlations includes the average correlation for each item. The lowest average correlations are for items IAS-13 ('I have felt a persistent desire to cut down or control my use of the Internet'), IAS-22 ('I have neglected things which are important and need doing') and IAS-32 ('I find myself thinking/longing about when I will go on the Internet again'). As such, these variables will also be excluded from the factor analysis. To do the principal component analysis, use the factor analysis command and choose all of the variables except for the five that we have excluded.
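The screening logic used here (drop items whose average correlation with the remaining items is low) can be sketched in Python with simulated data. In the sketch below the data are invented so that the first four items share a common factor while the fifth is pure noise, which is the kind of item this screen should flag:

```python
import numpy as np

# Simulated item scores: items 1-4 share a common factor; item 5 is noise
rng = np.random.default_rng(1)
factor = rng.normal(size=(200, 1))
data = np.hstack([factor + 0.5 * rng.normal(size=(200, 4)),  # items 1-4
                  rng.normal(size=(200, 1))])                # item 5

r = np.corrcoef(data, rowvar=False)
np.fill_diagonal(r, np.nan)       # ignore each item's correlation with itself
avg_r = np.nanmean(r, axis=0)     # average correlation with the other items
print(avg_r.round(2))
print(int(np.argmin(avg_r)) + 1)  # item 5: lowest average correlation, so drop it
```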


The output should look like this:

Sample size: MacCallum et al. (1999) have demonstrated that when communalities after extraction are above .5, a sample size between 100 and 200 can be adequate, and even when communalities are below .5 a sample size of 500 should be sufficient. We have a sample size of 207 with only one communality below .5, and so the sample size should be adequate. In addition, the KMO measure of sampling adequacy is .942, which is well above Kaiser's (1974) recommendation of .5; this value is also 'marvellous' according to Hutcheson and Sofroniou (1999). As such, the evidence suggests that the sample size is adequate to yield distinct and reliable factors.


Bartlett's test: This tests whether the correlations between questions are sufficiently large for factor analysis to be appropriate (it actually tests whether the correlation matrix is sufficiently different from an identity matrix). In this case it is significant, χ2(465) = 4238.98, p < .001, indicating that the correlations within the R-matrix are sufficiently different from zero to warrant factor analysis.
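For the curious, the chi-square statistic for Bartlett's test of sphericity can be computed directly from the determinant of the correlation matrix. Here is a sketch in Python; the 3-variable correlation matrix is made up, and the function name is my own rather than anything from SPSS:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for correlation matrix R, sample size n."""
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)

# Made-up correlation matrix for three items
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.6],
              [0.4, 0.6, 1.0]])
chi2, df, p_value = bartlett_sphericity(R, n=207)
print(chi2, df, p_value)  # significant => matrix differs from an identity matrix
```

A significant result, as here, means the correlations are collectively large enough for factor analysis to make sense; with 31 items the degrees of freedom become 31 × 30 / 2 = 465, matching the SPSS output above.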


Extraction: SPSS has extracted five factors based on Kaiser's criterion of retaining factors with eigenvalues greater than 1. Is this warranted? Kaiser's criterion is accurate when there are fewer than 30 variables and the communalities after extraction are greater than .7, or when the sample size exceeds 250 and the average communality is greater than .6. For these data the sample size is 207, there are 31 variables and the mean communality is .64, so extracting five factors is probably not warranted. The scree plot, however, shows a clear one-factor solution. This is the solution that the authors adopted.
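Kaiser's criterion itself is simple to demonstrate: take the eigenvalues of the correlation matrix and count how many exceed 1. A sketch in Python with a made-up 4-variable correlation matrix (not the IAS data) containing two clear clusters of items:

```python
import numpy as np

# Made-up correlation matrix: variables 1-2 form one cluster, 3-4 another
R = np.array([[1.0, 0.7, 0.1, 0.1],
              [0.7, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending order
retained = int((eigenvalues > 1).sum())
print(eigenvalues.round(2))  # 1.9, 1.5, 0.3, 0.3 (descending)
print(retained)              # Kaiser's criterion retains 2 factors here
```

The scree plot is just these eigenvalues plotted in order; the point where the curve flattens out suggests how many factors to keep, which can disagree with Kaiser's criterion, as it does for the IAS data.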


Because we are retaining only one factor we can ignore the rotated factor solution and just look at the unrotated component matrix. This shows that all items have a high loading on factor 1.

The authors reported their analysis as follows: We conducted principal-components analyses on the log transformed scores of the IAS (see above). On the basis of the scree test (Cattell, 1978) and the percentage of variance accounted for by each factor, we judged a one-factor solution to be most appropriate. This component accounted for a total of 46.50% of the variance. A value for loadings of .30 (Floyd & Widaman, 1995) was used as a cut-off for items that did not relate to a component.


All 31 items loaded on this component, which was interpreted to represent aspects of a general factor relating to Internet addiction reflecting the negative consequences of excessive Internet use (p. 382).
Chapter 18

Self-Test Answers

Run a multiple regression analysis using CatsRegression.sav with LnObserved as the outcome, and Training, Dance and Interaction as your three predictors.

The multiple regression dialog box will look like the following diagram. We can leave all of the default options as they are because we are interested only in the regression parameters.


The regression parameters are shown in the book. To show that this all actually works, run another multiple regression analysis using CatsRegression.sav; this time the outcome is the log of expected frequencies (LnExpected) and Training and Dance are the predictors (the interaction is not included).

The multiple regression dialog box will look like the following diagram. We can leave all of the default options as they are because we are interested only in the regression parameters.


The resulting regression parameters are:

Note that b0 = 2.67, the beta coefficient for the type of training is 1.45 and the beta coefficient for whether they danced is 0.49. All of these values are consistent with those calculated in the book chapter.
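To see why this works, note that for a two-way table the expected frequencies are (row total × column total)/grand total, so their logs are exactly additive in row and column effects; no interaction term is needed. Here is a Python sketch of the idea; the 2×2 counts and the dummy coding are chosen so that the fitted parameters match the magnitudes quoted above, but treat the coding direction as an assumption rather than the book's exact setup:

```python
import numpy as np

# 2x2 table of counts: rows = type of training, columns = danced (yes, no)
observed = np.array([[28, 10],
                     [48, 114]])
n = observed.sum()
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / n  # independence-model expected frequencies

# Regress ln(expected) on dummy-coded Training and Dance (no interaction);
# the fit is exact because ln E = ln(row) + ln(col) - ln(n) is additive.
X = np.array([[1, 0, 0],   # row 1, column 1
              [1, 0, 1],   # row 1, column 2
              [1, 1, 0],   # row 2, column 1
              [1, 1, 1]],  # row 2, column 2
             dtype=float)
b, *_ = np.linalg.lstsq(X, np.log(expected).ravel(), rcond=None)
print(b.round(2))  # approximately [2.67, 1.45, 0.49]
```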


Create a contingency table of these data with dance as the columns, the type of training as rows and the type of animal as a layer.

To use the crosstabs command, select Analyze > Descriptive Statistics > Crosstabs. We have three variables in our crosstabulation table: whether the animal danced or not (Dance), the type of reward given (Training), and whether the animal was a cat or dog (Animal). Select Training and drag it into the box labelled Row(s). Next, select Dance and drag it to the box labelled Column(s). We have a third variable too, and we need to define this variable as a layer: select Animal and drag it to the box labelled Layer 1 of 1. Then open the Cells dialog and select the options below.


Can you use the Chart Builder to replicate the graph in Figure 18.7? Actually, this self-test is not as easy as it looks. The diagrams below guide you through the process.

Click here to select a clustered bar chart

Drag Animal here to create separate panels for dogs and cats

Drag Dance here. Bars will be coloured by whether animals danced or not


Click here to create panels in the graph

Drag Training here. Data will be clustered by the type of training used

Select to make the panels appear in columns


We want to display percentages rather than counts because there were more cats than dogs and this will allow us to compare animals directly. To do this, click here and select Percentage() from the list

Don't forget to click here to apply the changes to the graph

By default SPSS will display the percentage of the total sample. However, we want the percentage to be calculated within each animal (i.e. the percentage of cats that danced for food). To display these percentages, select Total for Panel from the drop-down list. This will calculate the percentage within each panel (not all panels combined). This means that we will get the percentage of cats and dogs, not the percentage of all animals

Use the split file command to run a chi-square test on Dance and Training for dogs and cats.

First, to split the file, select Data > Split File and then select the Organize output by groups option. Once this option is selected, the Groups Based on box will activate. Select the variable containing the group codes by which you wish to repeat the analysis (in this example select Animal), and drag it to the box.

To run the chi-square tests, select Analyze > Descriptive Statistics > Crosstabs. First, select one of the variables of interest in the variable list and drag it into the box labelled Row(s). For this example, I selected Training to be the rows of the table. Next, select the other variable of interest (Dance) and drag it to the box labelled Column(s). Select the same options as in the book (for the cat example).


Additional Material

Labcoat Leni's Real Research: Is the black American happy?

Beckham, A. S. (1929). Journal of Abnormal and Social Psychology, 24, 186-190.

When I was doing my psychology degree I spent a lot of time reading about the civil rights movement in the USA. Although I was supposed to be reading psychology, I became more interested in Malcolm X and Martin Luther King Jr. This is why I find Beckham's 1929 study of black Americans such an interesting piece of research. Beckham was a black American academic who founded the Psychology Laboratory at


Howard University, Washington, D.C., and his wife Ruth was the first black woman ever to be awarded a Ph.D. (also in psychology), at the University of Minnesota. The article needs to be placed within the era in which it was published. To put some context on the study, it was published 36 years before the Jim Crow laws were finally overthrown by the Civil Rights Act of 1964, and in a time when black Americans were segregated, openly discriminated against and were victims of the most abominable violations of civil liberties and human rights. For a richer context I suggest reading James Baldwin's superb book The Fire Next Time. Even the language of the study and the data from it are an uncomfortable reminder of the era in which it was conducted. Beckham sought to measure the psychological state of black Americans with three questions asked of 3443 black Americans from different walks of life. He asked them whether they thought black Americans were happy, whether they personally were happy as a black American, and whether black Americans should be happy. They could answer only 'yes' or 'no' to each question. By today's standards the study is quite simple, and he did no formal statistical analysis on his data (Fisher's article containing the popularized version of the chi-square test had been published only seven years earlier, in a statistics journal that would not have been read by psychologists). I love this study, though, because it demonstrates that you do not need elaborate methods to answer important and far-reaching questions; with just three questions, Beckham told the world an enormous amount about very real and important psychological and sociological phenomena. The frequency data (the number of yes and no responses within each employment category) from this study are in the file Beckham(1929).sav. Labcoat Leni wants you to carry out three chi-square tests (one for each question that was asked). What conclusions can you


draw?

Are black Americans Happy? Let's run the analysis on the first question. First we must remember to tell SPSS which variable contains the frequencies by using the weight cases command. Select Data > Weight Cases, then in the resulting dialog box select Weight cases by, and then select the variable in which the number of cases is specified (in this case Happy) and drag it to the box labelled Frequency Variable. This process tells the computer that it should weight each category combination by the number in the column labelled Happy.

To conduct the chi-square test, use the crosstabs command by selecting Analyze > Descriptive Statistics > Crosstabs. We have two variables in our crosstabulation table: the occupation of the participant (Profession) and whether they responded yes or no to the question (Response). Select one of these variables and drag it into the box labelled Row(s). For this example, I selected Profession to be the rows of the table. Next, select the other variable of interest (Response) and drag it to the box labelled Column(s). Use the book chapter to select other appropriate options (we do not need to use the exact test used in the chapter because our sample size is very large; however, you could choose a Monte Carlo test of significance if you like).


The chi-square test is highly significant, χ2(7) = 936.14, p < .001. This indicates that the profile of yes and no responses differed across the professions. Looking at the standardized residuals, the only profession for which these are non-significant is housewives, who showed a fairly even split between thinking black Americans were happy (40%) and not (60%). Within the other professions all of the standardized residuals are much larger than 1.96 in magnitude, so how can we make sense of the data? What's interesting is to look at the direction of these residuals (i.e. whether they are positive or negative). For the following professions the residual for 'no' was positive but for 'yes' was negative; these are therefore people who responded more than we would expect that black Americans were not happy and less than expected that black Americans were happy: college students, preachers and lawyers. The remaining professions (labourers, physicians, school teachers and musicians) show the opposite pattern: the residual for 'no' was negative but for 'yes' was positive; these are therefore people who responded less than we would expect that black Americans were not happy and more than expected that black Americans were happy.
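The standardized residuals used in this interpretation are just (observed − expected)/√expected for each cell. A Python sketch with a made-up 2×2 frequency table (not Beckham's data):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up 2x2 frequency table: rows = group, columns = yes/no response
observed = np.array([[90, 30],
                     [40, 80]])

chi2, p, df, expected = chi2_contingency(observed, correction=False)
std_resid = (observed - expected) / np.sqrt(expected)

print(round(chi2, 2), df)  # 41.96 1
print(std_resid.round(2))  # cells with |residual| > 1.96 deviate significantly
```

Positive residuals mark cells with more responses than the independence model expects, negative residuals mark cells with fewer, which is exactly the logic applied to the professions above.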

Are they Happy as black Americans? We run this analysis in exactly the same way, except that we now weight cases by the variable You_Happy. Select Data > Weight Cases; in the resulting dialog box Weight cases by should already be selected from the previous analysis. Select the variable in the box labelled Frequency Variable and click the arrow to move it back to the variable list and clear the box. Then select the variable in which the number of cases is specified (in this case You_Happy) and drag it to the box labelled Frequency Variable. This process tells the computer that it should weight each category combination by the number in the column labelled You_Happy.

Then carry out the analysis through crosstabs exactly as before.


The chi-square test is highly significant, χ2(7) = 1390.74, p < .001. This indicates that the profile of yes and no responses differed across the professions. Looking at the standardized residuals, these are significant in most cells, with a few exceptions: physicians, lawyers and school teachers saying 'yes'. Within the other cells all of the standardized residuals are much larger than 1.96 in magnitude. Again, we can look at the direction of these residuals (i.e. whether they are positive or negative). For labourers, housewives, school teachers and musicians the residual for 'no' was positive but for 'yes' was negative; these are therefore people who responded more than we would expect that they were not happy as black Americans and less than expected that they were happy as black Americans. The remaining professions (college students, physicians, preachers and lawyers) show the opposite pattern: the residual for 'no' was negative but for 'yes' was positive; these are therefore people who responded less than we would expect that they were not happy as black Americans and more than expected that they were happy as black Americans. Essentially, the former group are in low-paid jobs in which conditions would have been very hard (especially in the social context of the time). The latter group are in much more respected (and probably better-paid) professions. Therefore, the


responses to this question could say more about the professions of the people asked than their views of being black Americans.

Should black Americans be happy? We run this analysis in exactly the same way, except that we now weight cases by the variable Should_Be_Happy. Select Data > Weight Cases; in the resulting dialog box Weight cases by should already be selected from the previous analysis. Select the variable in the box labelled Frequency Variable and click the arrow to move it back to the variable list and clear the box. Then select the variable in which the number of cases is specified (in this case Should_Be_Happy) and drag it to the box labelled Frequency Variable. This process tells the computer that it should weight each category combination by the number in the column labelled Should_Be_Happy. Then carry out the analysis through crosstabs exactly as before.


The chi-square test is highly significant, χ²(7) = 1784.23, p < .001. This indicates that the profile of yes and no responses differed across the professions. Looking at the standardized residuals, these are nearly all significant. Again, we can look at the direction of these residuals (i.e. whether they are positive or negative). For college students and lawyers the residual for no was positive but for yes was negative; these are therefore people who responded more often than we would expect that they thought that black Americans should not be happy and less often than expected that they thought black Americans should be happy. The remaining professions show the opposite pattern: the residual for no was negative but for yes was positive; these are therefore people who responded less often than we would expect that they did not think that black Americans should be happy and more often than expected that they thought that black Americans should be happy. What is interesting here and in question 1 is that college students and lawyers are in vocations in which they are expected to be critical about the world. Lawyers may well have defended black Americans who had been the subject of injustice, discrimination or racial abuse, and college students would likely be applying their critically trained minds to the immense social injustice that prevailed at the time. Therefore, these groups can see that their racial group should not be happy and should strive for the equitable and just society to which they are entitled. People in the other professions perhaps adopt a different social comparison. It's also possible for this final question that the groups interpreted the question differently: perhaps the lawyers and students interpreted the question as 'should they be happy given the political and social conditions of the time?' whereas the others interpreted the question as 'do they deserve happiness?'. It might seem strange to have picked a piece of research from so long ago to illustrate the chi-square test, but what I wanted to demonstrate is that simple research can sometimes be incredibly illuminating. This study asked three simple questions, yet the data are utterly fascinating. It raises further hypotheses that could be tested, it unearths very different views in different professions, and it illuminates a very important social and psychological issue. Other studies sometimes use the most elegant paradigms and highly complex methodologies, but the questions they address are utterly meaningless for the real world. They miss the big picture. Albert Beckham was a remarkable man, trying to understand important and big real-world issues that mattered to hundreds of thousands of people.
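The standardized residuals that these interpretations rely on are easy to compute by hand. The sketch below (Python, using hypothetical counts — Beckham's actual frequencies live in the data file and are not reproduced here) shows that each residual is simply the observed count minus the expected count, divided by the square root of the expected count:

```python
# Standardized residuals for a contingency table, as used in the text.
# The counts below are HYPOTHETICAL; only the mechanics are being shown.

def std_residuals(table):
    """table[i][j] = observed count; returns (chi_square, residuals)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    chi_sq = 0.0
    resid = []
    for i, row in enumerate(table):
        resid_row = []
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total  # expected under independence
            z = (obs - exp) / exp ** 0.5                 # standardized residual
            chi_sq += z ** 2                             # chi-square = sum of z^2
            resid_row.append(z)
        resid.append(resid_row)
    return chi_sq, resid

# two professions x (yes, no) — hypothetical numbers
chi_sq, resid = std_residuals([[90, 10], [40, 60]])
# cells with |residual| > 1.96 deviate significantly from independence
```

A positive residual means more responses in that cell than independence predicts; a negative one means fewer, which is exactly how the yes/no patterns above are read.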
Chapter 19

Self-Test Answers

Using what you know about ANOVA, conduct a one-way ANOVA using Surgery as the predictor and Post_QoL as the outcome.

Select

and complete the dialog box as follows:


Using what you know about ANCOVA, conduct a one-way ANCOVA using Surgery as the predictor, Post_QoL as the outcome and Base_QoL as the covariate.

Select

and complete the dialog box as follows:


Split the file by Reason and then run a multilevel model predicting Post_QoL with a random intercept and random slopes for Surgery, including Base_QoL and Surgery as predictors.

First, split the file by Reason by selecting Data ⇒ Split File…. The completed dialog box should look like this:

Next, we need to run the multilevel model. Select Analyze ⇒ Mixed Models ⇒ Linear…, and specify the contextual variable by selecting Clinic from the list of variables and dragging it to the box labelled Subjects (or click on the transfer arrow).

Click on Continue to move to the main dialog box. First we must specify our outcome variable, which is quality of life (QoL) after surgery, so select Post_QoL and drag it to the space labelled Dependent variable (or click on the transfer arrow). Next we need to specify our predictors: select Surgery and Base_QoL (hold down Ctrl to select both of them simultaneously) and drag them to the space labelled Covariate(s) (or click on the transfer arrow).

The main mixed models dialog box

We need to add the predictors as fixed effects to our model, so click on Fixed…, hold down Ctrl and select Base_QoL and Surgery in the list labelled Factors and Covariates. Then make sure that the drop-down list is set to Main Effects and click on Add to transfer these predictors to the Model. Click on Continue to return to the main dialog box.

We now need to ask for a random intercept and random slopes for the effect of Surgery. Click on Random… in the main dialog box. Select Clinic and drag it to the area labelled Combinations (or click on the transfer arrow). We want to specify that the intercept is random, and we do this by selecting Include intercept. Next, select Surgery from the list of Factors and covariates and add it to the model by clicking on Add. The other change that we need to make is that we need to estimate the covariance between the random slope and random intercept. This estimation is achieved by clicking on the Covariance Type drop-down list and selecting Unstructured.

Click on Estimation… and select Maximum Likelihood (ML). Click on Continue to return to the main dialog box.

In the main dialog box click on Statistics… and request Parameter estimates and Tests for covariance parameters. Click on Continue to return to the main dialog box. To run the analysis, click on OK.

Use the compute command to transform Time into Time minus 1.

Access the compute command by selecting Transform ⇒ Compute Variable…. In the resulting window enter the name Time into the box labelled Target Variable. Select the variable Time and drag it across to the area labelled Numeric Expression, then click on the minus key and then type 1. The completed dialog box is below:

Additional Material

Oliver Twisted: Please Sir, Can I Have Some More ICC?

'I have a dependency on gruel,' whines Oliver. 'Maybe I could measure this dependency if I knew more about the ICC.' Well, you're so high on gruel, Oliver, that you have rather missed the point. Still, I did write an article on the ICC once upon a time (Field, 2005) and it's reproduced in the additional web material for your delight and amusement.


The following article originally appeared in: Field, A. P. (2005). Intraclass correlation. In B. Everitt & D. C. Howell (eds.), Encyclopedia of Behavioral Statistics (Vol. 2, pp. 948–954). New York: Wiley.

It appears in adapted form below:

Commonly used correlations such as the Pearson product-moment correlation measure the bivariate relation between variables of different measurement classes. These are known as interclass correlations. By different measurement classes we really just mean variables measuring different things. For example, we might look at the relation between attractiveness and career success; clearly one of these variables represents a class of measures of how good looking a person is, whereas the other represents the class of measurements of something quite different: how much someone achieves in their career. However, there are often cases in which it is interesting to look at relations between variables within classes of measurement. In its simplest form, we might compare only two variables. For example, we might be interested in whether anxiety runs in families, and we could look at this by measuring anxiety within pairs of twins (Eley & Stevenson, 1999). In this case the objects being measured are twins, and both twins are measured on some index of anxiety. As such, there is a pair of variables both measuring anxiety and, therefore, from the same class. In such cases an intraclass correlation (ICC) is used, and it is commonly extended beyond just two variables to look at the consistency between judges. For example, in gymnastics, ice skating, diving and other Olympic sports, contestants' performance is often assessed by a panel of judges. There might be 10 judges, all of whom rate performance out of 10; therefore, the resulting measures are from the same class (they measure the same thing). The objects being rated are the competitors. This again is a perfect scenario for an intraclass correlation.

Models of Intraclass Correlations

There are a variety of different intraclass correlations (McGraw & Wong, 1996; Shrout & Fleiss, 1979), and the first step in calculating one is to determine a model for your sample data. All of the various forms of the intraclass correlation are based on estimates of mean variability from a one-way repeated measures analysis of variance. All situations in which an intraclass correlation is desirable will involve multiple measures on different entities (be they twins, Olympic competitors, pictures, sea slugs etc.). The objects measured constitute a random factor in the design (they are assumed to be random exemplars of the population of objects). The measures taken can be included as factors in the design if they have a meaningful order, or can be excluded if they are unordered, as we shall now see.

One-Way Random Effects Model

In the simplest case we might have only two measures (think back to our twin study on anxiety) and the order of these variables is irrelevant (for example, with our twins it is arbitrary whether we treat the data from the first twin as anxiety measure 1 or anxiety measure 2). In this case, the only systematic source of variation is the random variable representing the different objects. As such, we can use a one-way ANOVA of the form:

x_ij = μ + r_i + e_ij

in which r_i is the effect of object i (known as the row effects), j is the measure being considered, and e_ij is an error term (the residual effects). The row and residual effects are random, independent and normally distributed. Because the effect of the measure is ignored, the resulting intraclass correlation is based on the overall effect of the objects being measured (the mean between-object variability, MS_Rows) and the mean within-object variability (MS_W). Both of these will be formally defined later.

Two-Way Random Effects Model

When the order of measures is important, the effect of the measures becomes important too. The most common case of this is when measures come from different judges or raters. Hodgins and Makarchuk (2003), for example, show two such uses: in their study they took multiple measures of the same class of behaviour (gambling), but also took measures from different sources. They measured gambling both in terms of days spent gambling and money spent gambling. Clearly these measures generate different data, so it is important to which measure a datum belongs (it is not arbitrary to which measure a datum is assigned). This is one scenario in which a two-way model is used. However, they also took measures of gambling both from the gambler and from a collateral (e.g. a spouse). Again, it is important that we attribute data to the correct source, so this is a second illustration of where a two-way model is useful. In such situations the intraclass correlation can be used to check the consistency or agreement between measures or raters. In this situation a two-way model can be used as follows:

x_ij = μ + r_i + c_j + rc_ij + e_ij

in which c_j is the effect of the measure (i.e. the effect of different raters, or different measures), and rc_ij is the interaction between the measures taken and the objects being measured. The effect of the measure (c_j) can be treated as either a fixed effect or a random effect. How it is treated doesn't affect the calculation of the intraclass correlation, but it does affect the interpretation (as we shall see). It is also possible to exclude the interaction term and use the model:

x_ij = μ + r_i + c_j + e_ij

We shall now turn our attention to calculating the sources of variance needed to calculate the intraclass correlation.
Sources of Variance: An Example

In the chapter in the book on repeated measures ANOVA, there is an example relating to student concerns about the consistency of marking between lecturers. It is common that lecturers obtain reputations for being 'hard' or 'light' markers, which can lead students to believe that their marks are not based solely on the intrinsic merit of the work but can be influenced by who marked it. To test this we could calculate an intraclass correlation. First, we could submit the same 8 essays to four different lecturers and record the mark each gave each essay. Table 1 shows the data, and you should note that they look the same as those for a one-way repeated measures ANOVA in which the four lecturers represent four levels of an independent variable and the outcome or dependent variable is the mark given (in fact I use these data as an example of a one-way repeated measures ANOVA).

Table 1
Essay   Dr. Field   Dr. Smith   Dr. Scrote   Dr. Death   Mean    S²       S²(k−1)
1       62          58          63           64          61.75   6.92     20.75
2       63          60          68           65          64.00   11.33    34.00
3       65          61          72           65          65.75   20.92    62.75
4       68          64          58           61          62.75   18.25    54.75
5       69          65          54           59          61.75   43.58    130.75
6       71          67          65           50          63.25   84.25    252.75
7       78          66          67           50          65.25   132.92   398.75
8       75          73          75           45          67.00   216.00   648.00
Mean:   68.88       64.25       65.25        57.38       63.94   Total:   1602.50

There are three different sources of variance that are needed to calculate an intraclass correlation, which we shall now calculate. These sources of variance are the same as those calculated in one-way repeated measures ANOVA. (If you don't believe me, consult Smart Alex's answers to Chapter 13 to see an identical set of calculations!)


The Between-Object Variance (MS_Rows)

The first source of variance is the variance between the objects being rated (in this case the between-essay variance). Essays will naturally vary in their quality for all sorts of reasons (the natural ability of the author, the time spent writing the essay, etc.). This variance is calculated by looking at the average mark for each essay and seeing how much it deviates from the average mark for all essays. These deviations are squared because some will be positive and others negative and so would cancel out when summed. The squared error for each essay is weighted by the number of values that contribute to its mean (in this case the number of different markers, k). So, in general terms we write this as:

SS_Rows = Σ_{i=1}^{n} k_i (X̄_Row i − X̄_all rows)²

Or, for our example, we could write it as:

SS_Essays = Σ_{i=1}^{n} k_i (X̄_Essay i − X̄_all essays)²

This would give us:

SS_Rows = 4(61.75 − 63.94)² + 4(64.00 − 63.94)² + 4(65.75 − 63.94)² + 4(62.75 − 63.94)²
        + 4(61.75 − 63.94)² + 4(63.25 − 63.94)² + 4(65.25 − 63.94)² + 4(67.00 − 63.94)²
        = 19.18 + 0.01 + 13.10 + 5.66 + 19.18 + 1.90 + 6.86 + 37.45
        = 103.34

This sum of squares is based on the total variability, and so its size depends on how many objects (essays in this case) have been rated. Therefore, we convert this total to an average, known as the mean squared error (MS), by dividing by the number of essays (or, in general terms, the number of rows) minus 1. This value is known as the degrees of freedom:

MS_Rows = SS_Rows/df_Rows = 103.34/(n − 1) = 103.34/7 = 14.76

The mean squared error for the rows in the table is our estimate of the natural variability between the objects being rated.

The Within-Judge Variability (MS_W)

The second variability in which we're interested is the variability within measures/judges. To calculate this we look at the deviation of each judge from the average of all judges on a particular essay. We use an equation with the same structure as before, but for each essay separately:

SS_Essay = Σ_{k=1}^{p} (X_Column k − X̄_all columns)²

For essay 1, for example, this would be:

SS_Essay 1 = (62 − 61.75)² + (58 − 61.75)² + (63 − 61.75)² + (64 − 61.75)² = 20.75

The degrees of freedom for this calculation are again one less than the number of scores used in the calculation; in other words, the number of judges, k, minus 1. We have to calculate this for each of the essays in turn and then add these values up to get the total variability within judges. An alternative way to do this is to use the variance within each essay: the equation above is equivalent to the variance for each essay multiplied by the number of values on which that variance is based (in this case the number of judges, k) minus 1. As such we get:

SS_W = s²_essay 1(k₁ − 1) + s²_essay 2(k₂ − 1) + s²_essay 3(k₃ − 1) + … + s²_essay n(k_n − 1)

Table 1 shows these values for each essay in the last column. When we sum them we get 1602.50. As before, this value is a total and so depends on the number of essays (and the number of judges). Therefore, we convert it to an average by dividing by the degrees of freedom. For each essay we calculated a sum of squares that was based on k − 1 degrees of freedom, so the degrees of freedom for the total within-judge variability are the sum of the degrees of freedom for each essay:

df_W = n(k − 1)

in which n is the number of essays and k is the number of judges. In this case it will be 8(4 − 1) = 24. The resulting mean squared error is, therefore:

MS_W = SS_W/df_W = 1602.50/n(k − 1) = 1602.50/24 = 66.77

The Between-Judge Variability (MS_Columns)

The within-judge or within-measure variability is made up of two components. The first is the variability created by differences between judges. The second is unexplained variability (error, for want of a better word). The variability between judges is again calculated using a variant of the same equation that we've used all along, only this time we're interested in the deviation of each judge's mean from the mean of all judges:

SS_Columns = Σ_{k=1}^{p} n_k (X̄_Column k − X̄_all columns)²

Or:

SS_Judges = Σ_{k=1}^{p} n_k (X̄_Judge k − X̄_all judges)²

in which n is the number of things that each judge rated. For these data we'd get:

SS_Columns = 8(68.88 − 63.94)² + 8(64.25 − 63.94)² + 8(65.25 − 63.94)² + 8(57.38 − 63.94)² = 554

The degrees of freedom for this effect are the number of judges, k, minus 1. As before, the sum of squares is converted to a mean squared error by dividing by the degrees of freedom:

MS_Columns = SS_Columns/df_Columns = 554/(k − 1) = 554/3 = 184.67

The Error Variability (MS_E)

The final variability is the variability that can't be explained by known factors such as the variability between essays or between judges/measures. This can be calculated easily using subtraction, because we know that the within-judge variability is made up of the between-judge variability and this error:

SS_W = SS_Columns + SS_E
SS_E = SS_W − SS_Columns

The same is true of the degrees of freedom:

df_W = df_Columns + df_E
df_E = df_W − df_Columns

So, for these data we get:

SS_E = SS_W − SS_Columns = 1602.50 − 554 = 1048.50

and:

df_E = df_W − df_Columns = 24 − 3 = 21

We get the average error variance in the usual way:

MS_E = SS_E/df_E = 1048.50/21 = 49.93
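As a check on the hand calculation, all four mean squares can be recomputed directly from the marks in Table 1. The Python sketch below follows exactly the sums of squares defined above; small rounding differences arise because the text rounds the grand mean to 63.94 before squaring:

```python
# Recomputing the variance components from Table 1 (8 essays x 4 lecturers).
marks = [
    [62, 58, 63, 64],
    [63, 60, 68, 65],
    [65, 61, 72, 65],
    [68, 64, 58, 61],
    [69, 65, 54, 59],
    [71, 67, 65, 50],
    [78, 66, 67, 50],
    [75, 73, 75, 45],
]
n = len(marks)       # number of essays (rows)
k = len(marks[0])    # number of lecturers (columns)
grand = sum(sum(row) for row in marks) / (n * k)

row_means = [sum(row) / k for row in marks]
col_means = [sum(col) / n for col in zip(*marks)]

SS_rows = k * sum((m - grand) ** 2 for m in row_means)
SS_w = sum((x - row_means[i]) ** 2 for i, row in enumerate(marks) for x in row)
SS_cols = n * sum((m - grand) ** 2 for m in col_means)
SS_e = SS_w - SS_cols

MS_rows = SS_rows / (n - 1)        # between-essay variance (text: 14.76)
MS_w = SS_w / (n * (k - 1))        # within-essay variability (text: 66.77)
MS_cols = SS_cols / (k - 1)        # between-judge variance (text: 184.67)
MS_e = SS_e / ((n - 1) * (k - 1))  # residual error variance (text: 49.93)
```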

Calculating Intraclass Correlations

Having computed the necessary variance components, we shall now look at how the intraclass correlation is calculated. Before we do so, however, there are two important decisions to be made.

Single Measures or Average Measures

So far we have talked about situations in which the measures we've used produce single values. However, it is possible that we might have measures that produce an average score. For example, we might get judges to rate paintings in a competition based on style, content, originality and technical skill. For each judge, their ratings are averaged. The end result is still ratings from a set of judges, but these ratings are an average of many ratings. Intraclass correlations can be computed for such data, but the computation is somewhat different.

Consistency or Agreement?

The next decision involves whether you want a measure of consistency or of overall agreement between measures/judges. The best way to explain this distinction is to return to our lecturers marking essays. It is possible that particular lecturers are harsh in their ratings (or lenient). A consistency definition views these differences as an irrelevant source of variance. As such, the between-judge variability described above (MS_Columns) is ignored in the calculation (see Table 2). In ignoring this source of variance we are getting a measure of whether judges agree about the relative merits of the essays without worrying about whether the judges anchor their marks around the same point. So, if all the judges agree that essay 1 is the best and essay 5 is the worst (or their rank order of essays is roughly the same) then agreement will be high: it doesn't matter that Dr. Field's marks are all 10% higher than Dr. Death's. This is a consistency definition of agreement.

The alternative is to treat relative differences between judges as an important source of disagreement. That is, the between-judge variability described above (MS_Columns) is treated as an important source of variation and is included in the calculation (see Table 2). In this scenario disagreements between the relative magnitudes of judges' ratings matter (so the fact that Dr. Death's marks differ from Dr. Field's will matter even if their rank order of marks is in agreement). This is an absolute agreement definition. By definition, the one-way model ignores the effect of the measures and so can have only this kind of interpretation.

Equations for ICCs

Table 2 shows the equations for calculating the ICC based on whether a one-way or two-way model is assumed and whether a consistency or absolute agreement definition is preferred. For illustrative purposes, the ICC is calculated in each case for the example used in this entry. This should enable the reader to identify how to calculate the various sources of variance. In this table MS_Columns is abbreviated to MS_C and MS_Rows to MS_R.

Table 2:

ICC for Single Scores

One-way model, absolute agreement:
  ICC = (MS_R − MS_W) / (MS_R + (k − 1)MS_W)
      = (14.76 − 66.77) / (14.76 + (4 − 1)66.77) = −0.24

Two-way model, consistency:
  ICC = (MS_R − MS_E) / (MS_R + (k − 1)MS_E)
      = (14.76 − 49.93) / (14.76 + (4 − 1)49.93) = −0.21

Two-way model, absolute agreement:
  ICC = (MS_R − MS_E) / (MS_R + (k − 1)MS_E + k(MS_C − MS_E)/n)
      = (14.76 − 49.93) / (14.76 + (4 − 1)49.93 + 4(184.67 − 49.93)/8) = −0.15

ICC for Average Scores

One-way model, absolute agreement:
  ICC = (MS_R − MS_W) / MS_R = (14.76 − 66.77) / 14.76 = −3.52

Two-way model, consistency:
  ICC = (MS_R − MS_E) / MS_R = (14.76 − 49.93) / 14.76 = −2.38

Two-way model, absolute agreement:
  ICC = (MS_R − MS_E) / (MS_R + (MS_C − MS_E)/n)
      = (14.76 − 49.93) / (14.76 + (184.67 − 49.93)/8) = −1.11
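The Table 2 values can be verified by plugging the mean squares from the example into the formulas. A Python sketch (using the rounded values from the text):

```python
# Plugging the mean squares from the worked example into the Table 2 formulas.
MS_R, MS_W, MS_C, MS_E = 14.76, 66.77, 184.67, 49.93
n, k = 8, 4  # essays, judges

# single scores
one_way_abs = (MS_R - MS_W) / (MS_R + (k - 1) * MS_W)              # -0.24
two_way_con = (MS_R - MS_E) / (MS_R + (k - 1) * MS_E)              # -0.21
two_way_abs = (MS_R - MS_E) / (MS_R + (k - 1) * MS_E
                               + k * (MS_C - MS_E) / n)            # -0.15

# average scores
one_way_abs_avg = (MS_R - MS_W) / MS_R                             # -3.52
two_way_con_avg = (MS_R - MS_E) / MS_R                             # -2.38
two_way_abs_avg = (MS_R - MS_E) / (MS_R + (MS_C - MS_E) / n)       # -1.11
```

The negative values simply reflect that, for these data, the within-essay disagreement between judges dwarfs the between-essay variability.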


Significance Testing

The calculated intraclass correlation can be tested against a value under the null hypothesis using a standard F-test (see analysis of variance). McGraw and Wong (1996) describe these tests for the various intraclass correlations we've seen, and Table 3 summarises their work. In this table ICC is the observed intraclass correlation, whereas ρ₀ is the value of the intraclass correlation under the null hypothesis: that is, the value against which you wish to compare the observed intraclass correlation. So, replace this value with 0 to test the hypothesis that the observed ICC is greater than zero, or replace it with values such as 0.1, 0.3 or 0.5 to test whether the observed ICC is greater than known values of small, medium and large effect sizes respectively.
Table 3:

ICC for Single Scores

One-way model, absolute agreement:
  F = (MS_R / MS_W) × (1 − ρ₀) / (1 + (k − 1)ρ₀)
  df₁ = n − 1;  df₂ = n(k − 1)

Two-way model, consistency:
  F = (MS_R / MS_E) × (1 − ρ₀) / (1 + (k − 1)ρ₀)
  df₁ = n − 1;  df₂ = (n − 1)(k − 1)

Two-way model, absolute agreement:
  F = MS_R / (a·MS_C + b·MS_E)
  df₁ = n − 1
  df₂ = (a·MS_C + b·MS_E)² / [(a·MS_C)²/(k − 1) + (b·MS_E)²/((n − 1)(k − 1))]

  in which:
  a = kρ₀ / (n(1 − ρ₀))
  b = 1 + kρ₀(n − 1) / (n(1 − ρ₀))

ICC for Average Scores

One-way model, absolute agreement:
  F = (1 − ρ₀) / (1 − ICC)
  df₁ = n − 1;  df₂ = n(k − 1)

Two-way model, consistency:
  F = (1 − ρ₀) / (1 − ICC)
  df₁ = n − 1;  df₂ = (n − 1)(k − 1)

Two-way model, absolute agreement:
  F = MS_R / (c·MS_C + d·MS_E)
  df₁ = n − 1
  df₂ = (c·MS_C + d·MS_E)² / [(c·MS_C)²/(k − 1) + (d·MS_E)²/((n − 1)(k − 1))]

  in which:
  c = ρ₀ / (n(1 − ρ₀))
  d = 1 + ρ₀(n − 1) / (n(1 − ρ₀))
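As a worked illustration of Table 3: testing an observed ICC against ρ₀ = 0 makes the first two F-ratios collapse to simple ratios of mean squares. A Python sketch using the values from this example:

```python
# Testing the observed ICCs against rho_0 = 0 using the Table 3 F-ratios.
MS_R, MS_W, MS_E = 14.76, 66.77, 49.93
n, k = 8, 4
rho0 = 0.0

# one-way model, single scores, absolute agreement
F_oneway = (MS_R / MS_W) * (1 - rho0) / (1 + (k - 1) * rho0)  # on (7, 24) df
# two-way model, single scores, consistency
F_twoway = (MS_R / MS_E) * (1 - rho0) / (1 + (k - 1) * rho0)  # on (7, 21) df
# both F-ratios are well below 1 here, consistent with the negative ICCs:
# the judges do not agree beyond what chance would produce
```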

Fixed versus Random Effects

I mentioned earlier on that the effect of the measures/judges can be conceptualised as a fixed or random effect. Although it makes no difference to the calculation, it does affect the interpretation. Essentially, this variable should be regarded as random when the judges or measures represent a sample of a larger population of measures or judges that could have been used. Put another way, the particular judges or measures chosen are not important and do not change the research question you're addressing. However, the effect of measures should be treated as fixed when changing one of the judges or measures would significantly affect the research question (see fixed and random effects). For example, in the gambling study mentioned earlier it would make a difference if the ratings of the gambler were replaced: the fact that gamblers gave ratings was intrinsic to the research question being addressed (do gamblers give accurate information about their gambling?). However, in our example of lecturers' marks, it shouldn't make any difference if we substitute one lecturer with a different one: we can still answer the same research question (do lecturers, in general, give inconsistent marks?). In terms of interpretation, when the effect of the measures is a random factor the results can be generalized beyond the sample; however, when it is a fixed effect, any conclusions apply only to the sample on which the ICC is based (McGraw & Wong, 1996).
Oliver Twisted: Please Sir, Can I Have Some More Centring?

'Recentgin,' babbles Oliver as he stumbles drunk out of Mrs Moonshine's Alcohol Emporium, 'I need some more recent gin.' I think you mean centring, Oliver, not recentgin. If you want to know how to centre your variables using SPSS, then the additional material for this chapter on the companion website will tell you. We'll use the Cosmetic Surgery.sav data to illustrate the two types of centring discussed in the book chapter. Load this file into SPSS. Let's assume that we want to centre the variable BDI.
Grand Mean Centring


Grand mean centring is really easy: we can simply use the compute command that we encountered in the book. First, we need to find out the mean score for BDI. We can do this using some simple descriptive statistics. Choose Analyze ⇒ Descriptive Statistics ⇒ Descriptives… to access the dialog box below. Select BDI and drag it to the box labelled Variable(s), then click on Options… and select only the mean (we don't need any other information).

The resulting output tells us that the mean is 23.05:

We use this value to centre the variable. Access the compute command by selecting Transform ⇒ Compute Variable…. In the resulting dialog box, enter the name BDI_Centred into the box labelled Target Variable, and click on Type & Label… if you want to give the variable a more descriptive name. Select the variable BDI and drag it across to the area labelled Numeric Expression, then click on the minus key and type the value of the mean (23.05). The completed dialog box is below:

Click on OK and a new variable will be created called BDI_Centred, which is centred around the mean of BDI. The mean of this new variable should be approximately 0: run some descriptive statistics to see that this is true. You can do the same thing in a syntax window by typing:

COMPUTE BDI_Centred = BDI - 23.05.
EXECUTE.
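The effect of that COMPUTE command is easy to see outside SPSS too. A Python sketch with hypothetical BDI scores (the real ones are in Cosmetic Surgery.sav):

```python
# Grand mean centring: subtract the overall mean from every score,
# mirroring COMPUTE BDI_Centred = BDI - 23.05.
bdi = [25, 18, 30, 21, 19, 27]  # hypothetical BDI scores
grand_mean = sum(bdi) / len(bdi)
bdi_centred = [x - grand_mean for x in bdi]
# the centred scores always sum (and therefore average) to zero
```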

Group Mean Centring

Group mean centring is considerably more complicated. The first step is to create a file containing the means of the groups. Let's try this for the BDI scores. We want to centre this variable across the level 2 variable of Clinic. We first need to know the mean BDI in each group, and to save that information in a form that SPSS can use later on. To do this we need to use the aggregate command, which is not discussed in the book. To access the main dialog box select Data ⇒ Aggregate. In this dialog box we want to select Clinic and drag it to the area labelled Break variable(s). This means that the variable Clinic will be used to split up the data file (in other words, when the mean is computed it will be computed for each clinic separately). We then need to select BDI and drag it to the area labelled Summaries of variable(s). You'll notice that once this variable is selected the default is that SPSS will create a new variable called BDI_mean, which is the mean of BDI (split by clinic, obviously). We need to save this information in a file that we can access later on, so select the option to write a new data file containing only the aggregated variables. By default, SPSS will save the file with the name aggr.sav in your default directory. If you would like to save it elsewhere or under a different name then click on File… to open a normal file system dialog box where you can name the file and navigate to a directory that you'd like to save it in. Click on OK to create this new file.


If you open the resulting data file (you don't need to, but it will give you an idea of what it contains) you will see that it simply contains two columns: one with a number specifying the clinic from which the data came (there were 10 clinics), and the second containing the mean BDI score within each clinic.

When SPSS creates the aggregated data file it orders the clinics from lowest to highest (regardless of what order they are in the data set). Therefore, to make our working data file match this aggregated file, we need to make sure that all of the data from the various clinics are ordered too, from clinic 1 up to clinic 10. This is easily done by using the sort cases command. (Actually our data are already ordered in this way, but because your data might not always be, we'll go through the motions anyway.) To access the sort cases command select Data ⇒ Sort Cases. Select the variable that you want to sort the file by (in this case Clinic) and drag it to the area labelled Sort by (or click on the transfer arrow). You can choose to order the file in ascending order (clinic 1 to clinic 10), which is what we need to do here, or descending order (clinic 10 to clinic 1). Click on OK to sort the file.

The next step is to use these clinic means in the aggregated file to centre the BDI variable in our main file. To do this we need to use the match files command, which can be accessed by selecting Data ⇒ Merge Files ⇒ Add Variables…. This will open a dialog box that lists all of the open data files (in my case I had none open apart from the one that I was working from, so this space is blank) or asks you to select an SPSS data file. Click on Browse and navigate to wherever you decided to store the file of aggregated values (in my case aggr.sav). Select this file, then click on Continue to move on to the next dialog box.

In the next dialog box we need to match the two files, which just tells SPSS that the two files are connected. To do this select Match cases on key variables in sorted files. Then we also need to specifically connect the files on the Clinic variable: select Non-active dataset is keyed table, which tells SPSS that the data set that isn't active (i.e. the file of aggregated scores) should be treated as a table of values that are matched to the working data file on a key variable. We need to select what this key variable is. We want to match the files on the Clinic variable, so select this variable in the Excluded variables list and drag it to the space labelled Key Variables (or click on the transfer arrow). Click on OK.


The data editor should now include a new variable, BDI_Mean, which contains the values from our file aggr.sav. Basically, SPSS has matched the files on the clinic variable, so that the values in BDI_Mean correspond to the mean value for the various clinics. So, when the clinic variable is 1, BDI_Mean has been set to 25.19, but when clinic is 2, BDI_Mean is set to 31.32. We can use these values in the compute command again to centre BDI. Access the compute command by selecting Transform ⇒ Compute Variable…. In the resulting dialog box enter the name BDI_Group_Centred into the box labelled Target Variable, and click on Type & Label… if you want to give the variable a more descriptive name. Select the variable BDI and drag it across to the area labelled Numeric Expression, then click on the minus key and then either type BDI_Mean or select this variable and drag it across too. Click on OK and a new variable will be created containing the group-centred scores.


Alternatively you can do this all with the following syntax:

AGGREGATE
  /OUTFILE='C:\Users\Dr. Andy Field\Documents\Academic\Data\aggr.sav'
  /BREAK=Clinic
  /BDI_mean=MEAN(BDI).
SORT CASES BY Clinic(A).
MATCH FILES
  /FILE=*
  /TABLE='C:\Users\Dr. Andy Field\Documents\Academic\Data\aggr.sav'
  /BY Clinic.
EXECUTE.
COMPUTE BDI_Group_Centred=BDI - BDI_mean.
EXECUTE.
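If you want to see what the AGGREGATE, MATCH FILES and COMPUTE steps are actually doing, the same group-mean centring logic can be sketched in a few lines of Python. The data here are made up for illustration; only the logic mirrors the syntax above:

```python
# Group-mean centring: mirrors the AGGREGATE / MATCH FILES / COMPUTE
# steps above, using a small hypothetical data set.
bdi    = [20, 30, 26, 34, 40]   # BDI scores
clinic = [1, 1, 1, 2, 2]        # clinic each score belongs to

# AGGREGATE: mean BDI per clinic (the 'break' variable)
means = {}
for c in set(clinic):
    scores = [b for b, cl in zip(bdi, clinic) if cl == c]
    means[c] = sum(scores) / len(scores)

# MATCH FILES + COMPUTE: attach each clinic's mean to its cases
# and subtract it from the raw score
bdi_group_centred = [b - means[c] for b, c in zip(bdi, clinic)]

print(means)              # e.g. {1: 25.33..., 2: 37.0}
print(bdi_group_centred)
```

Note that, within each clinic, the centred scores sum to zero, which is exactly what group-mean centring should produce.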

Labcoat Leni's Real Research: A Fertile Gesture

Miller, Tybur & Jordan (2007). Evolution and Human Behavior, 28, 375–381.


Most female mammals experience a phase of estrus during which they are more sexually receptive, proceptive, selective and attractive. As such, the evolutionary benefit to this phase is believed to be to attract mates of superior genetic stock. However, some people have argued that this important phase became uniquely lost or hidden in human females. Testing these evolutionary ideas is exceptionally difficult, but Geoffrey Miller and his colleagues came up with an incredibly elegant piece of research that did just that. They reasoned that if the hidden-estrus theory is incorrect then men should find women most attractive during the fertile phase of their menstrual cycle compared to the pre-fertile (menstrual) and post-fertile (luteal) phases. To measure how attractive men found women in an ecologically valid way, they came up with the ingenious idea of collecting data from women working at lap-dancing clubs. These women maximize their tips from male visitors by attracting more dances. In effect the men try out several dancers before choosing a dancer for a prolonged dance. For each dance the male pays a tip; therefore the more men that chose a particular woman, the more her earnings would be. As such, each dancer's earnings are a good index of how attractive the male customers found her. Miller and his colleagues argued, therefore, that if women do have an estrus phase then they will be more attractive during this phase and therefore earn more money. This study is a brilliant example of using a real-world phenomenon to address an important scientific question in an ecologically valid way.

The data for this study are in the file Miller et al. (2007).sav. The researchers collected data via a website from several dancers (ID), who provided data for multiple lap-dancing shifts (so for each person there are several rows of data). They also measured what phase of their menstrual cycle the women were in at a given shift (Cyclephase), and whether they were using hormonal contraceptives (Contraceptive), because this would affect their cycle. The outcome was their earnings on a given shift in dollars (Tips). A multilevel model can be used here because the data are unbalanced: each woman differed in the number of shifts she provided data for (the range was 9 to 29 shifts), and there were missing data for Cyclephase. Multilevel models can handle these problems with ease. Labcoat Leni wants you to carry out a multilevel model to see whether Tips can be predicted from Cyclephase, Contraceptive and their interaction. Is the estrus-hidden hypothesis supported? Answers are in the additional material on the companion website (or look at page 378 in the original article).

First, select Analyze > Mixed Models > Linear…; in this initial dialog box we need to set up the level 2 variable. In this example, multiple scores or shifts are nested within each dancer. Therefore, the level 2 variable is the participant (the lap dancer) and this variable is represented by the variable labelled ID. Select this variable and drag it to the box labelled Subjects. Click on Continue to access the main dialog box.


In the main dialog box we need to set up our predictors and outcome. The outcome was the value of tips earned, so select Tips and drag it to the box labelled Dependent variable. We also have two predictors: Cyclephase and Contraceptive. Select

both of these (click on one and then, while holding down Ctrl, click on the other) and then drag them to the box labelled Factor(s). We use the Factor(s) box because both variables are categorical.


We need to add these fixed effects to our model, so click on Fixed to bring up the fixed effects dialog box. To specify both main effects and the interaction term, select both predictors (click on Cyclephase and then, while holding down Ctrl, click on Contraceptive), make sure Factorial is selected in the drop-down list, and then click on Add. With Factorial selected you should find that both main effects and the interaction term are transferred to the Model. Click on Continue to return to the main dialog box.

In the model that Miller et al. fitted, they did not assume that there would be random slopes (i.e. the relationship between each predictor and tips was not assumed to vary within lap dancers). This decision is appropriate for Contraceptive because this variable didn't vary at level 2 (a lap dancer was either taking contraceptives or not, so this could not be set up as a random effect because it doesn't vary over our level 2 variable of participant). Also, because Cyclephase is a categorical variable with three unordered categories, we could not expect a linear relationship with tips: we expect tips to vary over categories, but the categories themselves have no meaningful order. However, we might expect tips to vary over participants (some lap dancers will naturally get more money than others) and we can factor this variability in by allowing the intercept to be random. As such, we're fitting a random intercept model to the data.

To do this click on Random in the main dialog box to access the random effects dialog box. The first thing we need to do is to specify our contextual variable. We do this by selecting it from the list of contextual variables that we have told SPSS about already. These appear in the section labelled Subjects, and because we only specified one variable, there is only one variable in the list, ID. Select this variable and drag it to the area labelled Combinations. We want to specify only that the intercept is random, and we do this by selecting Include intercept. Notice in this dialog box that there is a drop-down list to specify the type of covariance structure; for a random intercept model the default option is fine. Click on Continue to return to the main dialog box.

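In equation form, the random intercept model we are fitting can be written as follows (standard multilevel notation; this summary is mine rather than anything taken from the paper):

```latex
\begin{aligned}
\text{Tips}_{ij} &= b_{0j} + b_{1}\,\text{Cyclephase}_{ij}
  + b_{2}\,\text{Contraceptive}_{j}
  + b_{3}\,(\text{Cyclephase}\times\text{Contraceptive})_{ij}
  + \varepsilon_{ij}\\
b_{0j} &= b_{0} + u_{0j}
\end{aligned}
```

Here shifts i are nested within dancers j. Only the intercept carries a random term, u0j, and its variance is the covariance parameter that is tested in the final table of the output.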

The authors report in the paper that they used restricted maximum-likelihood estimation (REML), so click on Estimation and select this option. Finally, click on Statistics and select Parameter estimates and Tests for covariance parameters. Click on Continue to return to the main dialog box. To run the analysis, click on OK.

This first table tells us our fixed effects. As you can see they are all significant. Miller and colleagues reported these results as follows:


Main effects of cycle phase [F(2, 236) = 27.46, p < .001] and contraception use [F(1, 17) = 6.76, p < .05] were moderated by an interaction between cycle phase and pill use [F(2, 236) = 5.32, p < .01]. (p. 378)

Hopefully you can see where these values come from in the table (they rounded the df off to whole numbers). Basically, this shows that the phase of the dancer's cycle significantly predicted tip income, and this interacted with whether the dancer was having natural cycles or was on the contraceptive pill. However, we don't know which groups differed. We can use the parameter estimates to tell us:

I coded Cyclephase in a way that would be most useful for interpretation, which was to code the group of interest (fertile period) as the last category (2), and the other phases as 1 (Luteal) and 0 (Menstrual). The parameter estimates for this variable, therefore, compare each category against the last category, and because I made the last category the fertile phase this means we get a comparison of the fertile phase against the other two. Therefore, we could say (because the b is negative) that tips were significantly higher in


the fertile phase than in the menstrual phase, b = −100.41, t(235.21) = −6.11, p < .001, and in the luteal phase, b = −170, t(234.92) = −9.84, p < .001. The b, as in regression, tells us the change in tips as we shift from one group to another, so during the fertile phase, dancers earned about $100 more than during the menstrual phase, and $170 more than during the luteal phase. These effects don't factor in contraceptive use. To look at this we need to look at the contrasts for the interaction term. The first of these tells us the following: if we worked out the relative difference in tips between the fertile phase and the menstrual phase, how much more do those in their natural cycle earn than those on contraceptive pills? The answer is about $86. In other words, there is a combined effect of being in a natural cycle and being in the fertile phase, and this is significant, b = 86.09, t(237) = 2.86, p < .01. The second contrast tells us the following: if we worked out the relative difference in tips between the fertile phase and the luteal phase, how much more do those in their natural cycle earn than those on contraceptive pills? The answer is about $90 (the b). In other words, there is a combined effect of being in a natural cycle and being in the fertile phase compared to the luteal phase, and this is significant, b = 89.94, t(236.80) = 2.63, p < .01.


The final table is not central to the hypotheses, but it does tell us about the random intercept. In other words, it tells us whether tips (in general) varied from dancer to dancer. The variance in tips across dancers was 3571.12, and this is significant, z = 2.37, p < .05. In other words, the average tip per dancer varied significantly. This confirms that we were justified in treating the intercept as a random variable. To conclude then, this study showed that the estrus-hidden hypothesis is wrong: men did find women more attractive (as indexed by how many lap dances they did and therefore how much they earned) during the fertile phase of their cycle compared to the other phases.


An analysis of the untransformed scores using a non-parametric test (Friedman's ANOVA) also revealed significant differences between approach times to the boxes, χ²(2) = 140.36, p < .001.


Smart Alex's Answers

Chapter 1

Task 1: What are (broadly speaking) the five stages of the research process?

1. Generating a research question: through an initial observation (hopefully backed up by some data).
2. Generating a theory to explain your initial observation.
3. Generating hypotheses: breaking your theory down into a set of testable predictions.
4. Collecting data to test the theory: deciding on what variables you need to measure to test your predictions, and how best to measure or manipulate those variables.
5. Analysing the data: looking at the data visually and by fitting a statistical model to see if it supports your predictions (and therefore your theory). At this point you should return to your theory and revise it if necessary.

Task 2: What is the fundamental difference between experimental and correlational research?

In a word, causality. In experimental research we manipulate a variable (predictor, independent variable) to see what effect it has on another variable (outcome, dependent variable). This manipulation, if done properly, allows us to compare situations where the causal factor is present to situations where it is absent. Therefore, if there are differences between these situations, we can attribute cause to the variable that we manipulated. In correlational research, we measure things that naturally occur and so we cannot attribute cause but instead look at natural covariation between variables.

Task 3: What is the level of measurement of the following variables?

The number of downloads of different bands' songs on iTunes:
o This is a discrete ratio measure. It is discrete because you can download only whole songs, and it is ratio because it has a true value of 0 (no downloads at all).

The names of the bands downloaded:
o This is a nominal variable. Bands can be identified by their name, but the names have no meaningful order. The fact that Norwegian black metal band 1349 called themselves 1349 does not make them better than British boy-band has-beens 911; the fact that 911 were a bunch of talentless idiots does, though.

The position in the iTunes download chart:
o This is an ordinal variable. We know that the band at number 1 sold more than the band at number 2 or 3 (and so on) but we don't know how many more downloads they had. So, this variable tells us the order of magnitude of downloads, but doesn't tell us how many downloads there actually were.

The money earned by the bands from the downloads:
o This variable is continuous and ratio. It is continuous because money (pounds, dollars, euros or whatever) can be broken down into very small amounts (you can earn fractions of euros even though there may not be an actual coin to represent these fractions).

The weight of drugs bought by the band with their royalties:
o This variable is continuous and ratio. If the drummer buys 100 g of cocaine and the singer buys 1 kg, then the singer has 10 times as much.

The type of drugs bought by the band with their royalties:
o This variable is categorical and nominal: the name of the drug tells us something meaningful (crack, cannabis, amphetamine, etc.) but has no meaningful order.

The phone numbers that the bands obtained because of their fame:
o This variable is categorical and nominal too: the phone numbers have no meaningful order; they might as well be letters. A bigger phone number did not mean that it was given by a better person.

The gender of the people giving the bands their phone numbers:
o This variable is categorical and binary: the people dishing out their phone numbers could fall into one of only two categories (male or female).

The instruments played by the band members:
o This variable is categorical and nominal too: the instruments have no meaningful order, but their names tell us something useful (guitar, bass, drums, etc.).

The time they had spent learning to play their instruments:
o This is a continuous and ratio variable. The amount of time could be split into infinitely small divisions (nanoseconds even) and there is a meaningful true zero (0 time spent learning your instrument means that, like 911, you can't play at all).

Task 4: Say I own 857 CDs. My friend has written a computer program that uses a webcam to scan the shelves in my house where I keep my CDs and measure how many I have. His program says that I have 863 CDs. Define measurement error. What is the measurement error in my friend's CD-counting device?

Measurement error is the difference between the true value of something and the numbers used to represent that value. In this trivial example, the measurement error is 6 CDs. In this example we know the true value of what we're measuring; usually we don't have this information, so we have to estimate this error rather than knowing its actual value.

Task 5: Sketch the shape of a normal distribution, a positively skewed distribution and a negatively skewed distribution.

Normal:

Positive skew:

Negative skew:

Chapter 2

Task 1: Why do we use samples?

We are usually interested in populations, but because we cannot collect data from every human being (or whatever) in the population, we collect data from a small subset of the population (known as a sample) and use these data to infer things about the population as a whole.

Task 2: What is the mean and how do we tell if it's representative of our data?

The mean is a simple statistical model of the centre of a distribution of scores: a hypothetical estimate of the typical score. We use the variance, or standard deviation, to tell us whether it is representative of our data. The standard deviation is a measure of how much error there is associated with the mean: a small standard deviation indicates that the mean is a good representation of our data.

Task 3: What's the difference between the standard deviation and the standard error?

The standard deviation tells us how much observations in our sample differ from the mean value within our sample. The standard error tells us not about how the sample mean represents the sample itself, but how well the sample mean represents the population mean. The standard error is the standard deviation of the sampling distribution of a statistic. For a given statistic (e.g. the mean) it tells us how much variability there is in this statistic across samples from the same population. Large values, therefore, indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came.

Task 4: In Chapter 1 we used an example of the time taken for 21 heavy smokers to fall off a treadmill at the fastest setting (18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57). Calculate the sums of squares, variance, standard deviation and standard error of these data.

To calculate the sum of squares, take the mean from each value, then square this difference. Finally, add up these squared values. So, the sum of squared errors is a massive 2685.24. The variance is the sum of squared errors divided by the degrees of freedom (N − 1). There were 21 scores and so the degrees of freedom were 20. The variance is, therefore, 2685.24/20 = 134.26. The standard deviation is the square root of the variance: √134.26 = 11.59. The standard error will be:

SE = s/√N = 11.59/√21 = 2.53
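These hand calculations are easy to verify with a short script (pure Python; the variable names are mine):

```python
import math

# Treadmill data for the 21 heavy smokers
scores = [18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34,
          36, 36, 43, 42, 49, 46, 46, 57]

n = len(scores)                                  # 21
mean = sum(scores) / n                           # 32.19
ss = sum((x - mean) ** 2 for x in scores)        # sum of squared errors
variance = ss / (n - 1)                          # divide by df = N - 1 = 20
sd = math.sqrt(variance)                         # standard deviation
se = sd / math.sqrt(n)                           # standard error

print(f"SS = {ss:.2f}, variance = {variance:.2f}, "
      f"SD = {sd:.2f}, SE = {se:.2f}")
# SS = 2685.24, variance = 134.26, SD = 11.59, SE = 2.53
```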

The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, N − 1. With 21 data points, the degrees of freedom are 20. For a 95% confidence interval we can look up the value in the column labelled Two-Tailed Test, 0.05 in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.09. The confidence interval is therefore:

Lower boundary of confidence interval = X̄ − (2.09 × SE) = 32.19 − (2.09 × 2.53) = 26.90
Upper boundary of confidence interval = X̄ + (2.09 × SE) = 32.19 + (2.09 × 2.53) = 37.48

Task 5: What do the sum of squares, variance and standard deviation represent? How do they differ?

All of these measures tell us something about how well the mean fits the observed sample data. Large values (relative to the scale of measurement) suggest the mean is a poor fit of the observed scores, and small values suggest a good fit. They are also, therefore, measures of dispersion, with large values indicating a spread-out distribution of scores and small values showing a more tightly packed distribution. These measures all represent the same thing, but differ in how they express it. The sum of squared errors is a total and is, therefore, affected by the number of data points. The variance is the average variability, but in units squared. The standard deviation is the average variation, but converted back to the original units of measurement. As such, the size of the standard

deviation can be compared to the mean (because they are in the same units of measurement).

Task 6: What is a test statistic and what does it tell us?

A test statistic is a statistic for which we know how frequently different values occur. The observed value of such a statistic is typically used to test hypotheses, or to establish whether a model is a reasonable representation of what's happening in the population.

Task 7: What are Type I and Type II errors?

A Type I error occurs when we believe that there is a genuine effect in our population when in fact there isn't. A Type II error occurs when we believe that there is no effect in the population when, in reality, there is.

Task 8: What is an effect size and how is it measured?

An effect size is an objective and standardized measure of the magnitude of an observed effect. Measures include Cohen's d, the odds ratio and Pearson's correlation coefficient, r.

Task 9: What is statistical power?

Power is the ability of a test to detect an effect of a particular size (a value of 0.8 is a good level to aim for).

Chapter 3

Task 2

Your second task is to enter the data that I used to create Figure 3.10. These data show the score (out of 20) for 20 different students, some of whom are male and

some female, and some of whom were taught using positive reinforcement (being nice) and others who were taught using punishment (electric shock). Just to make it hard, the data should not be entered in the same way that they are laid out below. The data can be found in the file Method of Teaching.sav and should look like this:

Or with the value labels off, like this:


Task 3

Research has looked at emotional reactions to infidelity and found that men get homicidal and suicidal and women feel undesirable and insecure (Shackelford, LeBlanc, and Drass, 2000). Let's imagine we did some similar research: we took some men and women and got their partners to tell them they had slept with someone else. We then took each person to two shooting galleries and each time gave them a gun and 100 bullets. In one gallery was a human-shaped target with a picture of their own face on it, and in the other was a target with their partner's face on it. They were left alone with each target for 5 minutes and the number of bullets used was measured. The data are below; enter them into SPSS. (Clue:


They are not entered in the format in the table!) The data can be found in the file Infidelity.sav and should look like this:

Or with the value labels off, like this:


Chapter 4

Task 1

Using the data from Chapter 2 (which you should have saved, but if you didn't, re-enter it), plot and interpret the following graphs:

An error bar chart showing the mean number of friends for students and lecturers.
An error bar chart showing the mean alcohol consumption for students and lecturers.


An error line chart showing the mean income for students and lecturers.
An error line chart showing the mean neuroticism for students and lecturers.
A scatterplot (with regression lines) of alcohol consumption and neuroticism grouped by lecturer/student.

A scatterplot matrix of alcohol consumption, neuroticism and number of friends.

An error bar chart showing the mean number of friends for students and lecturers.

First of all access the Chart Builder and select a simple bar chart. The y-axis needs to be the dependent variable, or the thing you've measured, or more simply the thing for which you want to display the mean. In this case it would be number of friends, so select this variable from the variable list and drag it into the y-axis drop zone. The x-axis should be the variable by which we want to split the data. To plot the means for the students and lecturers, select the variable Group from the variable list and drag it into the drop zone for the x-axis. Then add error bars by selecting Display error bars in the Element Properties dialog box. The finished Chart Builder will look like this:


The error bar chart will look like this:


We can conclude that, on average, students had more friends than lecturers.

An error bar chart showing the mean alcohol consumption for students and lecturers.

Access the Chart Builder and select a simple bar chart. The y-axis needs to be the thing we've measured, which in this case is alcohol consumption, so select this variable from the variable list and drag it into the y-axis drop zone. The x-axis should be the variable by which we want to split the data. To plot the means for the students and lecturers, select the variable Group from the variable list and drag it into the drop zone for the x-axis. Add error bars by selecting Display error bars in the Element Properties dialog box. The finished Chart Builder will look like this:

The error bar chart will look like this:



We can conclude that, on average, students and lecturers drank similar amounts, but the error bars tell us that the mean is a better representation of the population for students than for lecturers (there is more variability in lecturers' drinking habits compared to students').

An error line chart showing the mean income for students and lecturers.

Access the Chart Builder and select a simple line chart. The y-axis needs to be the thing we've measured, which in this case is income, so select this variable from the variable list and drag it into the y-axis drop zone. The x-axis should again be students vs. lecturers, so select the variable Group from the variable list and drag it into the drop zone for the x-axis. Add error bars by selecting Display error bars in the Element Properties dialog box. The finished Chart Builder will look like this:


The error line chart will look like this:


We can conclude that, on average, students earn less than lecturers, but the error bars tell us that the mean is a better representation of the population for students than for lecturers (there is more variability in lecturers' income compared to students').

An error line chart showing the mean neuroticism for students and lecturers.

Access the Chart Builder and select a simple line chart. The y-axis needs to be the thing we've measured, which in this case is Neurotic, so select this variable from the variable list and drag it into the y-axis drop zone. The x-axis should again be students vs. lecturers, so select the variable Group from the variable list and drag it into the drop zone for the x-axis. Add error bars by selecting Display error bars in the Element Properties dialog box. The finished Chart Builder will look like this:

The error line chart will look like this:


We can conclude that, on average, students are slightly less neurotic than lecturers.

A scatterplot with regression lines of alcohol consumption and neuroticism grouped by lecturer/student.

Access the Chart Builder and select a grouped scatterplot. It doesn't matter which way around we plot these variables, so let's select Alcohol from the variable list and drag it into the y-axis drop zone, and then drag Neurotic from the variable list into the x-axis drop zone. We then need to split the scatterplot by our grouping variable (lecturers or students), so select Group and drag it to the Set color drop zone. The completed Chart Builder dialog box will look like this:


Click on OK to produce the graph. To fit the regression lines, double-click on the graph in the SPSS Viewer to open it in the SPSS Chart Editor. Then click on the fit line button in the Chart Editor to open the properties dialog box. In this dialog box, ask for a linear model to be fitted to the data (this should be set by default). Click on Apply to fit the lines:


We can conclude that for lecturers, as neuroticism increases so does alcohol consumption (a positive relationship), but for students the opposite is true: as neuroticism increases, alcohol consumption decreases. (Note that SPSS has scaled this graph oddly because neither axis starts at zero; as a bit of extra practice, why not edit the two axes so that they start at zero?)

A scatterplot matrix with regression lines of alcohol consumption, neuroticism and number of friends.

Access the Chart Builder and select a scatterplot matrix. We have to drag all three variables into the drop zone. Select the first variable (Friends) by clicking on it with the mouse. Now, hold down the Ctrl key on the keyboard and click on a second variable (Alcohol). Finally, hold down the Ctrl key and click on a third variable (Neurotic). Once the three variables are selected, click on any one of them and then drag them into the drop zone. Click on OK to produce the graph. To fit the regression lines, double-click on the graph in the SPSS Viewer to open it in the SPSS Chart Editor. Then click on the fit line button in the Chart Editor to open the properties dialog box. In this dialog box, ask for a linear model to be fitted to the data (this should be set by default). Click on Apply to fit the lines.

We can conclude that there is no relationship (flat line) between the number of friends and alcohol consumption; there was a negative relationship between how neurotic a person was and their number of friends (line slopes downwards); and there was a slight positive relationship between how neurotic a person was and how much alcohol they drank (line slopes upwards).


Task 2


Using the Infidelity.sav data from Chapter 3 (see Smart Alex's task), plot a clustered error bar chart of the mean number of bullets used against the self and the partner for males and females.

To graph these data we need to select a clustered bar chart in the Chart Builder. We have one repeated-measures variable (whether the target had the person's own face on it or the face of their partner), which is represented in the data file by two columns. In the Chart Builder you need to select these two variables simultaneously by clicking on one and then holding down the Ctrl key on the keyboard and clicking on the other. When they are both highlighted, click on either one and drag it into the drop zone. The second variable (whether the participant was male or female) was measured using different people (obviously) and so is represented in the data file by a grouping variable (Gender). This variable can be selected in the variable list and dragged into the drop zone. The two groups will now be displayed as different-coloured bars. Add error bars by selecting them in the Element Properties dialog box. The finished Chart Builder will look like this:


The resulting graph looks like this (the labels on both axes could benefit from some editing!):


The graph shows that, on average, males and females did not differ much in the number of bullets that they shot at the target when it had their partner's face on it. However, men used fewer bullets than women when the target had their own face on it.

Chapter 5

Task 1

Using the ChickFlick.sav data, check the assumptions of normality and homogeneity of variance for the two films (ignore gender): are the assumptions met?

The outputs you should get look like those reproduced below (I used the Explore function described in Chapter 5).

The skewness statistic gives rise to a z-score of 0.378/0.512 = 0.74 for Bridget Jones's Diary, and 0.04/0.512 = 0.08 for Memento. These show no significant skewness. For kurtosis these values are 0.254/0.992 = 0.26 for Bridget Jones's Diary, and 1.024/0.992 = 1.03 for Memento, so although Memento shows more positive kurtosis, neither value is significant.
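These z-score calculations can be sketched in a few lines (the statistics and standard errors are the ones quoted above; |z| > 1.96 is the usual p < .05 criterion):

```python
def z_score(statistic, std_error):
    """Convert a skewness or kurtosis statistic to a z-score; |z| > 1.96 is significant at p < .05."""
    return statistic / std_error

# Skewness for each film
z_skew_bridget = z_score(0.378, 0.512)   # about 0.74
z_skew_memento = z_score(0.04, 0.512)    # about 0.08

# Kurtosis for each film
z_kurt_bridget = z_score(0.254, 0.992)   # about 0.26
z_kurt_memento = z_score(1.024, 0.992)   # about 1.03
```

None of the four z-scores exceeds 1.96, which is why the text concludes there is no significant skew or kurtosis.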

The Q-Q plots confirm these findings: for both films the expected quantile points are close to those that would be expected from a normal distribution (i.e. the dots fall close to the diagonal line). The K-S tests show no significant deviation from normality for either film. We could report that arousal scores for Bridget Jones's Diary, D(20) = 0.13, ns, and Memento, D(20) = 0.10, ns, were both not significantly different from a normal distribution. Therefore we can assume normality in the sample data. In terms of homogeneity of variance, Levene's test shows that the variances of arousal for the two films were not significantly different, F(1, 38) = 1.90, ns.



Task 2

Remember that the numeracy scores were positively skewed in the SPSSExam.sav data (see Figure 5.5)? Transform these data using one of the transformations described in this chapter: do the data become normal?

These are the original histogram and those of the transformed scores (I've included three transformations discussed in the chapter):


None of these histograms appears to be normal. Below is the table of results from the K-S test, all of which are significant. The only conclusion is that although the square root transformation does the best job of normalizing the data, none of these transformations actually works!
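For reference, the three transformations can be sketched outside SPSS (the scores below are made up for illustration, not the SPSSExam.sav numeracy data; in SPSS you would use Transform, then Compute Variable):

```python
import math

# A hypothetical, positively skewed set of scores (all > 0, so log is defined)
scores = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]

log_scores = [math.log10(x) for x in scores]    # log transformation
sqrt_scores = [math.sqrt(x) for x in scores]    # square root transformation
recip_scores = [1 / x for x in scores]          # reciprocal transformation
```

Each transformation pulls in the long right tail: the largest raw score is 40 times the smallest, but only about 6 times the smallest after a square root, which is why these transformations can reduce positive skew (even if, as here, they do not fully normalize the scores).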


Chapter 6

Task 1

A student was interested in whether there was a positive relationship between the time spent doing an essay and the mark received. He got 45 of his friends and timed how long they spent writing an essay (hours) and recorded the percentage they got in the essay (essay). He also translated these grades into their degree classifications (grade): first, upper second, lower second and third class. Using the data in the file EssayMarks.sav, find out what the relationship was between the time spent doing an essay and the eventual mark in terms of percentage and degree class (draw a scatterplot too!).

We're interested in looking at the relationship between hours spent on an essay and the grade obtained. We could simply do a scatterplot of hours spent on the essay (x-axis) and essay mark (y-axis). I've also chosen to highlight the degree classification grades using different symbols (just place the variable grade in the style box). The resulting scatterplot should look like this:


Next, we should check whether the data are parametric using the Explore menu (see Chapter 3). The resulting table is as follows:

Tests of Normality

                        Kolmogorov-Smirnov(a)         Shapiro-Wilk
                        Statistic   df   Sig.         Statistic   df   Sig.
Essay Mark (%)          .111        45   .200*        .977        45   .493
Hours Spent on Essay    .091        45   .200*        .981        45   .662

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

The K-S and Shapiro-Wilk statistics are both non-significant (Sig. > .05 in all cases) for both variables, which indicates that they are normally distributed. As such we can use Pearson's correlation coefficient, the result of which is:


Correlations

                                           Essay Mark (%)   Hours Spent on Essay
Essay Mark (%)        Pearson Correlation  1                .267*
                      Sig. (1-tailed)      .                .038
                      N                    45               45
Hours Spent on Essay  Pearson Correlation  .267*            1
                      Sig. (1-tailed)      .038             .
                      N                    45               45

*. Correlation is significant at the 0.05 level (1-tailed).
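Pearson's r itself is easy to compute directly from raw scores; a minimal sketch (the hours/marks values here are made up for illustration, not the EssayMarks.sav data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the sum of cross-product deviations divided by
    the square root of the product of the sums of squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up hours/marks pairs just to exercise the function
hours = [2, 4, 5, 7, 9]
marks = [52, 58, 55, 64, 70]
r = pearson_r(hours, marks)   # positive: more hours, higher marks
```

With these invented numbers r comes out strongly positive; with the real data SPSS reports the more modest r = .267 shown above.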

I chose a one-tailed test because a specific prediction was made: there would be a positive relationship; that is, the more time you spend on your essay, the better mark you'll get. This hypothesis is supported because Pearson's r = .27 (a medium effect size) is significant, p < .05. The second part of the question asks us to do the same analysis but with the percentages recoded into degree classifications. The degree classifications are ordinal data (not interval): they are ordered categories, so we shouldn't use Pearson's test statistic, but Spearman's and Kendall's ones instead:
Correlations

                                                             Hours Spent on Essay   Grade
Kendall's tau_b  Hours Spent on Essay  Correlation Coefficient  1.000               -.158
                                       Sig. (1-tailed)          .                   .089
                                       N                        45                  45
                 Grade                 Correlation Coefficient  -.158               1.000
                                       Sig. (1-tailed)          .089                .
                                       N                        45                  45
Spearman's rho   Hours Spent on Essay  Correlation Coefficient  1.000               -.193
                                       Sig. (1-tailed)          .                   .102
                                       N                        45                  45
                 Grade                 Correlation Coefficient  -.193               1.000
                                       Sig. (1-tailed)          .102                .
                                       N                        45                  45
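Spearman's rho is simply Pearson's r applied to ranked data, and the sign of any correlation flips if one variable's coding is reversed; a minimal sketch with made-up data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation from raw deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours = [3, 5, 6, 8, 10]        # hypothetical hours spent
grade_codes = [4, 3, 3, 2, 1]   # 1 = first ... 4 = third: a LOW code is a GOOD grade

r_coded = pearson_r(hours, grade_codes)                      # negative
r_recoded = pearson_r(hours, [5 - g for g in grade_codes])   # recoded so high = good
```

The two correlations are identical in size but opposite in sign, which is exactly why the Spearman and Kendall coefficients above come out negative even though more hours go with better grades.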


In both cases the correlation is non-significant. There was no significant relationship between degree grade classification for an essay and the time spent doing it, rho = -.19, ns, and tau = -.16, ns. Note that the direction of the relationship has reversed. This has happened because the essay marks were recoded as 1 (first), 2 (upper second), 3 (lower second) and 4 (third), so high grades were represented by low numbers! This illustrates one of the benefits of not taking continuous data (like percentages) and transforming them into categorical data: when you do, you lose information and often statistical power!

Task 2

Using the ChickFlick.sav data from Chapter 3, is there a relationship between gender and arousal? Using the same data, is there a relationship between the film watched and arousal?

Now, both gender and the film watched are categorical variables with two categories. Therefore, we need to look at these relationships using a point-biserial correlation. The resulting tables are as follows:

Correlations

                              Gender    Arousal
Gender   Pearson Correlation  1         -.180
         Sig. (2-tailed)      .         .266
         N                    40        40
Arousal  Pearson Correlation  -.180     1
         Sig. (2-tailed)      .266      .
         N                    40        40

Correlations

                              Film      Arousal
Film     Pearson Correlation  1         .638**
         Sig. (2-tailed)      .         .000
         N                    40        40
Arousal  Pearson Correlation  .638**    1
         Sig. (2-tailed)      .000      .
         N                    40        40

**. Correlation is significant at the 0.01 level (2-tailed).
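A point-biserial correlation is just Pearson's r computed with one variable coded as two discrete values; a minimal sketch (the arousal scores are made up, only the 1/2 film coding comes from the text):

```python
import math

def pearson_r(x, y):
    """Pearson correlation from raw deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

film = [1, 1, 1, 1, 2, 2, 2, 2]           # 1 = Bridget Jones's Diary, 2 = Memento
arousal = [10, 12, 9, 11, 18, 20, 17, 19]  # hypothetical arousal scores

r_pb = pearson_r(film, arousal)   # positive: arousal is higher for the group coded 2
```

A positive r_pb here means arousal rises as the film code rises from 1 to 2, which is exactly how the .638 in the table above is interpreted in the next paragraph.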

In both cases I used a two-tailed test because no prediction was made. As you can see, there was no significant relationship between gender and arousal, rpb = -.18, ns. However, there was a significant relationship between the film watched and arousal, rpb = .64, p < .001. Looking at how the groups were coded, you should see that Bridget Jones's Diary had a code of 1 and Memento had a code of 2, so this result reflects the fact that as film goes up (changes from 1 to 2) arousal goes up. Put another way, as the film changes from Bridget Jones's Diary to Memento, arousal increases. So, Memento gave rise to the greater arousal levels.

Task 3

As a statistics lecturer I am always interested in the factors that determine whether a student will do well on a statistics course. One potentially important factor is their previous expertise with mathematics. Imagine I took 25 students and looked at their degree grades for my statistics course at the end of their first year at university. In the UK, a student can get a first-class mark (the best), an upper second, a lower second, a third, a pass or a fail (the worst). I also asked these students what grade they got in their GCSE maths exams. In the UK GCSEs are school exams taken at age 16 that are graded A, B, C, D, E or F (an A grade is better than all of the lower grades). The data for this study are in the file grades.sav. Carry out the appropriate analysis to see if GCSE maths grades correlate with first-year statistics grades.

Let's look at these variables. In the UK, a student can get a first-class mark, an upper second, a lower second, a third, a pass or a fail. These grades are categories, but they have an order to them (an upper second is better than a lower second). In the UK GCSEs are school exams taken at age 16 that are graded A, B, C, D, E or F. Again, these grades are categories that have an order of importance (an A grade is better than all of the lower grades). When you have categories like these that can be ordered in a meaningful way, the data are said to be ordinal. The data are not interval, because a first-class degree encompasses a 30% range (70-100%) whereas an upper second covers only a 10% range (60-70%). When data have been measured at only the ordinal level they are said to be non-parametric and Pearson's correlation is not appropriate. Therefore, the Spearman correlation coefficient is used.

The data are in two columns: one labelled stats and one labelled gcse. Each of the categories described above has been coded with a numeric value. In both cases, the highest grade (first class or A grade) has been coded with the value 1, with subsequent categories being labelled 2, 3 and so on. Note that for each numeric code I have provided a value label (just like we did for coding variables).

The procedure for doing the Spearman correlation is the same as for Pearson's correlation except that in the Bivariate Correlations dialog box we need to select the Spearman option and deselect the option for a Pearson correlation. At this stage, you should also specify whether you require a one- or two-tailed test. For the example above, I predicted that better grades in GCSE maths would correlate with better degree grades for my statistics course. This hypothesis is directional and so a one-tailed test should be selected.

The SPSS output shows the Spearman correlation on the variables stats and gcse. The output shows a matrix giving the correlation coefficient between the two variables (.455); underneath is the significance value of this coefficient (.011) and finally the sample size (25). The significance value for this correlation coefficient is less than .05; therefore, it can be concluded that there is a significant relationship between a student's grade in GCSE maths and their degree grade for their statistics course. The correlation itself is positive: therefore, we can conclude that as GCSE grades improve, there is a corresponding improvement in degree grades for statistics. As such, the hypothesis was supported. Finally, it is good to check that the value of N corresponds to the number of observations that were made. If it doesn't, then data may have been excluded for some reason.
Correlations

                                                           Statistics Grade   GCSE Maths Grade
Spearman's rho  Statistics Grade  Correlation Coefficient  1.000              .455*
                                  Sig. (1-tailed)          .                  .011
                                  N                        25                 25
                GCSE Maths Grade  Correlation Coefficient  .455*              1.000
                                  Sig. (1-tailed)          .011               .
                                  N                        25                 25

*. Correlation is significant at the .05 level (1-tailed).

We could also look at Kendall's correlation by selecting Kendall's tau-b in the same dialog box. The output is much the same as for Spearman's correlation. The actual value of the correlation coefficient is less than Spearman's correlation (it has decreased from .455 to .354). Despite the difference in the correlation coefficients we can still interpret this result as a significant positive relationship (because the significance value of .015 is less than .05). However, Kendall's value is a more accurate gauge of what the correlation in the population would be. As with Pearson's correlation, we cannot assume that the GCSE grades caused the students to do better in their statistics course.
Correlations

                                                            Statistics Grade   GCSE Maths Grade
Kendall's tau_b  Statistics Grade  Correlation Coefficient  1.000              .354*
                                   Sig. (1-tailed)          .                  .015
                                   N                        25                 25
                 GCSE Maths Grade  Correlation Coefficient  .354*              1.000
                                   Sig. (1-tailed)          .015               .
                                   N                        25                 25

*. Correlation is significant at the .05 level (1-tailed).

We could report these results as follows:

1. There was a positive relationship between a person's statistics grade and their GCSE maths grade, rs = .46, p < .05.
2. There was a positive relationship between a person's statistics grade and their GCSE maths grade, tau = .35, p < .05. (Note that I've quoted Kendall's tau here.)

Chapter 7

Task 1

A fashion student was interested in factors that predicted the salaries of catwalk models. She collected data from 231 models. For each model she asked them their salary per day on days when they were working (salary), their age (age), how


many years they had worked as a model (years), and then got a panel of experts from modelling agencies to rate the attractiveness of each model as a percentage, with 100% being perfectly attractive (beauty). The data are in the file Supermodel.sav. Unfortunately, this fashion student bought a substandard statistics textbook and so doesn't know how to analyse her data. Can you help her out by conducting a multiple regression to see which factors predict a model's salary? How valid is the regression model?

Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .429a   .184       .173                14.57213

Change Statistics: R Square Change = .184, F Change = 17.066, df1 = 3, df2 = 227, Sig. F Change = .000. Durbin-Watson = 2.057

a. Predictors: (Constant), Attractiveness (%), Number of Years as a Model, Age (Years)
b. Dependent Variable: Salary per Day (£)

ANOVA(b)

Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   10871.964        3     3623.988      17.066   .000a
Residual     48202.790        227   212.347
Total        59074.754        230

a. Predictors: (Constant), Attractiveness (%), Number of Years as a Model, Age (Years)
b. Dependent Variable: Salary per Day (£)

To begin with, a sample size of 231 with three predictors seems reasonable because this would easily detect medium to large effects (see the diagram in the chapter). Overall, the model accounts for 18.4% of the variance in salaries and is a significant fit to the data, F(3, 227) = 17.07, p < .001. The adjusted R2 (.173) shows some shrinkage from the unadjusted value (.184), indicating that the model may not generalize well. We can also apply Stein's formula:

adjusted R2 = 1 - [(231 - 1)/(231 - 3 - 1)] x [(231 - 2)/(231 - 3 - 2)] x [(231 + 1)/231] x (1 - 0.184)
            = 1 - [1.031](0.816)
            = 1 - 0.841
            = 0.159

This also shows that the model may not cross-generalize well.
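As a numerical check, both shrinkage calculations can be reproduced from just the sample size, the number of predictors and R2 reported in the output:

```python
n, k, r2 = 231, 3, 0.184   # sample size, predictors, R-squared from the output

# Ordinary adjusted R-squared reported by SPSS
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Stein's formula: the expected R-squared if the model were cross-validated
stein = 1 - ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n) * (1 - r2)

print(round(adj_r2, 3))   # 0.173, matching the Model Summary
print(round(stein, 3))    # 0.159, matching the hand calculation
```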
Coefficients(a)

                             B         Std. Error   Beta    t        Sig.   95% CI for B         Tolerance   VIF
(Constant)                   -60.890   16.497               -3.691   .000   (-93.396, -28.384)
Age (Years)                  6.234     1.411        .942    4.418    .000   (3.454, 9.015)       .079        12.653
Number of Years as a Model   -5.561    2.122        -.548   -2.621   .009   (-9.743, -1.380)     .082        12.157
Attractiveness (%)           -.196     .152         -.083   -1.289   .199   (-.497, .104)        .867        1.153

a. Dependent Variable: Salary per Day (£)

In terms of the individual predictors we could report:

                    B        SE B    Beta
Constant            -60.89   16.50
Age                 6.23     1.41    .94**
Years as a model    -5.56    2.12    -.55*
Attractiveness      -0.20    0.15    -.08

Note: R2 = .18 (p < .001). * p < .01, ** p < .001.


It seems as though salaries are significantly predicted by the age of the model. This is a positive relationship (look at the sign of the beta), indicating that as age increases, salaries increase too. The number of years spent as a model also seems to significantly predict salaries, but this is a negative relationship indicating that the more years you've spent as a model, the lower your salary. This finding seems very counter-intuitive, but we'll come back to it later. Finally, the attractiveness of the model doesn't seem to predict salaries. If we wanted to write the regression model, we could write it as:

Salaryi = b0 + b1 Agei + b2 Experiencei + b3 Attractivenessi
        = -60.89 + (6.23 Agei) - (5.56 Experiencei) - (0.20 Attractivenessi)

The next part of the question asks whether this model is valid.
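To see the equation in action it can be turned into a small prediction function (a sketch using the rounded coefficients reported above; the example person is made up, and remember that the model itself turns out to be unreliable):

```python
def predicted_salary(age, experience, attractiveness):
    """Predicted salary per day (£) from the rounded regression coefficients."""
    return -60.89 + 6.23 * age - 5.56 * experience - 0.20 * attractiveness

# A hypothetical 20-year-old model with 2 years' experience, rated 75% attractive
print(round(predicted_salary(20, 2, 75), 2))   # 37.59
```

Increasing age raises the prediction while increasing experience lowers it, which is the counter-intuitive pattern the multicollinearity discussion below explains.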

Collinearity Diagnostics(a)

                                             Variance Proportions
Dimension   Eigenvalue   Condition Index   (Constant)   Age (Years)   Number of Years as a Model   Attractiveness (%)
1           3.925        1.000             .00          .00           .00                          .00
2           .070         7.479             .01          .00           .08                          .02
3           .004         30.758            .30          .02           .01                          .94
4           .001         63.344            .69          .98           .91                          .04

a. Dependent Variable: Salary per Day (£)

Casewise Diagnostics(a)

Case Number   Std. Residual   Salary per Day (£)   Predicted Value   Residual
2             2.186           53.72                21.8716           31.8532
5             4.603           95.34                28.2647           67.0734
24            2.232           48.87                16.3444           32.5232
41            2.411           51.03                15.8861           35.1390
91            2.062           56.83                26.7856           30.0459
116           3.422           64.79                14.9259           49.8654
127           2.753           61.32                21.2059           40.1129
135           4.672           89.98                21.8946           68.0854
155           3.257           74.86                27.4025           47.4582
170           2.170           54.57                22.9401           31.6254
191           3.153           50.66                4.7164            45.9394
198           3.510           71.32                20.1729           51.1478

a. Dependent Variable: Salary per Day (£)


[Histogram of the regression standardized residuals (Dependent Variable: Salary per Day (£); Std. Dev = .99, Mean = 0.00, N = 231)]

[Normal P-P plot of the regression standardized residuals]

[Scatterplot of the regression standardized residuals against the standardized predicted values]

[Partial regression plots of Salary per Day (£) against Age (Years), Number of Years as a Model, and Attractiveness (%)]

Residuals: There are six cases that have a standardized residual greater than 3, and two of these are fairly substantial (cases 5 and 135). We have 5.19% of cases with standardized residuals above 2, so that's as we expect, but 3% of cases with residuals above 2.5 (we'd expect only 1%), which indicates possible outliers.

Normality of errors: The histogram reveals a skewed distribution, indicating that the normality of errors assumption has been broken. The normal P-P plot verifies this because the dashed line deviates considerably from the straight line (which indicates what you'd get from normally distributed errors).

Homoscedasticity and independence of errors: The scatterplot of ZPRED vs. ZRESID does not show a random pattern. There is a distinct funnelling, indicating heteroscedasticity. However, the Durbin-Watson statistic does fall within Field's recommended boundaries of 1-3, which suggests that errors are reasonably independent.

Multicollinearity: For the age and experience variables in the model, VIF values are above 10 (or, alternatively, tolerance values are well below 0.2), indicating multicollinearity in the data. In fact, if you look at the correlation between these two variables it is around .9! So, these two variables are measuring very similar things. Of course, this makes perfect sense because the older a model is, the more years she would've spent modelling. So, it was fairly stupid to measure both of these things! This also explains the weird result that the number of years spent modelling negatively predicted salary (i.e. more experience = less salary!): in fact, if you do a simple regression with experience as the only predictor of salary you'll find it has the expected positive relationship. This hopefully demonstrates why multicollinearity can bias the regression model.
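The residual percentages quoted above can be verified from the casewise diagnostics (a sketch; the twelve standardized residuals are those listed in the SPSS table, and every other case is assumed to lie below 2):

```python
n = 231  # total sample size

# Absolute standardized residuals > 2 from the casewise diagnostics table
large_residuals = [2.186, 4.603, 2.232, 2.411, 2.062, 3.422,
                   2.753, 4.672, 3.257, 2.170, 3.153, 3.510]

pct_above_2 = 100 * len(large_residuals) / n
pct_above_2_5 = 100 * sum(r > 2.5 for r in large_residuals) / n

print(round(pct_above_2, 2))    # 5.19: about 5%, as expected by chance
print(round(pct_above_2_5, 2))  # 3.03: well above the 1% expected
```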


All in all, several assumptions have not been met and so this model is probably fairly unreliable.
Task 2

Using the Glastonbury data from this chapter (with the dummy coding in GlastonburyDummy.sav), which you should've already analysed, comment on whether you think the model is reliable and generalizable.

This question asks whether this model is valid.


Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .276a   .076       .053                .68818

Change Statistics: R Square Change = .076, F Change = 3.270, df1 = 3, df2 = 119, Sig. F Change = .024. Durbin-Watson = 1.893

a. Predictors: (Constant), No Affiliation vs. Indie Kid, No Affiliation vs. Crusty, No Affiliation vs. Metaller
b. Dependent Variable: Change in Hygiene Over The Festival

Coefficients(a)

                               B       Std. Error   Beta    t        Sig.   95% CI for B       Tolerance   VIF
(Constant)                     -.554   .090                 -6.134   .000   (-.733, -.375)
No Affiliation vs. Crusty      -.412   .167         -.232   -2.464   .015   (-.742, -.081)     .879        1.138
No Affiliation vs. Metaller    .028    .160         .017    .177     .860   (-.289, .346)      .874        1.144
No Affiliation vs. Indie Kid   -.410   .205         -.185   -2.001   .048   (-.816, -.004)     .909        1.100

a. Dependent Variable: Change in Hygiene Over The Festival

Collinearity Diagnostics(a)

                                             Variance Proportions
Dimension   Eigenvalue   Condition Index   (Constant)   No Affiliation vs. Crusty   No Affiliation vs. Metaller   No Affiliation vs. Indie Kid
1           1.727        1.000             .14          .08                         .08                           .05
2           1.000        1.314             .00          .37                         .32                           .00
3           1.000        1.314             .00          .07                         .08                           .63
4           .273         2.515             .86          .48                         .52                           .32

a. Dependent Variable: Change in Hygiene Over The Festival

Casewise Diagnostics(a)

Case Number   Std. Residual   Change in Hygiene Over The Festival   Predicted Value   Residual
31            -2.302          -2.55                                 -.9658            -1.5842
153           2.317           1.04                                  -.5543            1.5943
202           -2.653          -2.38                                 -.5543            -1.8257
346           -2.479          -2.26                                 -.5543            -1.7057
479           2.215           .97                                   -.5543            1.5243

a. Dependent Variable: Change in Hygiene Over The Festival


[Histogram of the regression standardized residuals (Dependent Variable: Change in Hygiene Over The Festival; Std. Dev = .99, Mean = 0.00, N = 123)]

[Normal P-P plot of the regression standardized residuals]

[Scatterplot of the regression standardized residuals against the standardized predicted values]

[Partial regression plots of Change in Hygiene Over The Festival against No Affiliation vs. Crusty, No Affiliation vs. Metaller, and No Affiliation vs. Indie Kid]


Residuals: There are no cases that have a standardized residual greater than 3. We have 4.07% of cases with standardized residuals above 2, so that's as we expect, and 0.81% of cases with residuals above 2.5 (we'd expect 1%), which indicates the data are consistent with what we'd expect.

Normality of errors: The histogram looks reasonably normally distributed, indicating that the normality of errors assumption has probably been met. The normal P-P plot verifies this because the dashed line doesn't deviate much from the straight line (which indicates what you'd get from normally distributed errors).

Homoscedasticity and independence of errors: The scatterplot of ZPRED vs. ZRESID does look a bit odd with categorical predictors, but essentially we're looking for the height of the lines to be about the same (indicating that the variability at each of the three levels is the same). This is true, indicating homoscedasticity. The Durbin-Watson statistic also falls within Field's recommended boundaries of 1-3, which suggests that errors are reasonably independent.

Multicollinearity: For all variables in the model, VIF values are below 10 (or, alternatively, tolerance values are all well above 0.2), indicating no multicollinearity in the data.

All in all, the model looks fairly reliable (but you should check for influential cases!).
Task 3

A study was carried out to explore the relationship between aggression and several potential predicting factors in 666 children who had an older sibling. Variables measured were Parenting_Style (high score = bad parenting practices), Computer_Games (high score = more time spent playing computer games), Television (high score = more time spent watching television), Diet (high score = the child has a good diet low in E-numbers), and Sibling_Aggression (high score = more aggression seen in their older sibling). Past research indicated that parenting style and sibling aggression were good predictors of the level of aggression in the younger child. All other variables were treated in an exploratory fashion. The data are in the file Child Aggression.sav. Analyse them with multiple regression.

We need to conduct this analysis hierarchically, entering parenting style and sibling aggression in the first step (forced entry) and the remaining variables in a second step (stepwise):


Based on the final model (which is actually all we're interested in), the following variables predict aggression:

Parenting style (b = 0.062, beta = 0.194, t = 4.93, p < .001) significantly predicted aggression. The beta value indicates that as parenting increases (i.e. as bad practices increase), aggression increases also.

52

Sibling aggression (b = 0.086, beta = 0.088, t = 2.26, p < .05) significantly predicted aggression. The beta value indicates that as sibling aggression increases (became more aggressive), aggression increases also.

Computer games (b = 0.143, SE = 0.037, t = 3.89, p < .001) significantly predicted aggression. The b value indicates that as the time spent playing computer games increases, aggression increases also.

E-numbers (b = -0.112, beta = -0.118, t = -2.95, p < .01) significantly predicted aggression. The beta value indicates that as the diet improved, aggression decreased.

The only factor not to predict aggression was television (beta if entered = .032, t = 0.72, p > .05). Based on the standardized beta values, the most substantive predictor of aggression was actually parenting style, followed by computer games, diet and then sibling aggression.

R2 is the squared correlation between the observed values of aggression and the values of aggression predicted by the model. The values in this output tell us that sibling aggression and parenting style in combination explain 5.3% of the variance in aggression. When computer game use is factored in as well, 7% of variance in aggression is explained (i.e. an additional 1.7%). Finally, when diet is added to the model, 8.2% of the variance in aggression is explained (an additional 1.2%). With all four of these predictors in the model, still less than 10% of the variance in aggression can be explained.
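The R2 increments at each step can be checked with a couple of lines (a sketch; the R2 values are the ones quoted above):

```python
# R-squared at each step of the hierarchical model, as reported in the output
r2_steps = {
    "parenting + sibling aggression": 0.053,
    "+ computer games": 0.070,
    "+ diet": 0.082,
}

values = list(r2_steps.values())
changes = [round(b - a, 3) for a, b in zip(values, values[1:])]
print(changes)   # [0.017, 0.012]: the 1.7% and 1.2% increments
```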


The Durbin-Watson statistic tests the assumption of independence of errors, which means that for any two observations (cases) in the regression, their residuals should be uncorrelated (or independent). In this output the Durbin-Watson statistic falls within the recommended boundaries of 1-3, which suggests that errors are reasonably independent. The scatterplot helps us to assess both homoscedasticity and independence of errors. The scatterplot of ZPRED vs. ZRESID does show a random pattern and so indicates no violation of the independence of errors assumption. Also, the errors on the scatterplot do not funnel out, indicating homoscedasticity of errors. Thus there are no violations of these assumptions.

Chapter 8

Task 1

A psychologist was interested in whether children's understanding of display rules can be predicted from their age and whether the child possesses a theory of mind. A display rule is a convention of displaying an appropriate emotion in a given situation. For example, if you receive a Christmas present that you don't like, the appropriate emotional display is to smile politely and say 'Thank you Auntie Kate, I've always wanted a rotting cabbage'. The inappropriate emotional display is to start crying and scream 'Why did you buy me a rotting cabbage, you selfish old bag?' Using appropriate display rules has been linked to having a theory of mind (the ability to understand what another person might be thinking).
[Cartoon: 'Why did you buy me this crappy statistics textbook for Christmas, Auntie Kate?']

To test this theory, children were given a false belief task (a task used to measure whether someone has a theory of mind), a display rule task (which they could either pass or fail), and their age in months was measured. The data are in Display.sav. Run a logistic regression to see whether possession of display rule understanding (did the child pass the test: Yes/No?) can be predicted from possession of a theory of mind (did the child pass the false belief task: Yes/No?), age in months and their interaction.

For this example, our researchers are interested in whether the understanding of emotional display rules was linked to having a theory of mind. The rationale is that it might be necessary for a child to understand how another person thinks to realize how their emotional displays will affect that person: if you can't put yourself in Auntie Kate's mind, then you won't realize that she might be upset by you calling her an old bag. To test this theory, several children were given a standard false belief task (a task used to measure whether someone has a theory of mind) that they could either pass or fail, and their age in months was also measured. In addition, each child was given a display rule task, which they could either pass or fail. So, the following variables were measured:

1. Outcome (dependent variable): Possession of display rule understanding (Did the child pass the test: Yes/No?).
2. Predictor (independent variable): Possession of a theory of mind (Did the child pass the false belief task: Yes/No?).
3. Predictor (independent variable): Age in months.
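The model being fitted can be sketched as a logistic function of the two predictors and their interaction (the coefficient values below are illustrative placeholders, not SPSS output for Display.sav):

```python
import math

def p_display_rule(fb, age, b0=-1.0, b1=0.5, b2=0.05, b3=0.1):
    """P(child passes the display rule task) from a logistic model with a
    theory-of-mind x age interaction. Coefficients are made up for illustration."""
    linear = b0 + b1 * fb + b2 * age + b3 * fb * age   # fb: 1 = passed false belief task
    return 1 / (1 + math.exp(-linear))                 # logistic (sigmoid) link

# Probability for a 40-month-old who passed the false belief task
print(round(p_display_rule(fb=1, age=40), 3))
```

The point of the sketch is the model's shape: the outcome is a probability squeezed between 0 and 1, and the interaction term (b3) lets the effect of age differ between children with and without a theory of mind.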

The Main Analysis


To carry out logistic regression, the data must be entered as for normal regression: they are arranged in the data editor in three columns (one representing each variable). The data can be found in the file Display.sav. Looking at the data editor you should notice that both of the categorical variables have been entered as coding variables; that is, numbers have been specified to represent categories. For ease of interpretation, the outcome variable should be coded 1 (event occurred) and 0 (event did not occur); in this case, 1 represents having display rule understanding, and 0 represents an absence of display rule understanding. For the false belief task a similar coding has been used (1 = passed the false belief task, 2 = failed the false belief task). Logistic regression is located in the regression menu, accessed by selecting Analyze, then Regression, then Binary Logistic.

Following this menu path activates the main Logistic Regression dialog box shown below.

The main dialog box is very similar to the standard regression option box. There is a space to place a dependent variable (or outcome variable). In this example the outcome was the display rule task, so we can simply click on display and transfer it to the Dependent box by clicking on the transfer arrow. There is also a box for specifying the covariates (the predictor variables). It is possible to specify both main effects and interactions in logistic regression. To specify a main effect, simply select one predictor (e.g. age) and then transfer this variable to the Covariates box by clicking on the transfer arrow. To input an interaction, click on more than one variable on the left-hand side of the dialog box (i.e. highlight two or more variables) and then click on the >a*b> button to move them to the Covariates box.

For this analysis select a Forward:LR method of regression. In this example there is one categorical predictor variable. One of the great things about logistic regression is that it is quite happy to accept categorical predictors. However, it is necessary to tell SPSS which variables, if any, are categorical by clicking on Categorical in the main Logistic Regression dialog box to activate the categorical covariates dialog box.

The covariates are listed on the left-hand side, and there is a space on the right-hand side in which categorical covariates can be placed. Simply highlight any categorical variables you have (in this example click on fb) and transfer them to the Categorical Covariates box by clicking on the transfer arrow. There are many ways in which you can treat categorical predictors.

Categorical predictors could be incorporated into regression by recoding them using zeros and ones (known as dummy coding). Now, actually, there are different ways you can arrange this coding depending on what you want to compare, and SPSS has several standard ways built into it that you can select. By default SPSS uses indicator coding, which is the standard dummy variable coding that I explained in Chapter 7 (and you can choose to have either the first or last category as your baseline). To change to a different kind of contrast, click on the down arrow in the Change Contrast box. Select indicator coding with the first category as the baseline.

Obtaining Residuals

To save residuals click on Save in the main Logistic Regression dialog box. SPSS

saves each of the selected variables into the data editor. The residuals dialog box gives us several options, and most of these are the same as those in multiple regression. Select all of the available options or, as a bare minimum, the standardized residuals, Cook's distance, leverage and DFBeta values (the statistics examined below):

Further Options

There is a final dialog box that offers further options. This box is accessed by clicking on Options in the main Logistic Regression dialog box. For the most part, the default settings in this dialog box are fine. These options are explained in the chapter, so select the classification plot and the confidence intervals for exp(B) (both are used in the output below):

Interpreting The Output

Dependent Variable Encoding

  Original Value   Internal Value
  No               0
  Yes              1

Categorical Variables Codings

                                    Frequency   Parameter coding (1)
  False Belief understanding  No    29          .000
                              Yes   41          1.000

These tables tell us the parameter codings given to the categorical predictor variable. Indicator coding was chosen with two categories, and so the coding is the same as the values in the data editor.

Classification Table (Step 0)

                                           Predicted
                                           Display Rule understanding   Percentage
  Observed                                 No         Yes               Correct
  Display Rule understanding   No          0          31                .0
                               Yes         0          39                100.0
  Overall Percentage                                                    55.7

a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation (Step 0)

            B      S.E.   Wald   df   Sig.   Exp(B)
  Constant  .230   .241   .910   1    .340   1.258

Variables not in the Equation (Step 0)

                       Score    df   Sig.
  AGE                  15.956   1    .000
  FB(1)                24.617   1    .000
  AGE by FB(1)         23.987   1    .000
  Overall Statistics   26.257   3    .000

For this first analysis we requested a forward stepwise method and so the initial model is derived using only the constant in the regression equation. The above output tells us about the model when only the constant is included (i.e. all predictor variables are omitted). Although SPSS doesn't display this value, the -2 log-likelihood of this baseline model is 96.124 (trust me for the time being!). This represents the fit of the model when the most basic model is fitted to the data. When including only the constant, the computer bases the model on assigning every participant to a single category of the outcome variable. In this example, SPSS can decide either to predict that every child has display rule understanding, or to predict that all children do not have display rule understanding. It could make this decision arbitrarily, but because it is crucial to try to maximize how well the model predicts the observed data, SPSS will predict that every child belongs to the category in which most observed cases fell. In this example there were 39 children

who had display rule understanding and only 31 who did not. Therefore, if SPSS predicts that every child has display rule understanding then this prediction will be correct 39 times out of 70 (approx. 56%). However, if SPSS predicted that every child did not have display rule understanding, then this prediction would be correct only 31 times out of 70 (approx. 44%). As such, of the two available options it is better to predict that all children had display rule understanding because this results in a greater number of correct predictions. The output shows a contingency table for the model in this basic state. You can see that SPSS has predicted that all children have display rule understanding, which results in 0% accuracy for the children who were observed to have no display rule understanding, and 100% accuracy for those children observed to have passed the display rule task. Overall, the model correctly classifies 55.71% of children. The next part of the output summarizes the model, and at this stage this entails quoting the value of the constant (b0), which is equal to 0.23. The final table of the output is labelled Variables not in the Equation. The bottom line of this table reports the residual chi-square statistic as 26.257, which is significant at p < .0001 (it labels this statistic Overall Statistics). This statistic tells us that the coefficients for the variables not in the model are significantly different from zero; in other words, that the addition of one or more of these variables to the model will significantly affect its predictive power. If the probability for the residual chi-square had been greater than .05 it would have meant that none of the variables excluded from the model could make a significant contribution to the predictive power of the model. As such, the analysis would have terminated at this stage.
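The arithmetic behind this baseline classification can be checked with a short Python sketch (this is illustrative only; the counts are taken from the output above):

```python
# Baseline model: predict the majority category (display rule understanding)
# for every child and see how accurate that is.
n_yes, n_no = 39, 31            # observed counts from the output
n = n_yes + n_no                # 70 children in total

acc_predict_all_yes = n_yes / n  # accuracy if every child is predicted "Yes"
acc_predict_all_no = n_no / n    # accuracy if every child is predicted "No"

print(round(acc_predict_all_yes * 100, 1))  # -> 55.7, as in the table
print(round(acc_predict_all_no * 100, 1))   # -> 44.3
```

Predicting the majority category is the more accurate of the two options, which is why SPSS starts there.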


The remainder of this table lists each of the predictors in turn with a value of Rao's efficient score statistic for each one (column labelled Score). In large samples when the null hypothesis is true, the score statistic is identical to the Wald statistic and the likelihood ratio statistic. It is used at this stage of the analysis because it is computationally less intensive than the Wald statistic and so can still be calculated in situations when the Wald statistic would prove prohibitive. Like any test statistic, Rao's score statistic has a specific distribution from which statistical significance can be obtained. In this example, all excluded variables have significant score statistics at p < .001 and so all three could potentially make a contribution to the model. The stepwise calculations are relative and so the variable that will be selected for inclusion is the one with the highest value for the score statistic that is significant at a .05 level of significance. In this example, that variable will be fb because it has the highest value of the score statistic. The next part of the output deals with the model after this predictor has been added.

In the first step, false belief understanding (fb) is added to the model as a predictor. As such, a child is now classified as having display rule understanding based on whether they passed or failed the false belief task.
Omnibus Tests of Model Coefficients (Step 1)

          Chi-square   df   Sig.
  Step    26.083       1    .000
  Block   26.083       1    .000
  Model   26.083       1    .000

Model Summary (Step 1)

  -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
  70.042              .311                   .417


Classification Table (Step 1)

                                           Predicted
                                           Display Rule understanding   Percentage
  Observed                                 No         Yes               Correct
  Display Rule understanding   No          23         8                 74.2
                               Yes         6          33                84.6
  Overall Percentage                                                    80.0

a. The cut value is .500

The above shows summary statistics about the new model (which we've already seen contains fb). The overall fit of the new model is assessed using the log-likelihood statistic. In SPSS, rather than reporting the log-likelihood itself, the value is multiplied by -2 (and sometimes referred to as -2LL): this multiplication is done because -2LL has an approximately chi-square distribution and so makes it possible to compare values against those that we might expect to get by chance alone. Remember that large values of the log-likelihood statistic indicate poorly fitting statistical models. At this stage of the analysis the value of -2 log-likelihood should be less than the value when only the constant was included in the model (because lower values of -2LL indicate that the model is predicting the outcome variable more accurately). When only the constant was included, -2LL = 96.124, but now that fb has been included this value has been reduced to 70.042. This reduction tells us that the model is better at predicting display rule understanding than it was before fb was added. The question of how much better the model predicts the outcome variable can be assessed using the model chi-square statistic, which measures the difference between the model as it currently stands and the model when only the constant was included. We can assess the significance of the change in a model by taking the log-likelihood of the new model and subtracting the log-likelihood of the baseline model from it. The value of the model chi-square statistic works on this principle and is, therefore, equal to -2LL when only the constant was in the model minus -2LL with fb included (96.124 - 70.042 = 26.083). This value has a chi-square distribution and so its statistical significance can be easily calculated. In this example, the value is significant at a .05 level and so we can say that overall the model is predicting display rule understanding significantly better than it was with only the constant included.

The model chi-square is an analogue of the F-test for the linear regression sum of squares. In an ideal world we would like to see a non-significant -2LL (indicating that the amount of unexplained data is minimal) and a highly significant model chi-square statistic (indicating that the model including the predictors is significantly better than without those predictors). However, in reality it is possible for both statistics to be highly significant.

There is a second statistic called the step statistic that indicates the improvement in the predictive power of the model since the last stage. At this stage there has been only one step in the analysis and so the value of the improvement statistic is the same as the model chi-square. However, in more complex models in which there are three or four stages, this statistic gives you a measure of the improvement of the predictive power of the model since the last step. Its value is equal to -2LL at the previous step minus -2LL at the current step. If the improvement statistic is significant then it indicates that the model now predicts the outcome significantly better than it did at the last step, and in a forward regression this can be taken as an indication of the contribution of a predictor to the predictive power of the model. Similarly, the block statistic provides the change in -2LL since the last block (for use in hierarchical or blockwise analyses).

Finally, the classification table at the end of this section of the output indicates how well the model predicts group membership. The current model correctly classifies 23 children who don't have display rule understanding but misclassifies 8 others (i.e. it correctly classifies 74.19% of cases). For children who do have display rule understanding, the model correctly classifies 33 and misclassifies 6 cases (i.e. correctly classifies 84.62% of cases). The overall accuracy of classification is, therefore, the weighted average of these two values (80%). So, when only the constant was included, the model correctly classified 56% of children, but now, with the inclusion of fb as a predictor, this has risen to 80%.
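The model chi-square and its significance can be reproduced with standard-library Python. This is a sketch, not SPSS's own routine; for 1 or 2 degrees of freedom the chi-square survival function has a simple closed form:

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x).
    Closed forms exist for df = 1 (via erfc) and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("general df needs the incomplete gamma function")

model_chi2 = 96.124 - 70.042          # baseline -2LL minus new -2LL
p = chi2_sf(model_chi2, df=1)
print(round(model_chi2, 3), p < .05)  # -> 26.082 True
```

The tiny p-value matches the conclusion above that adding fb significantly improves the model (SPSS reports the chi-square as 26.083 because it subtracts unrounded -2LL values).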
Variables in the Equation (Step 1)

                                                              95.0% C.I. for Exp(B)
            B        S.E.   Wald     df   Sig.   Exp(B)       Lower    Upper
  FB(1)     2.761    .605   20.856   1    .000   15.812       4.835    51.706
  Constant  -1.344   .458   8.592    1    .003   .261

a. Variable(s) entered on step 1: FB.

The next part of the output is crucial because it tells us the estimates for the coefficients for the predictors included in the model. This section of the output gives us the coefficients and statistics for the variables that have been included in the model at this point (namely, fb and the constant). The interpretation of this coefficient in logistic regression is that it represents the change in the logit of the outcome variable associated with a one-unit change in the predictor variable. The logit of the outcome is simply the natural logarithm of the odds of Y occurring. The crucial statistic is the Wald statistic, which has a chi-square distribution and tells us whether the b coefficient for that predictor is significantly different from zero. If the coefficient is significantly different from zero then we can assume that the predictor is making a significant contribution to the prediction of the outcome (Y). For these data it seems to indicate that false belief understanding is a significant predictor of display rule understanding (note the significance of the Wald statistic is less than .05).
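The Wald statistic reported in the table is just the squared ratio of the coefficient to its standard error. A quick Python sketch using the rounded values from the table (small discrepancies from SPSS come from that rounding):

```python
# Wald = (b / SE)^2, which has a chi-square distribution with 1 df.
b, se = 2.761, 0.605   # FB(1) row of "Variables in the Equation"
wald = (b / se) ** 2
print(round(wald, 2))  # close to the 20.856 SPSS reports from unrounded values
```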

We can calculate an analogue of R using the equation in the chapter (for these data, the Wald statistic and its df are 20.856 and 1 respectively), and the original -2LL was 96.124. Therefore, R can be calculated as:

R = sqrt[(Wald - 2df) / (-2LL(original))]
  = sqrt[(20.856 - (2 × 1)) / 96.124]
  = .4429

Hosmer and Lemeshow's measure (R²L) is calculated by dividing the model chi-square by the original -2LL. In this example the model chi-square after all variables have been entered into the model is 26.083, and the original -2LL (before any variables were entered) was 96.124. So, R²L = 26.083/96.124 = .271, which is different from the value we would get by squaring the value of R given above (R² = .4429² = .196).

SPSS also reports Cox and Snell's measure, which is .311. This is calculated from the equation in the book chapter. Remember that this equation uses the log-likelihood, whereas SPSS reports -2 × log-likelihood. LL(new) is, therefore, -70.042/2 = -35.021, and LL(baseline) = -96.124/2 = -48.062. The sample size, n, is 70:
R²CS = 1 - e^(-(2/70)[-35.021 - (-48.062)])
     = 1 - e^(-0.3726)
     = 1 - 0.6889
     = 0.311


Nagelkerke's adjusted value is .417. This is calculated as:

R²N = R²CS / (1 - e^(2(-48.062)/70))
    = 0.311 / (1 - e^(-1.3732))
    = 0.311 / (1 - 0.2533)
    = 0.311 / 0.7467
    = 0.416
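All four effect-size measures above can be reproduced from the two -2LL values with standard-library Python. This is a sketch of the formulas using the (rounded) figures from the output:

```python
import math

neg2ll_base, neg2ll_new = 96.124, 70.042  # -2LL for constant-only and fb models
wald, df, n = 20.856, 1, 70
model_chi2 = neg2ll_base - neg2ll_new     # 26.082 here; SPSS's 26.083 uses unrounded -2LLs

r = math.sqrt((wald - 2 * df) / neg2ll_base)          # analogue of R
r2_hl = model_chi2 / neg2ll_base                      # Hosmer & Lemeshow's R-squared
r2_cs = 1 - math.exp((neg2ll_new - neg2ll_base) / n)  # Cox & Snell's R-squared
r2_n = r2_cs / (1 - math.exp(-neg2ll_base / n))       # Nagelkerke's R-squared

print(round(r, 4), round(r2_hl, 3), round(r2_cs, 3), round(r2_n, 3))
# -> 0.4429 0.271 0.311 0.417
```

Note that the Cox & Snell line is algebraically the same as the 1 - e^(-(2/n)[LL(new) - LL(baseline)]) form used above; it has simply been rewritten in terms of the -2LL values SPSS reports.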

As you can see, there's a fairly substantial difference between the two values! The final thing we need to look at is exp b (Exp(B) in the SPSS output), which was described in the book chapter. To calculate the change in odds that results from a unit change in the predictor for this example, we must first calculate the odds of a child having display rule understanding given that they don't have second-order false belief understanding. We then calculate the odds of a child having display rule understanding given that they do have false belief understanding. Finally, we calculate the proportionate change in these two odds.

To calculate the first set of odds, we need to calculate the probability of a child having display rule understanding given that they failed the false belief task. The parameter coding at the beginning of the output told us that children who failed the false belief task were coded with a 0, so we can use this value in place of X. The value of b1 has been estimated for us as 2.7607 (see Variables in the Equation), and the coefficient for the constant can be taken from the same table and is -1.3437. We can calculate the odds as:


P(event Y) = 1 / (1 + e^(-(b0 + b1X1)))
           = 1 / (1 + e^(-[-1.3437 + (2.7607 × 0)]))
           = 0.2069

P(no event Y) = 1 - P(event Y) = 1 - 0.2069 = 0.7931

odds = 0.2069 / 0.7931 = 0.2609

Now, we calculate the same thing after the predictor variable has changed by one unit. In this case, because the predictor variable is dichotomous, we need to calculate the odds of a child passing the display rule task, given that they have passed the false belief task. So, the value of the false belief variable, X, is now 1 (rather than 0). The resulting calculations are:
P(event Y) = 1 / (1 + e^(-(b0 + b1X1)))
           = 1 / (1 + e^(-[-1.3437 + (2.7607 × 1)]))
           = 0.8049

P(no event Y) = 1 - P(event Y) = 1 - 0.8049 = 0.1951

odds = 0.8049 / 0.1951 = 4.1256


We now know the odds before and after a unit change in the predictor variable. It is now a simple matter to calculate the proportionate change in odds by dividing the odds after a unit change in the predictor by the odds before that change.

change in odds = odds after a unit change in the predictor / original odds
               = 4.1256 / 0.2609
               = 15.8129
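The probability and odds calculations above can be reproduced in a few lines of standard-library Python (a sketch using the coefficients from the output):

```python
import math

b0, b1 = -1.3437, 2.7607   # constant and FB(1) coefficients from the output

def p_event(x):
    """P(display rule understanding) for false belief score x (0 or 1)."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

odds_fail = p_event(0) / (1 - p_event(0))  # failed the false belief task
odds_pass = p_event(1) / (1 - p_event(1))  # passed the false belief task
odds_ratio = odds_pass / odds_fail         # proportionate change in odds

print(round(p_event(0), 4), round(p_event(1), 4), round(odds_ratio, 2))
# -> 0.2069 0.8049 15.81 (SPSS's 15.812 uses the unrounded coefficient)
```

Algebraically the odds ratio is just e^b1, which is why SPSS can report Exp(B) directly from the coefficient.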

You should notice that the value of the proportionate change in odds is the same as the value that SPSS reports for exp b (allowing for differences in rounding). We can interpret exp b in terms of the change in odds. If the value is greater than 1 then it indicates that as the predictor increases, the odds of the outcome occurring increase. Conversely, a value less than 1 indicates that as the predictor increases, the odds of the outcome occurring decrease. In this example, we can say that the odds of a child who has false belief understanding also having display rule understanding are about 15 times higher than those of a child who does not have false belief understanding.

In the options (see the Further Options section above), we requested a confidence interval for exp b, and it can be found in the output. The way to interpret this confidence interval is to say that if we ran 100 experiments and calculated a confidence interval for the value of exp b in each, then these intervals would encompass the actual value of exp b in the population (rather than the sample) on 95 occasions. So, in this case, we can be fairly confident that the population value of exp b lies between 4.84 and 51.71. However, there is a 5% chance that a sample could give a confidence interval that misses the true value.
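The confidence interval SPSS reports can be approximated by hand by exponentiating b ± 1.96 standard errors. A sketch (small discrepancies from the table come from rounding in b and SE):

```python
import math

b, se = 2.761, 0.605   # FB(1) coefficient and its standard error
lower = math.exp(b - 1.96 * se)
upper = math.exp(b + 1.96 * se)
print(round(lower, 2), round(upper, 2))
# -> 4.83 51.77 (SPSS reports 4.835 and 51.706 from the unrounded values)
```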


Model if Term Removed

  Variable       Model Log Likelihood   Change in -2 Log Likelihood   df   Sig. of the Change
  Step 1: FB     -48.062                26.083                        1    .000

Variables not in the Equation (Step 1)

                       Score   df   Sig.
  AGE                  2.313   1    .128
  AGE by FB(1)         1.261   1    .261
  Overall Statistics   2.521   2    .283

The test statistics for fb if it were removed from the model are reported above. In a stepwise regression, SPSS checks whether predictors already in the model meet a removal criterion, and the Model if Term Removed part of the output tells us the effects of removal. The important thing to note is

the significance value of the log-likelihood ratio (log LR). The log LR for this model is highly significant (p < .0001), which tells us that removing fb from the model would have a significant effect on the predictive ability of the model; in other words, it would be a very bad idea to remove it! Finally, we are told about the variables currently not in the model. First of all, the residual chi-square (labelled Overall Statistics in the output), which is non-significant, tells us that none of the remaining variables have coefficients significantly different from zero. Furthermore, each variable is listed with its score statistic and significance value, and for both variables their coefficients are not significantly different from zero (as can be seen from the significance values of .128 for age and .261 for the interaction of age and false belief understanding). Therefore, no further variables will be added to the equation. The next part of the output displays the classification plot that we requested in the options dialog box. This plot is a histogram of the predicted probabilities of a child passing the


display rule task. If the model perfectly fits the data, then this histogram should show all of the cases for which the event has occurred on the right-hand side, and all the cases for which the event hasn't occurred on the left-hand side. In other words, all the children who passed the display rule task should appear on the right and all those who failed should appear on the left. In this example, the only significant predictor is dichotomous and so there are only two columns of cases on the plot. (If the predictor were a continuous variable, the cases would be spread out across many columns.) As a rule of thumb, the more the cases cluster at each end of the graph, the better. This statement is true because such a plot would show that when the outcome did actually occur (i.e. the child did pass the display rule task) the predicted probability of the event occurring is also high (i.e. close to 1). Likewise, at the other end of the plot it would show that when the event didn't occur (i.e. when the child failed the display rule task) the predicted probability of the event occurring is also low (i.e. close to 0). This situation represents a model that is correctly predicting the observed outcome data. If, however, there are a lot of points clustered in the centre of the plot, then it shows that for many cases the model is predicting a probability of .5 that the event will occur. In other words, for these cases there is little more than a 50:50 chance that the data are correctly predicted; as such the model could predict these cases just as accurately by simply tossing a coin! Also, a good model will ensure that few cases are misclassified; in this example there are two Ns on the right of the plot and one Y on the left. These are misclassified cases, and the fewer of these there are, the better the model.

[Classification plot: Observed Groups and Predicted Probabilities]


Listing Predicted Probabilities

SPSS saved the predicted probabilities and predicted group memberships as variables in the data editor and named them PRE_1 and PGR_1 respectively. These probabilities can be listed using the Case Summaries dialog box (see the book chapter). Below is a selection of the predicted probabilities (because the only significant predictor was a dichotomous variable, there will be only two different probability values). It is also worth listing the predictor variables as well to clarify where the predicted probabilities come from.


Case Summaries (a)

       Case     Age in   False Belief     Display Rule     Predicted     Predicted
       Number   years    understanding    understanding    probability   group
  1    1        24.00    No               No               .20690        No
  2    5        36.00    No               No               .20690        No
  3    9        34.00    No               Yes              .20690        No
  4    10       31.00    No               No               .20690        No
  5    11       32.00    No               No               .20690        No
  6    12       30.00    Yes              Yes              .80488        Yes
  7    20       26.00    No               No               .20690        No
  8    21       29.00    No               No               .20690        No
  9    29       45.00    Yes              Yes              .80488        Yes
  10   31       41.00    No               Yes              .20690        No
  11   32       32.00    No               No               .20690        No
  12   43       56.00    Yes              Yes              .80488        Yes
  13   60       63.00    No               Yes              .20690        No
  14   66       79.00    Yes              Yes              .80488        Yes
  Total N = 14

a. Limited to first 100 cases.

We found from the model that the only significant predictor of display rule understanding was false belief understanding. This could have a value of either 1 (pass the false belief task) or 0 (fail the false belief task). These values tell us that when a child doesn't possess second-order false belief understanding (fb = 0, No), there is a probability of .2069 that they will pass the display rule task: approximately a 21% chance (1 out of 5 children). However, if the child does pass the false belief task (fb = 1, Yes), there is a probability of .8049 that they will pass the display rule task: an 80.5% chance (4 out of 5 children). Consider that a probability of 0 indicates no chance of the child passing the display rule task, and a probability of 1 indicates that the child will definitely pass the display rule task. Therefore, the values obtained provide strong evidence for the role of false belief understanding as a prerequisite for display rule understanding.

Assuming we are content that the model is accurate and that false belief understanding has some substantive significance, then we could conclude that false belief understanding is the single best predictor of display rule understanding. Furthermore, age and the interaction of age and false belief understanding do not significantly predict display rule


understanding. As a homework task, why not rerun this analysis using the forced entry method: how do your conclusions differ? This conclusion is fine in itself, but to be sure that the model is a good one, it is important to examine the residuals.

Interpreting Residuals

The main purpose of examining residuals in logistic regression is to (1) isolate points for which the model fits poorly, and (2) isolate points that exert an undue influence on the model. To assess the former we examine the residuals, especially the Studentized residual, standardized residual and deviance statistics. All of these statistics have the common property that 95% of cases in an average, normally distributed sample should have values that lie within ±1.96, and 99% of cases should have values that lie within ±2.58. Therefore, any values outside ±3 are cause for concern and any outside about ±2.5 should be examined more closely. To assess the influence of individual cases we use influence statistics such as Cook's distance (which is interpreted in the same way as for linear regression: as a measure of the change in the regression coefficient if a case is deleted from the model). Also, the value of DFBeta, which is a standardized version of Cook's statistic, tells us something of the influence of certain cases: any values greater than 1 indicate possible influential cases. Additionally, leverage statistics or hat values, which should lie between 0 (the case has no influence whatsoever) and 1 (the case exerts complete influence over the model), tell us whether certain cases are wielding undue influence over the model. The expected value of leverage is defined as for linear regression.
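The expected leverage mentioned above is (k + 1)/n, where k is the number of predictors in the model, exactly as in linear regression. For this model (one predictor in the final model, 70 cases) a quick sketch:

```python
# Expected leverage = (k + 1) / n, as in linear regression.
k, n = 1, 70                 # one predictor (fb) in the final model, 70 children
expected_leverage = (k + 1) / n
print(round(expected_leverage, 2))  # -> 0.03
```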


If you request these residual statistics, SPSS saves them as new columns in the data editor. The basic residual statistics for this example (Cook's distance, leverage, standardized residuals and DFBeta values) show little cause for concern. Note that all cases have DFBetas less than 1 and leverage statistics (LEV_1) close to the calculated expected value of 0.03. There are also no unusually high values of Cook's distance (COO_1) which, all in all, means that there are no influential cases having an effect on the model. Cook's distance is an unstandardized measure and so there is no absolute value at which you can say that a case is having an influence. Instead, you should look for values of Cook's distance that are particularly high compared to the other cases in the sample; however, Stevens (2002) suggests that a value greater than 1 is problematic. About half of the leverage values are a little high but, given that the other statistics are fine, this is probably no cause for concern. The standardized residuals all have values within ±2.5 and predominantly within ±2, and so there seems to be very little here to concern us.
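Because the model has a single dichotomous predictor, the normalized residuals in the tables below take only four distinct values. Those values can be screened against the ±2.5 rule of thumb with a short sketch:

```python
# The four distinct normalized residuals from the output (one per
# combination of false belief result and observed outcome).
residuals = [-0.51075, 1.95789, 0.49237, -2.03101]

flagged = [r for r in residuals if abs(r) > 2.5]
print(flagged)  # -> [] : no case exceeds the +/-2.5 screening threshold
```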


Case Summaries (cases 1-45)

  Case      Analog of Cook's       Leverage   Normalized   DFBETA for   DFBETA for
  Number    influence statistics   value      residual     constant     FB(1)
  1         .00932                 .03448     -.51075      -.04503      .04503
  2         .00932                 .03448     -.51075      -.04503      .04503
  3         .00932                 .03448     -.51075      -.04503      .04503
  4         .00932                 .03448     -.51075      -.04503      .04503
  5         .00932                 .03448     -.51075      -.04503      .04503
  6         .13690                 .03448     1.95789      .17262       -.17262
  7         .00932                 .03448     -.51075      -.04503      .04503
  8         .00932                 .03448     -.51075      -.04503      .04503
  9         .13690                 .03448     1.95789      .17262       -.17262
  10        .00932                 .03448     -.51075      -.04503      .04503
  11        .00932                 .03448     -.51075      -.04503      .04503
  12        .00606                 .02439     .49237       .00000       .03106
  13        .00932                 .03448     -.51075      -.04503      .04503
  14        .10312                 .02439     -2.03101     .00000       -.12812
  15        .00932                 .03448     -.51075      -.04503      .04503
  16        .00932                 .03448     -.51075      -.04503      .04503
  17        .13690                 .03448     1.95789      .17262       -.17262
  18        .00932                 .03448     -.51075      -.04503      .04503
  19        .00606                 .02439     .49237       .00000       .03106
  20        .00932                 .03448     -.51075      -.04503      .04503
  21        .00932                 .03448     -.51075      -.04503      .04503
  22        .00606                 .02439     .49237       .00000       .03106
  23        .00606                 .02439     .49237       .00000       .03106
  24        .10312                 .02439     -2.03101     .00000       -.12812
  25        .00932                 .03448     -.51075      -.04503      .04503
  26        .00932                 .03448     -.51075      -.04503      .04503
  27        .00932                 .03448     -.51075      -.04503      .04503
  28        .00932                 .03448     -.51075      -.04503      .04503
  29        .00606                 .02439     .49237       .00000       .03106
  30        .00932                 .03448     -.51075      -.04503      .04503
  31        .13690                 .03448     1.95789      .17262       -.17262
  32        .00932                 .03448     -.51075      -.04503      .04503
  33        .00606                 .02439     .49237       .00000       .03106
  34        .00606                 .02439     .49237       .00000       .03106
  35        .00606                 .02439     .49237       .00000       .03106
  36        .00606                 .02439     .49237       .00000       .03106
  37        .10312                 .02439     -2.03101     .00000       -.12812
  38        .00606                 .02439     .49237       .00000       .03106
  39        .13690                 .03448     1.95789      .17262       -.17262
  40        .10312                 .02439     -2.03101     .00000       -.12812
  41        .00606                 .02439     .49237       .00000       .03106
  42        .00606                 .02439     .49237       .00000       .03106
  43        .00606                 .02439     .49237       .00000       .03106
  44        .00606                 .02439     .49237       .00000       .03106
  45        .00606                 .02439     .49237       .00000       .03106
  N = 45

a. Limited to first 100 cases.


Case Summaries (cases 46-70)

  Case      Analog of Cook's       Leverage   Normalized   DFBETA for   DFBETA for
  Number    influence statistics   value      residual     constant     FB(1)
  46        .10312                 .02439     -2.03101     .00000       -.12812
  47        .00606                 .02439     .49237       .00000       .03106
  48        .00932                 .03448     -.51075      -.04503      .04503
  49        .00932                 .03448     -.51075      -.04503      .04503
  50        .10312                 .02439     -2.03101     .00000       -.12812
  51        .00606                 .02439     .49237       .00000       .03106
  52        .00606                 .02439     .49237       .00000       .03106
  53        .00606                 .02439     .49237       .00000       .03106
  54        .00606                 .02439     .49237       .00000       .03106
  55        .00606                 .02439     .49237       .00000       .03106
  56        .00606                 .02439     .49237       .00000       .03106
  57        .10312                 .02439     -2.03101     .00000       -.12812
  58        .00606                 .02439     .49237       .00000       .03106
  59        .00606                 .02439     .49237       .00000       .03106
  60        .13690                 .03448     1.95789      .17262       -.17262
  61        .00606                 .02439     .49237       .00000       .03106
  62        .00606                 .02439     .49237       .00000       .03106
  63        .00606                 .02439     .49237       .00000       .03106
  64        .00606                 .02439     .49237       .00000       .03106
  65        .00606                 .02439     .49237       .00000       .03106
  66        .00606                 .02439     .49237       .00000       .03106
  67        .00606                 .02439     .49237       .00000       .03106
  68        .10312                 .02439     -2.03101     .00000       -.12812
  69        .00606                 .02439     .49237       .00000       .03106
  70        .00606                 .02439     .49237       .00000       .03106
  N = 25

a. Limited to first 100 cases.

You should note that these residuals are slightly unusual because they are based on a single predictor that is categorical; this is why there isn't a lot of variability in the values of the residuals. Also, if substantial outliers or influential cases had been isolated, you would not be justified in eliminating these cases to make the model fit better. Instead these cases should be inspected closely to try to isolate a good reason why they were unusual. It might simply be an error in inputting data, or it could be that the case had a special reason for being unusual: for example, the child had found it hard to pay attention to the false belief task and you had noted this at the time of the experiment. In such a case, you may have good reason to exclude the case and duly note the reasons why.
Task 2


Recent research has shown that lecturers are among the most stressed workers. A researcher wanted to know exactly what it was about being a lecturer that created this stress and subsequent burnout. She took 467 lecturers and administered several questionnaires to them that measured: Burnout (burnt out or not), Perceived Control (high score = low perceived control), Coping Style (high score = low ability to cope with stress), Stress from Teaching (high score = teaching creates a lot of stress for the person), Stress from Research (high score = research creates a lot of stress for the person) and Stress from Providing Pastoral Care (high score = providing pastoral care creates a lot of stress for the person). The outcome of interest was burnout, and Cooper's (1988) model of stress indicates that perceived control and coping style are important predictors of this variable. The remaining predictors were measured to see the unique contribution of different aspects of a lecturer's work to their burnout. Can you help her out by conducting a logistic regression to see which factors predict burnout? The data are in Burnout.sav.

The analysis should be done hierarchically because Cooper's model indicates that perceived control and coping style are important predictors of burnout. So, these variables should be entered in the first block. The second block should contain all other variables, and because we don't know much about their predictive ability, we should enter them in a stepwise fashion (I chose Forward: LR). SPSS Output

Step 1:
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step       165.928    2   .000
         Block      165.928    2   .000
         Model      165.928    2   .000

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1                364.179                   .299                  .441

Variables in the Equation
                                                             95% CI for Exp(B)
                  B     S.E.     Wald   df   Sig.   Exp(B)    Lower    Upper
Step 1a  LOC       .061   .011   31.316   1   .000    1.063    1.040    1.086
         COPE      .083   .009   77.950   1   .000    1.086    1.066    1.106
         Constant -4.484  .379  139.668   1   .000     .011
a. Variable(s) entered on step 1: LOC, COPE.

The overall fit of the model at the first step is significant, χ²(2) = 165.93, p < .001. Overall, the model accounts for 29.9–44.1% of the variance in burnout (depending on which measure of R² you use).

Step 2: The overall fit of the model is significant both after the first new variable (teaching) has been entered, χ²(3) = 193.34, p < .001, and after the second new variable (pastoral), χ²(4) = 205.40, p < .001. Overall, the final model accounts for 35.6–52.4% of the variance in burnout (depending on which measure of R² you use).
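The two R² measures in the Model Summary can be recovered from the −2LL values. A quick Python sketch (not part of the original answer; the formulas are the standard Cox & Snell and Nagelkerke definitions):

```python
import math

# Pseudo-R-squared measures for logistic regression, reconstructed from the
# step 1 output: model chi-square 165.928, -2LL 364.179, n = 467 lecturers.
n = 467
chi_sq_model = 165.928            # improvement over the intercept-only model
neg2ll_model = 364.179
neg2ll_null = neg2ll_model + chi_sq_model   # baseline (intercept-only) -2LL

# Cox & Snell: R^2 = 1 - exp(-chi_square / n)
r2_cs = 1 - math.exp(-chi_sq_model / n)

# Nagelkerke rescales Cox & Snell so its maximum is 1
r2_n = r2_cs / (1 - math.exp(-neg2ll_null / n))

print(round(r2_cs, 3), round(r2_n, 3))  # 0.299 0.441, matching the Model Summary
```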
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step        27.409    1   .000
         Block       27.409    1   .000
         Model      193.337    3   .000
Step 2   Step        12.060    1   .001
         Block       39.470    2   .000
         Model      205.397    4   .000

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1                336.770                   .339                  .500
2                324.710                   .356                  .524

Variables in the Equation
                                                             95% CI for Exp(B)
                  B     S.E.     Wald   df   Sig.   Exp(B)    Lower    Upper
Step 1a  LOC       .092   .014   46.340   1   .000    1.097    1.068    1.126
         COPE      .131   .015   76.877   1   .000    1.139    1.107    1.173
         TEACHING -.083   .017   23.962   1   .000     .921     .890     .952
         Constant -1.707  .619    7.599   1   .006     .181
Step 2b  LOC       .107   .015   52.576   1   .000    1.113    1.081    1.145
         COPE      .135   .016   75.054   1   .000    1.145    1.110    1.181
         TEACHING -.110   .020   31.660   1   .000     .896     .862     .931
         PASTORAL  .044   .013   11.517   1   .001    1.045    1.019    1.071
         Constant -3.023  .747   16.379   1   .000     .049
a. Variable(s) entered on step 1: TEACHING. b. Variable(s) entered on step 2: PASTORAL.

In terms of the individual predictors we could report:

                                          95% CI for Exp(B)
                       B (SE)           Lower   Exp(B)   Upper
Step 1
  Constant            -4.48** (0.38)
  Perceived Control    0.06** (0.01)     1.04     1.06    1.09
  Coping Style         0.08** (0.01)     1.07     1.09    1.11
Final
  Constant            -3.02** (0.75)
  Perceived Control    0.11** (0.02)     1.08     1.11    1.15
  Coping Style         0.14** (0.02)     1.11     1.15    1.18
  Teaching Stress     -0.11** (0.02)     0.86     0.90    0.93
  Pastoral Stress      0.04*  (0.01)     1.02     1.05    1.07

Note: R² = .36 (Cox and Snell), .52 (Nagelkerke). Model χ²(4) = 205.40, p < .001. * p < .01, ** p < .001.

It seems as though burnout is significantly predicted by perceived control, coping style (as predicted by Cooper), stress from teaching and stress from giving pastoral care. The Exp(B) values and the direction of the beta values tell us that, for perceived control, coping ability and pastoral care, the relationships are positive. That is (look back at the question to see what a high score on each scale represents), poor perceived control, poor ability to cope with stress and stress from giving pastoral care all predict burnout. However, for teaching, the relationship is the opposite way around: stress from teaching appears to be a positive thing as it predicts not becoming burnt out!
Task 3


A health psychologist interested in research into HIV wanted to know the factors that influenced condom use with a new partner (relationship less than 1 month old). The outcome measure was whether a condom was used (Use: condom used = 1, not used = 0). The predictor variables were mainly scales from the Condom Attitude Scale (CAS) by Sacco, Levine, Reed, and Thompson (Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1991): gender (gender of the person); safety (relationship safety, measured out of 5, indicates the degree to which the person views this relationship as safe from sexually transmitted disease); sexexp (sexual experience, measured out of 10, indicates the degree to which previous experience influences attitudes towards condom use); previous (a measure not from the CAS, this variable measures whether or not the couple used a condom in their previous encounter, 1 = condom used, 0 = not used, 2 = no previous encounter with this partner); selfcon (self-control, measured out of 9, indicates the degree of self-control that a subject has when it comes to condom use, i.e. do they get carried away with the heat of the moment, or do they exert control?); perceive (perceived risk, measured out of 6, indicates the degree to which the person feels at risk from unprotected sex). Previous research (Sacco, Rickman, Thompson, Levine, and Reed, in AIDS Education and Prevention, 1993) has shown that gender, relationship safety and perceived risk predict condom use. Carry out an appropriate analysis to verify these previous findings, and to test whether self-control, previous usage and sexual experience can predict any of the remaining variance in condom use. (1) Interpret all important parts of the SPSS output. (2) How reliable is the final model? (3) What are the probabilities that participants 12, 53 and 75 will use a condom? (4) A female, who used a condom in her previous encounter with her new partner, scores 2 on all variables except perceived risk (for which she scores 6). Use the model to estimate the probability that she will use a condom in her next encounter.

The correct analysis was to run a hierarchical logistic regression entering perceive,
safety and gender in the first block and previous, selfcon and sexexp in a second. I used

forced entry on both blocks, but you could choose to run a forward stepwise method on block 2 (either strategy is justified). For the variable previous I used an indicator contrast with No condom as the base category. Block 0: The output of the logistic regression will be arranged in terms of the blocks that were specified. In other words, SPSS will produce a regression model for the variables specified in block 1, and then produce a second model that contains the variables from both blocks 1 and 2. The results from block 1 are shown below. In this analysis we forced SPSS to enter perceive, safety and gender into the regression model first. First, the output tells us that 100 cases have been accepted, that the dependent variable has been coded 0 and 1 (because this variable was coded as 0 and 1 in the data editor, these codings correspond exactly to the data in SPSS).
Case Processing Summary
Unweighted Cases                          N    Percent
Selected Cases   Included in Analysis   100      100.0
                 Missing Cases            0         .0
                 Total                  100      100.0
Unselected Cases                          0         .0
Total                                   100      100.0
a. If weight is in effect, see classification table for the total number of cases.


Dependent Variable Encoding
Original Value   Internal Value
Unprotected                   0
Condom Used                   1

Categorical Variables Codings
                                          Parameter coding
Previous Use with Partner   Frequency      (1)       (2)
No Condom                          50     .000      .000
Condom used                        47    1.000      .000
First Time with partner             3     .000     1.000

Classification Table a,b
                                    Predicted Condom Use
                                                           Percentage
Step 0   Observed Condom Use   Unprotected   Condom Used      Correct
         Unprotected                    57             0        100.0
         Condom Used                    43             0           .0
         Overall Percentage                                      57.0
a. Constant is included in the model. b. The cut value is .500

Block 1: The next part of the output tells us about block 1: as such it provides information about the model after the variables perceive, safety and gender have been added. The first thing to note is that −2LL has dropped to 105.77, which is a change of 30.89 (the value given by the model chi-square). This value tells us about the model as a whole, whereas the block statistic tells us how the model has improved since the last block. The change in the amount of information explained by the model is significant (χ²(3) = 30.89, p < .001) and so using perceived risk, relationship safety and gender as predictors significantly improves our ability to predict condom use. Finally, the classification table shows us that 74% of cases can be correctly classified using these three predictors.


Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step        30.892    3   .000
         Block       30.892    3   .000
         Model       30.892    3   .000

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1                105.770                   .266                  .357

Classification Table a
                                    Predicted Condom Use
                                                           Percentage
Step 1   Observed Condom Use   Unprotected   Condom Used      Correct
         Unprotected                    45            12         78.9
         Condom Used                    14            29         67.4
         Overall Percentage                                      74.0
a. The cut value is .500

Hosmer and Lemeshow's goodness-of-fit test statistic tests the hypothesis that the observed data are significantly different from the predicted values from the model. So, in effect, we want a non-significant value for this test (because this would indicate that the model does not differ significantly from the observed data). In this case (χ²(8) = 9.70, p = .287) it is non-significant, which is indicative of a model that is predicting the real-world data fairly well.
Hosmer and Lemeshow Test Step 1 Chi-square 9.700 df 8 Sig. .287

The part of the output labelled Variables in the Equation then tells us the parameters of the model for the first block. The significance values of the Wald statistics for each predictor indicate that both perceived risk (Wald = 17.78, p < .001) and relationship safety (Wald = 4.54, p < .05) significantly predict condom use. Gender, however, does not (Wald = 0.41, p > .05).
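The Wald statistic SPSS reports is simply (B/S.E.)². A minimal Python check against the block 1 coefficients (a sketch, not part of the original answer; small rounding differences are expected because the printed B and S.E. values are themselves rounded):

```python
# Wald statistic for a logistic regression coefficient: (B / SE) squared.
def wald(b, se):
    return (b / se) ** 2

print(round(wald(0.940, 0.223), 2))   # perceived risk: 17.77 (SPSS prints 17.78)
print(round(wald(-0.464, 0.218), 2))  # relationship safety: 4.53 (SPSS: 4.54)
print(round(wald(0.317, 0.496), 2))   # gender: 0.41
```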

Variables in the Equation
                                                             95% CI for Exp(B)
                  B     S.E.     Wald   df   Sig.   Exp(B)    Lower    Upper
Step 1a  PERCEIVE  .940   .223   17.780   1   .000    2.560    1.654    3.964
         SAFETY   -.464   .218    4.540   1   .033     .629     .410     .963
         GENDER    .317   .496     .407   1   .523    1.373     .519    3.631
         Constant -2.476  .752   10.851   1   .001     .084
a. Variable(s) entered on step 1: PERCEIVE, SAFETY, GENDER.

The value of Exp(B) for perceived risk (Exp(B) = 2.56, CI95 = 1.65–3.96) indicates that if the value of perceived risk goes up by 1, then the odds of using a condom also increase (because Exp(B) is greater than 1). The confidence interval for this value ranges from 1.65 to 3.96, so we can be very confident that the value of Exp(B) in the population lies somewhere between these two values. What's more, because both values are greater than 1, we can also be confident that the relationship between perceived risk and condom use found in this sample is true of the whole population. In short, as perceived risk increases by 1, people are just over twice as likely to use a condom. The value of Exp(B) for relationship safety (Exp(B) = 0.63, CI95 = 0.41–0.96) indicates that if relationship safety increases by one point, then the odds of using a condom decrease (because Exp(B) is less than 1). The confidence interval for this value ranges from 0.41 to 0.96, so we can be very confident that the value of Exp(B) in the population lies somewhere between these two values. In addition, because both values are less than 1, we can be confident that the relationship between relationship safety and condom use found in this sample would be found in 95% of samples from the same population. In short, as relationship safety increases by one unit, subjects are about 1.6 times less likely to use a condom. The value of Exp(B) for gender (Exp(B) = 1.37, CI95 = 0.52–3.63) indicates that as gender changes from 0 (male) to 1 (female), the odds of using a condom increase (because Exp(B) is greater than 1). However, the confidence interval for this value crosses 1, which limits the generalizability of our findings because the value of Exp(B) in other samples (and hence the population) could indicate either a positive (Exp(B) > 1) or negative (Exp(B) < 1) relationship. Therefore, gender is not a reliable predictor of condom use.
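Exp(B) and its confidence interval come straight from B and S.E.: Exp(B) = e^B, and the 95% CI is e^(B ± 1.96 × S.E.). A Python sketch (not part of the original answer) using the perceived risk coefficient from the table above:

```python
import math

# Odds ratio and 95% CI for a logistic regression coefficient.
def odds_ratio_ci(b, se, z=1.96):
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

or_, lo, hi = odds_ratio_ci(0.940, 0.223)   # perceived risk, block 1
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 2.56 1.65 3.96
```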

A glance at the classification plot brings not such good news because a lot of cases are clustered around the middle. This indicates that the model could be performing more accurately (i.e. the classifications made by the model are not completely reliable).

Block 2: The output below shows what happens to the model when our new predictors are added (previous use, self-control and sexual experience). This part of the output describes block 2, which is just the model described in block 1 but with the new predictors added. So, we begin with the model that we had in block 1 and we then add previous, selfcon and sexexp to it. The effect of adding these predictors to the model is to reduce the −2 log-likelihood to 87.97 (a reduction of 48.69 from the original model, as shown by the model chi-square, and an additional reduction of 17.80 beyond block 1, as shown by the block statistics). This additional improvement of block 2 is significant (χ²(4) = 17.80, p < .01), which tells us that including these three new predictors in the model has significantly improved our ability to predict condom use. The classification table tells us that the model is now correctly classifying 78% of cases. Remember that in block 1 there were 74% correctly classified and so an extra 4% of cases are now classified (not a great deal more; in fact, examining the table shows us that only four extra cases have now been correctly classified).
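The block chi-square is just the drop in −2LL when the new predictors are added. A Python sketch (not part of the original answer) using the values reported above:

```python
# Likelihood-ratio test for the block 2 improvement: the chi-square is the
# difference between the two models' -2 log-likelihoods.
neg2ll_block1 = 105.770
neg2ll_block2 = 87.971
block_chi_sq = neg2ll_block1 - neg2ll_block2

# df = parameters added in block 2 (previous uses 2 df, selfcon and sexexp 1 each)
df = 7 - 3

print(round(block_chi_sq, 3), df)  # 17.799 4, matching the omnibus test
```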
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step        17.799    4   .001
         Block       17.799    4   .001
         Model       48.692    7   .000

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1                 87.971                   .385                  .517

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1           9.186    8   .327

Classification Table a
                                    Predicted Condom Use
                                                           Percentage
Step 1   Observed Condom Use   Unprotected   Condom Used      Correct
         Unprotected                    47            10         82.5
         Condom Used                    12            31         72.1
         Overall Percentage                                      78.0
a. The cut value is .500

The section labelled Variables in the Equation now contains all predictors. This part of the output represents the details of the final model. The significance values of the Wald statistics for each predictor indicate that both perceived risk (Wald = 16.04, p < 0.001) and relationship safety (Wald = 4.17, p < 0.05) still significantly predict condom use and, as in block 1, gender does not (Wald = 0.00, p > 0.05). We can now look at the new predictors to see which of these has some predictive power.
Variables in the Equation
                                                                 95% CI for Exp(B)
                     B      S.E.     Wald   df   Sig.   Exp(B)    Lower    Upper
Step 1a  PERCEIVE     .949    .237   16.038   1   .000    2.583    1.623    4.109
         SAFETY      -.482    .236    4.176   1   .041     .617     .389     .980
         GENDER       .003    .573     .000   1   .996    1.003     .326    3.081
         SEXEXP       .180    .112    2.614   1   .106    1.198     .962    1.490
         PREVIOUS                     4.032   2   .133
         PREVIOUS(1) 1.087    .552    3.879   1   .049    2.965    1.005    8.747
         PREVIOUS(2) -.017   1.400     .000   1   .990     .983     .063   15.287
         SELFCON      .348    .127    7.510   1   .006    1.416    1.104    1.815
         Constant   -4.959   1.146   18.713   1   .000     .007
a. Variable(s) entered on step 1: SEXEXP, PREVIOUS, SELFCON.

Previous use has been split into two components (according to whatever contrasts were specified for this variable). Looking at the very beginning of the output we are told the parameter codings for Previous(1) and Previous(2). You can tell which groups are being compared by remembering the rule from contrast coding in ANOVA: that is, we compare groups with codes of 0 against those with codes of 1. From the output we can see that Previous(1) compares the condom used group against the other two, and Previous(2) compares the base category of first time with partner against the other two categories. Therefore we can tell that previous use is not a significant predictor of condom use when it is the first time with a partner compared to when it is not the first time (Wald = 0.00, p > .05). However, when we compare the condom used category to the other categories we find that using a condom on the previous occasion does predict use on the current occasion (Wald = 3.88, p < .05). Of the other new predictors we find that self-control predicts condom use (Wald = 7.51, p < .01) but sexual experience does not (Wald = 2.61, p > .05). The value of Exp(B) for perceived risk (Exp(B) = 2.58, CI95 = 1.62–4.11) indicates that if the value of perceived risk goes up by 1, then the odds of using a condom also increase. What's more, because the confidence interval doesn't cross 1, we can also be confident that the relationship between perceived risk and condom use found in this sample is true of the whole population. As perceived risk increases by 1, people are just over twice as likely to use a condom. The value of Exp(B) for relationship safety (Exp(B) = 0.62, CI95 = 0.39–0.98) indicates that if relationship safety decreases by one point, then the odds of using a condom increase. The confidence interval does not cross 1, so we can be confident that the relationship between relationship safety and condom use found in this sample would be found in 95% of samples from the same population. As relationship safety increases by one unit, subjects are about 1.6 times less likely to use a condom. The value of Exp(B) for gender (Exp(B) = 1.00, CI95 = 0.33–3.08) indicates that as gender changes from 0 (male) to 1 (female), the odds of using a condom do not change


(because Exp(B) is equal to 1). The confidence interval crosses 1; therefore gender is not a reliable predictor of condom use. The value of Exp(B) for previous use (1) (Exp(B) = 2.97, CI95 = 1.01–8.75) indicates that if the value of previous usage goes up by 1 (i.e. changes from not having used one, or being the first time, to having used one), then the odds of using a condom also increase. What's more, because the confidence interval doesn't cross 1, we can also be confident that this relationship is true in the whole population. If someone used a condom on their previous encounter with this partner (compared to if they didn't use one, or if it is their first time) then they are three times more likely to use a condom. For previous use (2) the value of Exp(B) (Exp(B) = 0.98, CI95 = 0.06–15.29) indicates that if the value of previous usage goes up by 1 (i.e. changes from not having used one, or having used one, to being their first time with this partner), then the odds of using a condom do not change (because the value is very nearly equal to 1). What's more, because the confidence interval crosses 1, we can tell that this is not a reliable predictor of condom use. The value of Exp(B) for self-control (Exp(B) = 1.42, CI95 = 1.10–1.82) indicates that if self-control increases by one point, then the odds of using a condom also increase. The confidence interval does not cross 1, so we can be confident that the relationship between self-control and condom use found in this sample would be found in 95% of samples from the same population. As self-control increases by one unit, subjects are about 1.4 times more likely to use a condom. The value of Exp(B) for sexual experience (Exp(B) = 1.20, CI95 = 0.95–1.49) indicates that as sexual experience increases by one unit, the odds of using a condom increase


slightly. However, the confidence interval crosses 1, therefore sexual experience is not a reliable predictor of condom use. A glance at the classification plot brings good news because a lot of cases that were clustered in the middle are now spread towards the edges. Therefore, overall this new model is more accurately classifying cases compared to block 1.

How reliable is the final model? Multicollinearity can affect the parameters of a regression model, and logistic regression is equally prone to the biasing effects of collinearity, so it is essential to test for collinearity following a logistic regression analysis (see the book for details of how to do this). The results of the analysis are shown below. From the first table we can see that the tolerance values for all variables are close to 1 and are much larger than the cut-off point of 0.1 below which Menard (1995) suggests there is a serious collinearity problem. Myers (1990) also suggests that a VIF value greater than 10 is cause for concern, and in these data the values are all less than this criterion. The output below also shows a table labelled Collinearity Diagnostics. In this table, we are given the eigenvalues of the scaled, uncentred cross-products matrix, the condition index and the variance proportions for each predictor. If any of the eigenvalues in this table are much larger than others then the uncentred cross-products matrix is said to be ill-conditioned, which means that the solutions of the regression parameters can be greatly affected by small changes in the predictors or outcome. In plain English, these values give us some idea as to how accurate our regression model is: if the eigenvalues are fairly similar then the derived model is likely to be unchanged by small changes in the measured variables. The condition indexes are another way of expressing these eigenvalues and represent the square root of the ratio of the largest eigenvalue to the eigenvalue of interest (so, for the dimension with the largest eigenvalue, the condition index will always be 1). For these data the condition indexes are all relatively similar, showing that a problem is unlikely to exist.
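Each condition index is the square root of the largest eigenvalue divided by the eigenvalue of interest. A Python sketch (not part of the original answer) using the block 1 eigenvalues from the Collinearity Diagnostics output below; small differences from the SPSS values arise because the printed eigenvalues are rounded:

```python
import math

# Condition index for dimension i: sqrt(largest eigenvalue / eigenvalue_i).
eigenvalues = [3.137, 0.593, 0.173, 0.09728]  # block 1, dimensions 1-4
largest = max(eigenvalues)
indexes = [math.sqrt(largest / e) for e in eigenvalues]

# SPSS reports 1.000, 2.300, 4.260, 5.679 for these dimensions.
print([round(i, 3) for i in indexes])
```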
Coefficients a
                                    Collinearity Statistics
Model                               Tolerance      VIF
1   Perceived Risk                       .849    1.178
    Relationship Safety                  .802    1.247
    GENDER                               .910    1.098
2   Perceived Risk                       .740    1.350
    Relationship Safety                  .796    1.256
    GENDER                               .885    1.130
    Previous Use with Partner            .964    1.037
    Self-Control                         .872    1.147
    Sexual experience                    .929    1.076
a. Dependent Variable: Condom Use


Collinearity Diagnostics a
                                              Variance Proportions
                               Condition            Perceived  Relationship          Previous  Self-    Sexual
Model  Dimension  Eigenvalue   Index      Constant  Risk       Safety        GENDER  Use       Control  experience
1      1             3.137      1.000       .01       .02        .02          .03
       2              .593      2.300       .00       .02        .10          .55
       3              .173      4.260       .01       .55        .76          .08
       4          9.728E-02     5.679       .98       .40        .13          .35
2      1             5.170      1.000       .00       .01        .01          .01      .01       .01      .01
       2              .632      2.860       .00       .02        .06          .43      .10       .00      .02
       3              .460      3.352       .00       .03        .10          .01      .80       .00      .00
       4              .303      4.129       .00       .07        .01          .24      .00       .00      .60
       5              .235      4.686       .00       .04        .34          .17      .05       .50      .00
       6              .135      6.198       .01       .61        .40          .00      .00       .47      .06
       7          6.510E-02     8.911       .98       .23        .08          .14      .03       .03      .31
a. Dependent Variable: Condom Use

The final step in analysing this table is to look at the variance proportions. The variance of each regression coefficient can be broken down across the eigenvalues, and the variance proportions tell us the proportion of the variance of each predictor's regression coefficient that is attributed to each eigenvalue. These proportions can be converted to percentages by multiplying them by 100 (to make them more easily understood). In terms of collinearity, we are looking for predictors that have high proportions on the same small eigenvalue, because this would indicate that the variances of their regression coefficients are dependent (see Field, 2004). Again, no variables appear to have similarly high variance proportions for the same dimensions. The result of this analysis is pretty clear-cut: there is no problem of collinearity in these data. Residuals should be checked for influential cases and outliers. As a brief guide, the output lists cases with standardized residuals greater than 2. In a sample of 100, we would expect around 5–10% of cases to have standardized residuals with absolute values greater than this. For these data we have only four such cases and only one of these has an absolute value greater than 3. Therefore, we can be fairly sure that there are no outliers.


Casewise List b
         Selected    Observed                  Predicted     Temporary Variable
Case     Status a    Condom Use   Predicted    Group         Resid      ZResid
41       S           U**             .891      C             -.891      -2.855
53       S           U**             .916      C             -.916      -3.294
58       S           C**             .142      U              .858       2.455
83       S           C**             .150      U              .850       2.380
a. S = Selected, U = Unselected cases, and ** = Misclassified cases.
b. Cases with studentized residuals greater than 2.000 are listed.

What are the probabilities that participants 12, 53 and 75 will use a condom? The values predicted for these cases will depend on exactly how you ran the analysis (and the parameter coding used for the variable previous). Therefore, your answers might differ slightly from mine.
Case Summaries a
Case Number   Predicted Value   Predicted Group
12                  .49437      Unprotected
53                  .88529      Condom Used
75                  .37137      Unprotected
a. Limited to first 100 cases.

A female, who used a condom in her previous encounter with her new partner, scores 2 on all variables except perceived risk (for which she scores 6). Use the model to estimate the probability that she will use a condom in her next encounter.
Step 1: Logistic Regression Equation:

P(Y) = 1 / (1 + e^(−z)), where z = b0 + b1X1 + b2X2 + … + bnXn


Step 2: Use the values of b from the SPSS output (final model) and the values of X for each variable (from the question) to construct the following table:

Variable        bi         Xi     biXi
Gender          0.0027      1     0.0027
Safety         -0.4823      2    -0.9646
Sexexp          0.1804      2     0.3608
Previous(1)     1.0870      1     1.0870
Previous(2)    -0.0167      0     0
Selfcon         0.3476      2     0.6952
Perceive        0.9489      6     5.6934

Step 3: Place the values of biXi into the equation for z (remembering to include the constant):

z = −4.6009 + 0.0027 − 0.9646 + 0.3608 + 1.0870 + 0 + 0.6952 + 5.6934 = 2.2736

Step 4: Replace this value of z into the logistic regression equation:


P(Y) = 1 / (1 + e^(−z)) = 1 / (1 + e^(−2.2736)) = 1 / (1 + 0.10) = 0.9090

Therefore, there is a 91% chance that she will use a condom on her next encounter.
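Steps 2–4 above can be checked in a few lines of Python (a sketch, not part of the original answer; the coefficients are the worked-example values from the table above):

```python
import math

# Plug the worked-example coefficients into z = b0 + sum(b_i * x_i),
# then convert to a probability with P(Y) = 1 / (1 + e^(-z)).
coefs = {"gender": 0.0027, "safety": -0.4823, "sexexp": 0.1804,
         "previous1": 1.0870, "previous2": -0.0167,
         "selfcon": 0.3476, "perceive": 0.9489}
x = {"gender": 1, "safety": 2, "sexexp": 2,
     "previous1": 1, "previous2": 0, "selfcon": 2, "perceive": 6}

z = -4.6009 + sum(coefs[k] * x[k] for k in coefs)
p = 1 / (1 + math.exp(-z))
print(round(z, 4), round(p, 2))  # 2.2736 0.91
```

The hand calculation above rounds e^(−z) to 0.10, giving 0.9090; computed without rounding the probability is about 0.907, so the 91% conclusion is unchanged.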
Chapter 9 Task 1

One of my pet hates is pop psychology books. Along with banishing Freud from all bookshops, it is my vowed ambition to rid the world of these rancid putrefaction-ridden wastes of trees. Not only do they give psychology a very bad name by stating the bloody obvious and charging people for the privilege, but they are also considerably less enjoyable to look at than the trees killed to produce them (admittedly the same could be said for the turgid tripe that I produce in the name of education, but let's not go there just for now!). Anyway, as part of my plan to rid the world of popular psychology I did a little experiment. I took two groups of people who were in relationships and randomly assigned them to one of two conditions. One group read the famous popular psychology book Women are from Bras and men are from Penis, whereas another group read Marie Claire. I tested only 10 people in each of these groups, and the dependent variable was an


objective measure of their happiness with their relationship after reading the book. I didn't make any specific prediction about which reading material would improve relationship happiness. SPSS Output for the Independent t-test
Group Statistics
Book Read                                  N      Mean    Std. Deviation   Std. Error Mean
Women are from Bras, Men are from Penis   10   20.0000          4.10961           1.29957
Marie Claire                              10   24.2000          4.70933           1.48922

Independent Samples Test
                              Levene's Test              t-test for Equality of Means
                                                                                                     95% CI of the Difference
Relationship Happiness         F      Sig.      t       df      Sig. (2-tailed)  Mean Diff  SE Diff    Lower      Upper
Equal variances assumed       .491    .492   -2.125    18            .048         -4.2000   1.97653  -8.35253    -.04747
Equal variances not assumed                  -2.125    17.676        .048         -4.2000   1.97653  -8.35800    -.04200

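The t value in this output can be reproduced from the group statistics alone. A Python sketch (not part of the original answer; equal variances assumed, which Levene's test permits here):

```python
import math

# Independent t-test from summary statistics (equal group sizes).
m1, s1, n1 = 20.0, 4.10961, 10    # Women are from Bras, Men are from Penis
m2, s2, n2 = 24.2, 4.70933, 10    # Marie Claire

# Standard error of the difference between means.
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)
t = (m1 - m2) / se_diff
df = n1 + n2 - 2
print(round(se_diff, 3), round(t, 3), df)  # 1.977 -2.125 18
```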
Calculating the Effect Size

We know the value of t and the df from the SPSS output and so we can compute r as follows:

r = √(t² / (t² + df)) = √(2.125² / (2.125² + 18)) = √(4.52 / 22.52) = 0.45
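The conversion from t to r can be wrapped in a small Python function (a sketch, not part of the original answer), applied here to this test and to the paired test in Task 2:

```python
import math

# Effect size from a t statistic: r = sqrt(t^2 / (t^2 + df)).
def t_to_r(t, df):
    return math.sqrt(t ** 2 / (t ** 2 + df))

print(round(t_to_r(-2.125, 18), 2))   # 0.45: independent t-test above
print(round(t_to_r(2.706, 499), 2))   # 0.12: paired t-test in Task 2
```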
If you think back to our benchmarks for effect sizes this represents a fairly large effect (it is just below 0.5, the threshold for a large effect). Therefore, as well as being statistically significant, this effect is large and so represents a substantive finding.

Reporting the Results


When you report any statistical test you usually state the finding to which the test relates, and then in brackets report the test statistic (usually with its degrees of freedom), the probability value of that test statistic, and more recently the American Psychological Association is, quite rightly, requesting an estimate of the effect size. To get you into good habits early, we'll start thinking about effect sizes now, before you get too fixated on Fisher's magic .05. In this example we know that the value of t was −2.12, that the degrees of freedom on which this was based were 18, and that it was significant at p = .048. This can all be obtained from the SPSS output. We can also see the means for each group. Based on what we learnt about reporting means, we could now write something like: On average, the reported relationship happiness after reading Marie Claire (M = 24.20, SE = 1.49) was significantly higher than after reading Women are from Bras and men are from Penis (M = 20.00, SE = 1.30), t(18) = −2.12, p < .05, r = .45.
Task 2

Imagine Twaddle and Sons, the publishers of Women are from Bras and men are from Penis, were upset about my claims that their book was about as useful as a paper umbrella. They decided to take me to task and design their own experiment, in which participants read their book and one of my books (Field and Hole) at different times. Relationship happiness was measured after reading each book. To maximize their chances of finding a difference they used a sample of 500 participants, but got each participant to take part in both conditions (they read


both books). The order in which books were read was counterbalanced and there was a delay of six months between reading the books. They predicted that reading their wonderful contribution to popular psychology would lead to greater relationship happiness than reading some dull and tedious book about experiments. The data are in Field&Hole.sav. Analyse them using the appropriate t-test.

SPSS Output
Paired Samples Statistics
                                                   Mean     N    Std. Deviation   Std. Error Mean
Pair 1   Women are from Bras, Men are from Penis  20.0180  500         9.98123            .44637
         Field & Hole                             18.4900  500         8.99153            .40211

Paired Samples Correlations
                                                           N    Correlation   Sig.
Pair 1   Women are from Bras, Men are from Penis
         & Field & Hole                                  500        .117      .009

Paired Samples Test
                             Paired Differences
                                                          95% CI of the Difference
                       Mean   Std. Deviation  Std. Error    Lower     Upper       t     df   Sig. (2-tailed)
Pair 1   Women are
         from Bras,
         Men are from
         Penis - Field
         & Hole       1.5280       12.62807      .56474     .4184    2.6376    2.706   499        .007

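The paired t value can likewise be reproduced from the paired-differences statistics. A Python sketch (not part of the original answer):

```python
import math

# Paired t-test: t = mean difference / (SD of differences / sqrt(n)).
mean_diff, sd_diff, n = 1.5280, 12.62807, 500

se_diff = sd_diff / math.sqrt(n)
t = mean_diff / se_diff
df = n - 1
print(round(se_diff, 5), round(t, 3), df)  # 0.56474 2.706 499
```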
Calculating the Effect Size

We know the value of t and the df from the SPSS output and so we can compute r as follows:

r = √(t² / (t² + df)) = √(2.706² / (2.706² + 499)) = √(7.32 / 506.32) = 0.12
If you think back to our benchmarks for effect sizes this represents a small effect (it is just above 0.1, the threshold for a small effect). Therefore, although this effect is highly statistically significant, the size of the effect is very small and so represents a trivial finding.

Interpreting and Writing the Results

In this example, it would be tempting for Twaddle and Sons to conclude that their book produced significantly greater relationship happiness than our book. In fact, many researchers would write conclusions like this: The results show that reading Women are from Bras, men are from Penis produces significantly greater relationship happiness than that book by smelly old Field and Hole. This result is highly significant. However, to reach such a conclusion is to confuse statistical significance with the importance of the effect. By calculating the effect size we've discovered that although the difference in happiness after reading the two books is statistically very different, the size of effect that this represents is very small indeed. So, the effect is actually not very significant in real terms. A more correct interpretation might be to say: The results show that reading Women are from Bras, men are from Penis produces significantly greater relationship happiness than that book by smelly

101

old Field and Hole. However, the effect size was small, revealing that this finding was not substantial in real terms. Of course, this latter interpretation would be unpopular with Twaddle and Sons who would like to believe that their book had a huge effect on relationship happiness.
Chapter 10

Task 1

Imagine that I was interested in how different teaching methods affected students' knowledge. I noticed that some lecturers were aloof and arrogant in their teaching style and humiliated anyone who asked them a question, while others were encouraging and supportive of questions and comments. I took three statistics courses where I taught the same material. For one group of students I wandered around with a large cane and beat anyone who asked daft questions or got questions wrong (punish). In the second group I used my normal teaching style, which is to encourage students to discuss things that they find difficult and to give anyone working hard a nice sweet (reward). The final group I remained indifferent to and neither punished nor rewarded their efforts (indifferent). As the dependent measure I took the students' exam marks (percentage). Based on theories of operant conditioning, we expect punishment to be a very unsuccessful way of reinforcing learning, but we expect reward to be very successful. Therefore, one prediction is that reward will produce the best learning. A second hypothesis is that punishment should actually retard learning such that it is worse than an indifferent approach to learning. The data are in the file Teach.sav. Carry out a one-way ANOVA and use planned comparisons to test the hypotheses that: (1) reward results in better exam results than either punishment or indifference; and (2) indifference will lead to significantly better exam results than punishment.

SPSS Output
Descriptives: Exam Mark
                                                        95% CI for Mean
              N    Mean     Std. Deviation  Std. Error   Lower     Upper    Minimum  Maximum
Punish       10   50.0000      4.13656       1.30809    47.0409   52.9591    45.00    57.00
Indifferent  10   56.0000      7.10243       2.24598    50.9192   61.0808    46.00    67.00
Reward       10   65.4000      4.29987       1.35974    62.3241   68.4759    58.00    71.00
Total        30   57.1333      8.26181       1.50839    54.0483   60.2183    45.00    71.00

This output shows the table of descriptive statistics from the one-way ANOVA; we're told the means, standard deviations and standard errors of the means for each experimental condition. The means should correspond to those plotted in the graph. These diagnostics are important for interpretation later on. It looks as though marks are highest after reward and lowest after punishment.
Test of Homogeneity of Variances: Exam Mark. Levene statistic = 2.569, df1 = 2, df2 = 27, Sig. = .095

The next part of the output reports a test of the assumption of homogeneity of variance (Levene's test). For these data, the assumption of homogeneity of variance has been met, because our significance is 0.095, which is bigger than the criterion of 0.05.
ANOVA: Exam Mark
                 Sum of Squares   df   Mean Square      F      Sig.
Between Groups      1205.067       2     602.533     21.008    .000
Within Groups        774.400      27      28.681
Total               1979.467      29


The main ANOVA summary table shows us that because the observed significance value is less than 0.05 we can say that there was a significant effect of teaching style on exam marks. However, at this stage we still do not know exactly what the effect of the teaching style was (we don't know which groups differed).
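Incidentally, the F-ratio in this table is just the ratio of the two mean squares, which we can verify with a couple of lines of Python (illustrative only, not SPSS):

```python
# Values from the ANOVA summary table above
ms_between = 1205.067 / 2   # SS between / df between = 602.533
ms_within = 774.400 / 27    # SS within / df within = 28.681
print(round(ms_between / ms_within, 3))  # 21.008
```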
Robust Tests of Equality of Means: Exam Mark
                 Statistic(a)   df1     df2     Sig.
Welch               32.235       2    17.336    .000
Brown–Forsythe      21.008       2    20.959    .000
a. Asymptotically F distributed.

This table shows the Welch and Brown–Forsythe Fs, but we can ignore these because the homogeneity of variance assumption was met.
Contrast Coefficients
             Type of Teaching Method
             Punish   Indifferent   Reward
Contrast 1      1          1          -2
Contrast 2      1         -1           0

Because there were specific hypotheses I specified some contrasts. This table shows the codes I used. The first contrast compares reward (coded with -2) against punishment and indifference (both coded with 1). The second contrast compares punishment (coded with 1) against indifference (coded with -1). Note that the codes for each contrast sum to zero, and that in contrast 2 reward has been coded with a 0 because it is excluded from that contrast.
Contrast Tests: Exam Mark
                                  Contrast   Value of Contrast   Std. Error      t        df     Sig. (2-tailed)
Assume equal variances               1           -24.8000         4.14836     -5.978    27           .000
                                     2            -6.0000         2.39506     -2.505    27           .019
Does not assume equal variances      1           -24.8000         3.76180     -6.593    21.696       .000
                                     2            -6.0000         2.59915     -2.308    14.476       .036
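The 'Assume equal variances' rows of this table can be reproduced by hand from the group means (in the descriptives table) and the within-groups mean square (in the ANOVA table). A sketch in Python (purely illustrative, not SPSS output):

```python
import math

means = {"punish": 50.0, "indifferent": 56.0, "reward": 65.4}
n = 10          # participants per group
ms_r = 28.681   # within-groups mean square from the ANOVA table

def contrast(codes):
    # Value of contrast = sum of (code * group mean);
    # SE = sqrt(MSR * sum(code^2 / n)); t = value / SE
    value = sum(c * means[g] for g, c in codes.items())
    se = math.sqrt(ms_r * sum(c**2 / n for c in codes.values()))
    return value, se, value / se

value, se, t = contrast({"punish": 1, "indifferent": 1, "reward": -2})
print(round(value, 1), round(se, 2), round(t, 2))  # -24.8 4.15 -5.98
```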


This table shows the significance of the two contrasts specified above. Because homogeneity of variance was met, we can ignore the part of the table labelled 'Does not assume equal variances'. The t-test for the first contrast tells us that reward was significantly different from punishment and indifference (it's significantly different because the value in the column labelled Sig. is less than 0.05). Looking at the means, this tells us that the average mark after reward was significantly higher than the average mark for punishment and indifference combined. The second contrast (and the descriptive statistics) tells us that the marks after punishment were significantly lower than after indifference (again, it's significantly different because the value in the column labelled Sig. is less than 0.05). As such we could conclude that reward produces significantly better exam grades than punishment and indifference, and that punishment produces significantly worse exam marks than indifference. So lecturers should reward their students, not punish them!

Calculating the Effect Size

The output provides us with the between-groups mean square (MSM), the within-groups mean square (MSR) and the total amount of variance in the data (SST). We can use these to calculate omega squared (ω²):


ω² = (MSM - MSR) / (MSM + (n - 1)MSR)
   = (602.533 - 28.681) / (602.533 + (30 - 1) × 28.681)
   = 573.852 / 1434.282
   = .40

ω = .63
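The same calculation can be expressed as a small Python helper (illustrative only; the function name is my own, and it implements the formula used above):

```python
import math

def omega_sq(ms_m, ms_r, n):
    # omega^2 = (MS_M - MS_R) / (MS_M + (n - 1) * MS_R)
    return (ms_m - ms_r) / (ms_m + (n - 1) * ms_r)

w2 = omega_sq(602.533, 28.681, 30)
print(round(w2, 2), round(math.sqrt(w2), 2))  # 0.4 0.63
```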
For the contrasts the effect sizes will be:

r_contrast = √(t² / (t² + df))

r_contrast1 = √(5.978² / (5.978² + 27)) = 0.75

If you think back to our benchmarks for effect sizes this represents a huge effect (it is well above 0.5, the threshold for a large effect). Therefore, as well as being statistically significant, this effect is large and so represents a substantive finding. For contrast 2 we get:

r_contrast2 = √(2.505² / (2.505² + 27)) = 0.43

This too is a substantive finding and represents a medium to large effect size.

Interpreting and Writing the Result

The correct way to report the main finding would be:

All significant values are reported at p < .05. There was a significant effect of teaching style on exam marks, F(2, 27) = 21.01, ω² = .40. Planned contrasts revealed that reward produced significantly better exam grades than punishment and indifference, t(27) = 5.98, r = .75, and that punishment produced significantly worse exam marks than indifference, t(27) = 2.51, r = .43.
Task 2

In Chapter 15 there are some data looking at whether eating soya meals reduces your sperm count. Have a look at this section, access the data for that example, but analyse them with ANOVA. What's the difference between what you find and what is found in section 15.5.4? Why do you think this difference has arisen?

SPSS Output

Descriptives: Sperm Count (Millions)
                                                            95% CI for Mean
                        N    Mean    Std. Deviation  Std. Error   Lower    Upper   Minimum  Maximum
No Soya Meals          20   4.9868      5.08437       1.13690    2.6072   7.3663     .35     21.08
1 Soya Meal Per Week   20   4.6052      4.67263       1.04483    2.4184   6.7921     .33     18.47
4 Soya Meals Per Week  20   4.1101      4.40991        .98609    2.0462   6.1740     .40     18.21
7 Soya Meals Per Week  20   1.6530      1.10865        .24790    1.1341   2.1719     .31      4.11
Total                  80   3.8388      4.26048        .47634    2.8906   4.7869     .31     21.08

This output shows the table of descriptive statistics from the one-way ANOVA. It looks as though, as soya intake increases, sperm counts do indeed decrease.
Test of Homogeneity of Variances: Sperm Count (Millions). Levene statistic = 5.117, df1 = 3, df2 = 76, Sig. = .003


The next part of the output reports a test of the assumption of homogeneity of variance (Levene's test). For these data, the assumption of homogeneity of variance has been broken, because our significance is 0.003, which is smaller than the criterion of 0.05. In fact, these data also violate the assumption of normality (see the chapter on non-parametric statistics).
ANOVA: Sperm Count (Millions)
                 Sum of Squares   df   Mean Square     F      Sig.
Between Groups       135.130       3     45.043      2.636   .056
Within Groups       1298.853      76     17.090
Total               1433.983      79

The main ANOVA summary table shows us that because the observed significance value is greater than 0.05 we can say that there was no significant effect of soya intake on men's sperm count. This is strange because if you read the chapter on non-parametric statistics, from where this example came, the Kruskal–Wallis test produced a significant result! The reason for this difference is that the data violate the assumptions of normality and homogeneity of variance. As I mention in the chapter on non-parametric statistics, although parametric tests have more power to detect effects when their assumptions are met, when their assumptions are violated non-parametric tests have more power! This example was arranged to prove this point: because the parametric assumptions are violated, the non-parametric test produced a significant result and the parametric test did not because, in these circumstances, the non-parametric test has the greater power!
Robust Tests of Equality of Means: Sperm Count (Millions)
                 Statistic(a)   df1     df2     Sig.
Welch                6.284       3    34.657    .002
Brown–Forsythe       2.636       3    58.236    .058
a. Asymptotically F distributed.


This table shows the Welch and Brown–Forsythe Fs; note that the Welch test agrees with the non-parametric test in that the significance of F is below the 0.05 threshold. However, the Brown–Forsythe F is non-significant (it is just above the threshold). This illustrates the relative superiority of the Welch procedure. However, in these circumstances, because normality and homogeneity of variance have been violated we'd use a non-parametric test anyway!
Task 3

Students (and lecturers for that matter) love their mobile phones, which is rather worrying given some recent controversy about links between mobile phone use and brain tumours. The basic idea is that mobile phones emit microwaves, and so holding one next to your brain for large parts of the day is a bit like sticking your brain in a microwave oven and selecting the 'cook until well done' button. If we wanted to test this experimentally, we could get six groups of people and strap a mobile phone on their heads (that they can't remove). Then, by remote control, we turn the phones on for a certain amount of time each day. After six months, we measure the size of any tumour (in mm³) close to the site of the phone antenna (just behind the ear). The six groups experienced 0, 1, 2, 3, 4 or 5 hours per day of phone microwaves for six months. The data are in Tumour.sav. (From Field & Hole, 2003, so there is a very detailed answer in there.)

SPSS Output


The error bar chart of the mobile phone data shows the mean size of brain tumour in each condition, and the funny 'I' shapes show the confidence interval of these means. Note that in the control group (0 hours), the mean size of the tumour is virtually zero (we wouldn't actually expect them to have a tumour) and the error bar shows that there was very little variance across samples. We'll see later that this is problematic for the analysis.

Descriptives: Size of Tumour (MM cubed)
                                                 95% CI for Mean
         N    Mean    Std. Deviation  Std. Error   Lower    Upper   Minimum  Maximum
0       20    .0175      .01213         .00271     .0119    .0232     .00      .04
1       20    .5149      .28419         .06355     .3819    .6479     .00      .94
2       20   1.2614      .49218         .11005    1.0310   1.4917     .48     2.34
3       20   3.0216      .76556         .17118    2.6633   3.3799    1.77     4.31
4       20   4.8878      .69625         .15569    4.5619   5.2137    3.04     6.05
5       20   4.7306      .78163         .17478    4.3648   5.0964    2.70     6.14
Total  120   2.4056     2.02662         .18500    2.0393   2.7720     .00     6.14

This output shows the table of descriptive statistics from the one-way ANOVA; we're told the means, standard deviations and standard errors of the means for each experimental condition. The means should correspond to those plotted in the graph. These diagnostics are important for interpretation later on.


Test of Homogeneity of Variances: Size of Tumour (MM cubed). Levene statistic = 10.245, df1 = 5, df2 = 114, Sig. = .000

The next part of the output reports a test of this assumption, Levene's test. For these data, the assumption of homogeneity of variance has been violated, because our significance is 0.000, which is considerably smaller than the criterion of 0.05. In these situations, we have to try to correct the problem and we can either transform the data or choose the Welch F.
ANOVA: Size of Tumour (MM cubed)
                 Sum of Squares   df   Mean Square      F       Sig.
Between Groups       450.664       5     90.133     269.733    .000
Within Groups         38.094     114       .334
Total                488.758     119

The main ANOVA summary table shows us that because the observed significance value is less than 0.05 we can say that there was a significant effect of mobile phones on the size of tumour. However, at this stage we still do not know exactly what the effect of the phones was (we dont know which groups differed).
Robust Tests of Equality of Means: Size of Tumour (MM cubed)
                 Statistic(a)   df1     df2     Sig.
Welch              414.926       5    44.390    .000
Brown–Forsythe     269.733       5    75.104    .000
a. Asymptotically F distributed.

This table shows the Welch and Brown–Forsythe Fs, which are useful because homogeneity of variance was violated. Luckily our conclusions remain the same: both Fs have significance values less than 0.05.


Multiple Comparisons
Dependent Variable: Size of Tumour (MM cubed); Games–Howell procedure
All standard errors = .18280; * = mean difference significant at the .05 level

(I) Hours  (J) Hours   Mean Difference (I-J)   Sig.    95% CI Lower    Upper
0          1                  -.4973*          .000      -.6982       -.2964
0          2                 -1.2438*          .000     -1.5916       -.8960
0          3                 -3.0040*          .000     -3.5450      -2.4631
0          4                 -4.8702*          .000     -5.3622      -4.3783
0          5                 -4.7130*          .000     -5.2653      -4.1608
1          0                   .4973*          .000       .2964        .6982
1          2                  -.7465*          .000     -1.1327       -.3603
1          3                 -2.5067*          .000     -3.0710      -1.9424
1          4                 -4.3729*          .000     -4.8909      -3.8549
1          5                 -4.2157*          .000     -4.7908      -3.6406
2          0                  1.2438*          .000       .8960       1.5916
2          1                   .7465*          .000       .3603       1.1327
2          3                 -1.7602*          .000     -2.3762      -1.1443
2          4                 -3.6264*          .000     -4.2017      -3.0512
2          5                 -3.4692*          .000     -4.0949      -2.8436
3          0                  3.0040*          .000      2.4631       3.5450
3          1                  2.5067*          .000      1.9424       3.0710
3          2                  1.7602*          .000      1.1443       2.3762
3          4                 -1.8662*          .000     -2.5607      -1.1717
3          5                 -1.7090*          .000     -2.4429       -.9751
4          0                  4.8702*          .000      4.3783       5.3622
4          1                  4.3729*          .000      3.8549       4.8909
4          2                  3.6264*          .000      3.0512       4.2017
4          3                  1.8662*          .000      1.1717       2.5607
4          5                   .1572           .984      -.5455        .8599
5          0                  4.7130*          .000      4.1608       5.2653
5          1                  4.2157*          .000      3.6406       4.7908
5          2                  3.4692*          .000      2.8436       4.0949
5          3                  1.7090*          .000       .9751       2.4429
5          4                  -.1572           .984      -.8599        .5455

Because there were no specific hypotheses I just carried out post hoc tests and stuck to my favourite Games–Howell procedure (because variances were unequal). It is clear from the table that each group of participants is compared to all of the remaining groups. First, the control group (0 hours) is compared to the 1, 2, 3, 4 and 5 hour groups and reveals a significant difference in all cases (all the values in the column labelled Sig. are less than 0.05). In the next part of the table, the 1 hour group is compared to all other groups. Again all comparisons are significant (all the values in the column labelled Sig. are less than 0.05). In fact, all of the comparisons appear to be highly significant except the comparison between the 4 and 5 hour groups, which is non-significant because the value in the column labelled Sig. is bigger than 0.05.

Calculating the Effect Size


The output provides us with the between-groups mean square (MSM), the within-groups mean square (MSR) and the total amount of variance in the data (SST). We can use these to calculate omega squared (ω²):

ω² = (MSM - MSR) / (MSM + (n - 1)MSR)
   = (90.133 - 0.334) / (90.133 + (120 - 1) × 0.334)
   = 89.799 / 129.879
   = .69

ω = .83
Interpreting and Writing the Result

We could report the main finding as: Levene's test indicated that the assumption of homogeneity of variance had been violated (F(5, 114) = 10.25, p < .001). Transforming the data did not rectify this problem and so F-tests are reported nevertheless. The results show that using a mobile phone significantly affected the size of brain tumour found in participants (F(5, 114) = 269.73, p < .001, ω² = .69). The effect size indicated that the effect of phone use on tumour size was substantial. The next thing that needs to be reported is the post hoc comparisons. It is customary just to summarize these tests in very general terms like this: Games–Howell post hoc tests revealed significant differences between all groups (p < .001 for all tests) except between 4 and 5 hours (ns).


If you do want to report the results for each post hoc test individually, then at least include the 95% confidence intervals for the test as these tell us more than just the significance value. In this example, though, when there are many tests it might be as well to summarize these confidence intervals as a table (see below):

Mobile Phone Use (Hours Per Day)    Sig.      95% CI Lower    Upper
0 vs 1                             < .001        -.6982      -.2964
0 vs 2                             < .001       -1.5916      -.8960
0 vs 3                             < .001       -3.5450     -2.4631
0 vs 4                             < .001       -5.3622     -4.3783
0 vs 5                             < .001       -5.2653     -4.1608
1 vs 2                             < .001       -1.1327      -.3603
1 vs 3                             < .001       -3.0710     -1.9424
1 vs 4                             < .001       -4.8909     -3.8549
1 vs 5                             < .001       -4.7908     -3.6406
2 vs 3                             < .001       -2.3762     -1.1443
2 vs 4                             < .001       -4.2017     -3.0512
2 vs 5                             < .001       -4.0949     -2.8436
3 vs 4                             < .001       -2.5607     -1.1717
3 vs 5                             < .001       -2.4429      -.9751
4 vs 5                             = .984        -.5455       .8599


Task 4

Using the Glastonbury data (GlastonburyFestival.sav), carry out a one-way ANOVA on the data to see if the change in hygiene (change) is significant across people with different musical tastes (music). Do a simple contrast to compare each group against No Affiliation. Compare the results to those described in section 7.11.

SPSS Output:

Levene's test is non-significant, showing that variances were roughly equal, F(3, 119) = 0.87, p > .05, across crusties, metallers, indie kids and people with no affiliation.

The above is the main ANOVA table. We could say that the change in hygiene scores was significantly different across the different musical groups, F(3, 119) = 3.27, p < .05. Compare this table to the one in section 7.11, in which we analysed these data as a regression:


It's exactly the same! This should, I hope, re-emphasize to you that regression and ANOVA are the same analytic system!
Task 5

Labcoat Leni's Real Research 15.2 describes an experiment on quails with fetishes for terrycloth objects (really, it does). In this example, you are asked to analyse two of the variables that the researchers measured with a Kruskal–Wallis test. However, there were two other outcome variables (time spent near the terrycloth object and copulatory efficiency). These data can be analysed with one-way ANOVA. Read Labcoat Leni's Real Research 15.2 to get the full story, then carry out two one-way ANOVAs and Bonferroni post hoc tests on the aforementioned outcome variables.

Let's begin by using the Chart Builder to do some error bar charts:


To conduct a one-way ANOVA we have to access the main dialog box. This dialog box has a space in which you can list one or more dependent variables and a second space to specify a grouping variable, or factor. For these data we need to select Duration and Efficiency from the variables list and drag them to the box labelled Dependent List. Then select the grouping variable Group and drag it to the box labelled Factor.

You were asked to do post hoc tests, so we can skip the contrast options. From the main dialog box, open the post hoc tests dialog box. You were asked to do a Bonferroni post hoc test so select this, but let's also select Games–Howell in case of problems with homogeneity (which of course we would have checked before running this main analysis!). Then return to the main dialog box.

Also select the options to test for homogeneity of variance and to obtain the Brown–Forsythe F and Welch F. Return to the main dialog box and then run the analysis.


The output should look like this:

This tells us that the homogeneity of variance assumption is met for both outcome variables. This means that we can ignore (just as the authors did) the corrected Fs and Games–Howell post hoc tests. Instead we can look at the normal Fs and Bonferroni post hoc tests (which is what the authors of this paper reported).


This table tells us that the group (fetishistic, non-fetishistic or control group) had a significant effect on the time spent near the terrycloth object, and the copulatory efficiency. To find out exactly what's going on we can look at our post hoc tests:

The authors reported as follows: 'A one-way ANOVA indicated significant group differences, F(2, 56) = 91.38, p < .05, = 0.76. Subsequent pairwise comparisons (with the Bonferroni correction) revealed that fetishistic male quail stayed near the CS longer than both the nonfetishistic male quail (mean difference = 10.59 s; 95% CI = 4.16, 17.02; p < .05) and the control male quail (mean difference = 29.74 s; 95% CI = 24.12, 35.35; p < .05). In addition, the nonfetishistic male quail spent more time near the CS than did the control male quail (mean difference = 19.15 s; 95% CI = 13.30, 24.99; p < .05)' (pp. 429–430). Note that the CS is the terrycloth object. Look at the graph, the ANOVA table and the post hoc tests to see from where the values that they report come. For the copulatory efficiency outcome the authors reported as follows:


'A one-way ANOVA yielded a significant main effect of groups, F(2, 56) = 6.04, p < .05, = 0.18. Paired comparisons (with the Bonferroni correction) indicated that the nonfetishistic male quail copulated with the live female quail (US) more efficiently than both the fetishistic male quail (mean difference = 6.61; 95% CI = 1.41, 11.82; p < .05) and the control male quail (mean difference = 5.83; 95% CI = 1.11, 10.56; p < .05). The difference between the efficiency scores of the fetishistic and the control male quail was not significant (mean difference = -0.78; 95% CI = -5.33, 3.77; p > .05)' (p. 430). These results show that male quails do show fetishistic behaviour (the time spent with the terrycloth) and that this affects their copulatory efficiency (they are less efficient than those that don't develop a fetish, but it's worth remembering that they are no worse than quails that had no sexual conditioning, the controls). If you look at Labcoat Leni's box then you'll also see that this fetishistic behaviour may have evolved because the quails with fetishistic behaviour manage to fertilize a greater percentage of eggs (so their genes are passed on!).
Chapter 11

Task 1

Stalking is a very disruptive and upsetting (for the person being stalked) experience in which someone (the stalker) constantly harasses or obsesses about another person. It can take many forms, from sending intensely disturbing letters threatening to boil your cat if you don't reciprocate the stalker's undeniable love for you, to literally following you around your local area in a desperate attempt to see which CD you buy on a Saturday (as if it would be anything other than Fugazi!). A psychologist, who'd had enough of being stalked by people, decided to try two different therapies on different groups of stalkers (25 stalkers in each group; this variable is called Group). The first group of stalkers he gave what he termed 'cruel to be kind' therapy. This therapy was based on punishment for stalking behaviours; in short, every time the stalker followed him around, or sent him a letter, the psychologist attacked them with a cattle prod until they stopped their stalking behaviour. It was hoped that the stalker would learn an aversive reaction to anything resembling stalking. The second therapy was psychodyshamic therapy, which was a recent development on Freud's psychodynamic therapy that acknowledges what a sham this kind of treatment is (so, you could say it's based on Fraudian theory!). The stalkers were hypnotized and regressed into their childhood; the therapist would also discuss their penis (unless it was a woman, in which case they discussed their lack of penis), the penis of their father, their dog's penis, the penis of the cat down the road, and anyone else's penis that sprang to mind. At the end of therapy, the psychologist measured the number of hours in the week that the stalker spent stalking their prey (this variable is called stalk2). Now, the therapist believed that the success of therapy might well depend on how bad the problem was to begin with, so before therapy the therapist measured the number of hours that the patient spent stalking as an indicator of how much of a stalker the person was (this variable is called stalk1). The data are in the file Stalker.sav. Analyse the effect of therapy on stalking behaviour after therapy, controlling for the amount of stalking behaviour before therapy.


SPSS Output
Tests of Between-Subjects Effects
Dependent Variable: Time Spent Stalking After Therapy (hours per week)
Source            Type III Sum of Squares   df    Mean Square       F       Sig.
Corrected Model          591.680(a)          1      591.680        3.331    .074
Intercept             170528.000             1   170528.000      960.009    .000
THERAPY                  591.680             1      591.680        3.331    .074
Error                   8526.320            48      177.632
Total                 179646.000            50
Corrected Total         9118.000            49
a. R Squared = .065 (Adjusted R Squared = .045)

This output shows the ANOVA table when the covariate is not included. It is clear from the significance value that there is no difference in the hours spent stalking after therapy for the two therapy groups (p is 0.074, which is greater than 0.05). You should note that the total amount of variation to be explained (SST) was 9118, of which the experimental manipulation accounted for 591.68 units (SSM), while 8526.32 were unexplained (SSR).

[Bar chart: mean hours spent stalking after therapy (y-axis roughly 50 to 70) for cruel to be kind therapy vs. psychodyshamic therapy, showing both unadjusted and adjusted means.]

This bar chart shows the mean number of hours spent stalking after therapy. The normal means are shown as well as the same means when the data are adjusted for the effect of the covariate. In this case the adjusted and unadjusted means are relatively similar.


Descriptive Statistics
Dependent Variable: Time Spent Stalking After Therapy (hours per week)
Group                        Mean      Std. Deviation    N
Cruel to be Kind Therapy    54.9600      16.33116       25
Psychodyshamic Therapy      61.8400       9.41046       25
Total                       58.4000      13.64117       50

This table shows the unadjusted means (i.e. the normal means if we ignore the effect of the covariate). These are the same values plotted on the left-hand side of the bar chart. These results show that the time spent stalking after therapy was less after cruel to be kind therapy. However, we know from our initial ANOVA that this difference is non-significant. So, what now happens when we consider the effect of the covariate (in this case the extent of the stalker's problem before therapy)?
Levene's Test of Equality of Error Variances(a)
Dependent Variable: Time Spent Stalking After Therapy (hours per week)
F = 7.189, df1 = 1, df2 = 48, Sig. = .010
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + STALK1 + GROUP

This table shows the results of Levene's test, which is significant because the significance value is 0.01 (less than 0.05). This finding tells us that the variances across groups are different and the assumption has been broken.
Tests of Between-Subjects Effects
Dependent Variable: Time Spent Stalking After Therapy (hours per week)
Source                                 Type III Sum of Squares   df   Mean Square      F      Sig.
Corrected Model                              5006.278(a)          2     2503.139     28.613   .000
Intercept                                    8.646E-02            1     8.646E-02      .001   .975
Hours Spent Stalking Before Therapy          4414.598             1     4414.598     50.462   .000
THERAPY                                       480.265             1      480.265      5.490   .023
Error                                        4111.722            47       87.483
Total                                      179646.000            50
Corrected Total                              9118.000            49
a. R Squared = .549 (Adjusted R Squared = .530)

This table shows the ANCOVA. Looking first at the significance values, it is clear that the covariate significantly predicts the dependent variable, so the hours spent stalking after therapy depend on the extent of the initial problem (i.e. the hours spent stalking before therapy). More interesting is that when the effect of initial stalking behaviour is removed, the effect of therapy becomes significant (p has gone down from 0.074 to 0.023, which is less than 0.05).
Group
Dependent Variable: Time Spent Stalking After Therapy (hours per week)
Group                       Mean       Std. Error   95% CI Lower   Upper
Cruel to be Kind Therapy   55.299(a)      1.871        51.534      59.063
Psychodyshamic Therapy     61.501(a)      1.871        57.737      65.266
a. Evaluated at covariate: Time Spent Stalking Before Therapy (hours per week) = 65.2200.

To interpret the results of the main effect of therapy we need to look at adjusted means. These adjusted means are shown above. There are only two groups being compared in this example so we can conclude that the therapies had a significantly different effect on stalking behaviour; specifically, stalking behaviour was lower after the therapy involving the cattle prod compared to psychodyshamic therapy.


[Scatterplot with fitted regression line: time spent stalking after therapy (hours per week) plotted against time spent stalking before therapy (hours per week).]

We need to interpret the covariate. The graph above shows the time spent stalking after therapy (dependent variable) plotted against the initial level of stalking (covariate). This graph shows that there is a positive relationship between the two variables: that is, high scores on one variable correspond to high scores on the other, whereas low scores on one variable correspond to low scores on the other.

Calculating the Effect Size

The value of ω² can be calculated for the effect of therapy using the sum of squares for the experimental effect (480.27), the mean square for the error term (87.48) and the total variability (the corrected total, 9118):


ω² = (SSM - dfM × MSR) / (SST + MSR)
   = (480.265 - (1 × 87.483)) / (9118 + 87.483)
   = 392.782 / 9205.483
   = .04

ω = .21

This represents a medium to large effect. Therefore, the effect of a cattle prod compared to psychodyshamic therapy is a substantive finding. For the effect of the covariate, the error mean square is the same, but the effect is much bigger (SSM is 4414.60, rounded to 2 decimal places). If we place this value in the equation, we get the following:

ω²_covariate = (SSM - dfM × MSR) / (SST + MSR)
             = (4414.598 - (1 × 87.483)) / (9118 + 87.483)
             = 4327.115 / 9205.483
             = .47

ω_covariate = .69
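Both ω² values can be checked with a small Python helper (illustrative only; the function name is my own, and the values are taken from the ANCOVA table above):

```python
def omega_sq(ss_m, df_m, ms_r, ss_t):
    # omega^2 = (SS_M - df_M * MS_R) / (SS_T + MS_R)
    return (ss_m - df_m * ms_r) / (ss_t + ms_r)

print(round(omega_sq(480.265, 1, 87.483, 9118), 2))   # therapy: 0.04
print(round(omega_sq(4414.598, 1, 87.483, 9118), 2))  # covariate: 0.47
```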

This represents a very large effect (it is well above the threshold of 0.5, and is close to 1). Therefore, the relationship between initial stalking behaviour and the stalking behaviour after therapy is very strong indeed.

Interpreting and Writing the Result


The correct way to report the main finding would be: Levene's test was significant (F(1, 48) = 7.19, p < .05), indicating that the assumption of homogeneity of variance had been broken. The main effect of therapy was significant (F(1, 47) = 5.49, p < .05, ω² = .04), indicating that the time spent stalking was lower after using a cattle prod (M = 55.30, SE = 1.87) compared to after psychodyshamic therapy (M = 61.50, SE = 1.87). The covariate was also significant (F(1, 47) = 50.46, p < .001, ω² = .47), indicating that the level of stalking before therapy had a significant effect on the level of stalking after therapy (there was a positive relationship between these two variables).
Task 2

A marketing manager for a certain well-known drinks manufacturer was interested in the therapeutic benefit of certain soft drinks for curing hangovers. He took 15 people out on the town one night and got them drunk. The next morning as they awoke, dehydrated and feeling as though they'd licked a camel's sandy feet clean with their tongue, he gave five of them water to drink, five of them Lucozade (in case this isn't sold outside of the UK, it's a very nice glucose-based drink) and the remaining five a leading brand of cola (this variable is called
drink). He then measured how well they felt (on a scale from 0 = I feel like death

to 10 = I feel really full of beans and healthy) two hours later (this variable is called well). He wanted to know which drink produced the greatest level of wellness. However, he realized that it was important to control for how drunk the person got the night before, and so he measured this on a scale of 0 = as sober as a nun to 10 = flapping about like a haddock out of water on the floor in a puddle of their own vomit. The data are in the file HangoverCure.sav.

SPSS Output
Tests of Between-Subjects Effects
Dependent Variable: How Well Does The Person Feel?

Source            Type III SS    df   Mean Square   F         Sig.
Corrected Model   2.133a          2   1.067         .821      .463
Intercept         459.267         1   459.267       353.282   .000
DRINK             2.133           2   1.067         .821      .463
Error             15.600         12   1.300
Total             477.000        15
Corrected Total   17.733         14

a. R Squared = .120 (Adjusted R Squared = -.026)

This table shows the ANOVA table for these data when the covariate is not included. It is clear from the significance value that there are no differences in how well people feel when they have different drinks.
Levene's Test of Equality of Error Variancesa
Dependent Variable: How Well Does The Person Feel?

F = .220, df1 = 2, df2 = 12, Sig. = .806

Tests the null hypothesis that the error variance of the dependent variable is equal across groups. a. Design: Intercept+DRUNK+DRINK


Tests of Between-Subjects Effects
Dependent Variable: How Well Does The Person Feel?

Source            Type III SS    df   Mean Square   F        Sig.
Corrected Model   13.320a         3   4.440         11.068   .001
Intercept         14.264          1   14.264        35.556   .000
DRUNK             11.187          1   11.187        27.886   .000
DRINK             3.464           2   1.732         4.318    .041
Error             4.413          11   .401
Total             477.000        15
Corrected Total   17.733         14

a. R Squared = .751 (Adjusted R Squared = .683)

These tables show the results of Levene's test and the ANOVA table when drunkenness the previous night is included in the model as a covariate. Levene's test is non-significant, indicating that the group variances are roughly equal (hence the assumption of homogeneity of variance has been met). It is clear that the covariate significantly predicts the dependent variable, so the drunkenness of the person influenced how well they felt the next day. What's more interesting is that when the effect of drunkenness is removed, the effect of drink becomes significant (p = .041, which is less than .05).
Parameter Estimates
Dependent Variable: How Well Does The Person Feel?

Parameter      B       Std. Error   t        Sig.   95% CI (Lower, Upper)
Intercept      7.116   .377         18.861   .000   (6.286, 7.947)
DRUNK          -.548   .104         -5.281   .000   (-.777, -.320)
[DRINK=1.00]   -.142   .420         -.338    .741   (-1.065, .781)
[DRINK=2.00]   .987    .442         2.233    .047   (.014, 1.960)
[DRINK=3.00]   0a      .            .        .      (., .)

a. This parameter is set to zero because it is redundant.

The next table shows the parameter estimates selected in the options dialog box. These estimates are calculated using a regression analysis with drink split into two dummy coding variables. SPSS codes the two dummy variables such that the last category (the category coded with the highest value in the data editor, in this case the cola group) is the reference category. This reference category (labelled [DRINK=3.00] in the output) is coded with 0 for both dummy variables; [DRINK=2.00], therefore, represents the difference between the


group coded as 2 (Lucozade) and the reference category (cola); and [DRINK=1.00] represents the difference between the group coded as 1 (water) and the reference category (cola). The beta values literally represent the differences between the means of these groups, and so the significances of the t-tests tell us whether the group means differ significantly. Therefore, from these estimates we could conclude that the cola and water groups have similar means whereas the cola and Lucozade groups have significantly different means.
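To see why the B values equal differences between group means, here is a minimal Python sketch. The numbers are made up for illustration (they are not the HangoverCure.sav data): with reference-category dummy coding and no other predictors, the intercept is the mean of the reference group and each dummy coefficient is that group's mean minus the reference mean.

```python
# Sketch: reference-category dummy coding (hypothetical data, not HangoverCure.sav).
# With only dummy predictors in the model, the fitted values are the group means,
# so the intercept equals the reference group's mean and each dummy's B equals
# (group mean - reference mean).

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical wellness scores for three drink groups.
water    = [5, 5, 5]
lucozade = [7, 8, 9]
cola     = [4, 5, 6]  # cola is the reference category (coded last, as in SPSS)

intercept  = mean(cola)                   # B for the intercept
b_water    = mean(water) - mean(cola)     # B for [DRINK=1.00]
b_lucozade = mean(lucozade) - mean(cola)  # B for [DRINK=2.00]

print(intercept, b_water, b_lucozade)
```

A t-test on each dummy coefficient therefore tests whether that group's mean differs from the reference group's mean, which is exactly how the parameter estimates table above is interpreted.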
Contrast Results (K Matrix)a
Dependent Variable: How Well Does The Person Feel?

Drink Simple Contrast: Level 2 vs. Level 1
  Contrast Estimate = 1.129, Hypothesized Value = 0, Difference (Estimate - Hypothesized) = 1.129
  Std. Error = .405, Sig. = .018, 95% CI for Difference = (.237, 2.021)

Drink Simple Contrast: Level 3 vs. Level 1
  Contrast Estimate = .142, Hypothesized Value = 0, Difference (Estimate - Hypothesized) = .142
  Std. Error = .420, Sig. = .741, 95% CI for Difference = (-.781, 1.065)

a. Reference category = 1

The next output shows the result of a contrast analysis that compares level 2 (Lucozade) against level 1 (water) as a first comparison, and level 3 (cola) against level 1 (water) as a second comparison. These results show that the Lucozade group felt significantly better than the water group (contrast 1), but that the cola group did not differ significantly from the water group (p = .741). These results are consistent with the regression parameter estimates (in fact, note that contrast 2 is identical to the regression parameter for [DRINK=1.00] in the previous section).


Drink
Dependent Variable: How Well Does The Person Feel?

Drink      Mean     Std. Error   95% CI (Lower, Upper)
Water      5.110a   .284         (4.485, 5.735)
Lucozade   6.239a   .295         (5.589, 6.888)
Cola       5.252a   .302         (4.588, 5.916)

a. Covariates appearing in the model are evaluated at the following values: How Drunk was the Person the Night Before = 4.6000.

This table gives the adjusted values of the group means and it is these values that should be used for interpretation. The adjusted means show that the significant ANCOVA reflects a difference between the water and the Lucozade groups. The cola and water groups appear to have fairly similar adjusted means indicating that cola is no better than water at helping your hangover. These conclusions support what we know from the contrasts and regression parameters. To look at the effect of the covariate we can examine a scatterplot:

This shows that the more drunk a person was the night before, the less well they felt the next day.

Calculating the Effect Size



We can calculate ω² for the covariate:

ω² = (SS_M − df_M × MS_R) / (SS_T + MS_R)

ω² = (11.187 − (1)(.401)) / (17.733 + .401) = 10.786 / 18.134 = .59, ω = .77

We can also do the same for the main effect of drink:

ω² = (3.464 − (1)(.401)) / (17.733 + .401) = 3.063 / 18.134 = .17, ω = .41
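These hand calculations can be checked with a short sketch. The sums of squares come from the SPSS output above, and df_M = 1 is used for both effects to mirror the working in the text:

```python
# Omega-squared from the ANCOVA summary table:
#   omega^2 = (SS_M - df_M * MS_R) / (SS_T + MS_R)
# The values below are taken from the HangoverCure.sav ANCOVA output
# (SS_T is the corrected total); df_m = 1 mirrors the text's working.

def omega_squared(ss_m, df_m, ms_r, ss_t):
    """Effect size omega^2 for an effect with model sum of squares ss_m."""
    return (ss_m - df_m * ms_r) / (ss_t + ms_r)

w2_covariate = omega_squared(ss_m=11.187, df_m=1, ms_r=0.401, ss_t=17.733)
w2_drink = omega_squared(ss_m=3.464, df_m=1, ms_r=0.401, ss_t=17.733)

print(w2_covariate, w2_drink)
```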

We've got t-statistics for the comparisons between the cola and water groups and the cola and Lucozade groups. These t-statistics have N − 2 degrees of freedom, where N is the total sample size (in this case 15). Therefore we get:

r_Cola vs. Water = √(0.338² / (0.338² + 13)) = 0.09

r_Cola vs. Lucozade = √(2.233² / (2.233² + 13)) = 0.53
Interpreting and Writing the Result


We could report the main finding as: The covariate, drunkenness, was significantly related to how ill the person felt the next day, F(1, 11) = 27.89, p < .001, ω² = .59. There was also a significant effect of the type of drink on how well the person felt after controlling for how drunk they were the night before, F(2, 11) = 4.32, p < .05, ω² = .17. We can also report some contrasts: Planned contrasts revealed that having Lucozade significantly improved how well you felt compared to having cola, t(13) = 2.23, p < .05, r = .53, but having cola was no better than having water, t(13) = 0.34, ns, r = .09. We can conclude that cola and water have the same effects on hangovers but that Lucozade seems significantly better at curing hangovers than cola.
Chapter 12 Task 1

People's musical taste tends to change as they get older (my parents, for example, after years of listening to relatively cool music when I was a kid in the 1970s, subsequently hit their mid-forties and developed a worrying obsession with country and western music... or maybe it was the stress of having me as a teenage son!). Anyway, this worries me immensely as the future seems incredibly bleak if it is spent listening to Garth Brooks and thinking 'oh boy, did I underestimate Garth's immense talent when I was in my 20s'. So, I thought I'd do some research to find out whether my fate really was sealed, or whether it's possible to be old and like good music too. First, I got myself two groups of people (45 people in

each group): one group contained young people (which I arbitrarily decided meant under 40 years of age), and the other group contained more mature individuals (over 40 years of age). This is my first independent variable, age, and it has two levels (less than or more than 40 years old). I then split each of these groups of 45 into three smaller groups of 15 and assigned them to listen to either Fugazi (who everyone knows are the coolest band on the planet), ABBA or Barf Grooks (a lesser-known country and western musician not to be confused with anyone who has a similar name and produces music that makes you want to barf). This is my second independent variable, music, and it has three levels (Fugazi, ABBA or Barf Grooks). There were different participants in all conditions, which means that of the 45 under-forties, 15 listened to Fugazi, 15 listened to ABBA and 15 listened to Barf Grooks; likewise, of the 45 over-forties, 15 listened to Fugazi, 15 listened to ABBA and 15 listened to Barf Grooks. After listening to the music I got each person to rate it on a scale ranging from −100 (I hate this foul music of Satan) through 0 (I am completely indifferent) to +100 (I love this music so much I'm going to explode). This variable is called liking. The data are in the file
Fugazi.sav. Conduct a two-way independent ANOVA on them.

SPSS Output

The error bar chart of the music data shows the mean rating of the music played to each group. It's clear from this chart that when people listened to Fugazi the two age groups were divided: the older people rated it very low, but the younger people rated it very highly. A reverse trend is found if you look at the ratings for Barf Grooks: the youngsters give it


low ratings while the wrinkly ones love it. For ABBA the groups agreed: both old and young rated them highly.

[Error bar chart: Mean Liking Rating (y-axis, −100 to 100) for each type of Music (Fugazi, ABBA, Barf Grooks), with separate bars for each Age Group (0-40 and 40+).]

The following output shows Levene's test. For these data the significance value is 0.322, which is greater than the criterion of 0.05. This means that the variances in the different experimental groups are roughly equal (i.e. not significantly different), and that the assumption has been met.
Levene's Test of Equality of Error Variancesa
Dependent Variable: Liking Rating

F = 1.189, df1 = 5, df2 = 84, Sig. = .322

Tests the null hypothesis that the error variance of the dependent variable is equal across groups. a. Design: Intercept+MUSIC+AGE+MUSIC * AGE

The next output shows the main ANOVA summary table.


Tests of Between-Subjects Effects
Dependent Variable: Liking Rating

Source            Type III SS     df   Mean Square   F         Sig.
Corrected Model   392654.933a      5   78530.987     202.639   .000
Intercept         34339.600        1   34339.600     88.609    .000
MUSIC             81864.067        2   40932.033     105.620   .000
AGE               .711             1   .711          .002      .966
MUSIC * AGE       310790.156       2   155395.078    400.977   .000
Error             32553.467       84   387.541
Total             459548.000      90
Corrected Total   425208.400      89

a. R Squared = .923 (Adjusted R Squared = .919)

The main effect of music is shown by the F-ratio in the row labelled MUSIC; in this case the significance is 0.000, which is lower than the usual cut-off point of 0.05. Hence, we can say that there was a significant effect of the type of music on the ratings. To understand what this actually means, we need to look at the mean ratings for each type of music when we ignore whether the person giving the rating was old or young:

[Bar chart with 95% CI error bars: mean Liking Rating for each type of Music (Fugazi, ABBA, Barf Grooks), ignoring age group.]


What this graph shows is that the significant main effect of music is likely to reflect the fact that ABBA were rated (overall) much more positively than the other two artists. The main effect of age is shown by the F-ratio in the row labelled AGE; the probability associated with this F-ratio is 0.966, which is so close to 1 that it means that it is a virtual certainty that this F could occur by chance alone. Again, to interpret the effect we need to look at the mean ratings for the two age groups ignoring the type of music to which they listened.

[Bar chart with 95% CI error bars: mean Liking Rating for each Age Group (0-40 and 40+), ignoring the type of music.]

This graph shows that when you ignore the type of music that was being rated, older people, on average, gave almost identical ratings to younger people (i.e. the mean ratings in the two groups are virtually the same). The interaction effect is shown by the F-ratio in the row labeled MUSIC * AGE; the associated significance value is small (0.000) and is less than the criterion of 0.05. Therefore, we can say that there is a significant interaction between age and the type of


music rated. To interpret this effect we need to look at the mean ratings in all conditions and these means were originally plotted at the beginning of this output. The fact there is a significant interaction tells us that for certain types of music the different age groups gave different ratings. In this case, although they agree on ABBA, there are large disagreements in ratings of Fugazi and Barf Grooks. Given that we found a main effect of music, and of the interaction between music and age, we can look at some of the post hoc tests to establish where the difference lies. The next output shows the result of GamesHowell post hoc tests. First, ratings of Fugazi are compared to ABBA, which reveals a significant difference (the value in the column labeled Sig. is less than 0.05), and then Barf Grooks, which reveals no difference (the significance value is greater than 0.05). In the next part of the table, ratings to ABBA are compared first to Fugazi (which just repeats the finding in the previous part of the table) and then to Barf Grooks, which reveals a significant difference (the significance value is below 0.05). The final part of the table compares Barf Grooks to Fugazi and ABBA but these results repeat findings from the previous sections of the table.
Multiple Comparisons
Dependent Variable: Liking Rating (Games-Howell)

(I) Music     (J) Music     Mean Difference (I-J)   Std. Error   Sig.   95% CI (Lower, Upper)
Fugazi        Abba          -66.8667*               5.08292      .000   (-101.1477, -32.5857)
Fugazi        Barf Grooks   -6.2333                 5.08292      .946   (-53.3343, 40.8677)
Abba          Fugazi        66.8667*                5.08292      .000   (32.5857, 101.1477)
Abba          Barf Grooks   60.6333*                5.08292      .001   (24.9547, 96.3119)
Barf Grooks   Fugazi        6.2333                  5.08292      .946   (-40.8677, 53.3343)
Barf Grooks   Abba          -60.6333*               5.08292      .001   (-96.3119, -24.9547)

Based on observed means.
*. The mean difference is significant at the .05 level.

Calculating Effect Sizes


σ̂²_music = (3 − 1)(40932.033 − 387.541) / (15 × 3 × 2) = 900.99

σ̂²_age = (2 − 1)(0.711 − 387.541) / (15 × 3 × 2) = −4.30

σ̂²_music×age = (3 − 1)(2 − 1)(155395.078 − 387.541) / (15 × 3 × 2) = 3444.61

We also need to estimate the total variability and this is just the sum of these other variables plus the residual mean squares:
σ̂²_total = σ̂²_music + σ̂²_age + σ̂²_music×age + MS_R

= 900.99 − 4.30 + 3444.61 + 387.54 = 4728.84

The effect size is then simply the variance estimate for the effect in which youre interested divided by the total variance estimate:

ω²_effect = σ̂²_effect / σ̂²_total

As such, for the main effect of music we get:

ω²_music = σ̂²_music / σ̂²_total = 900.99 / 4728.84 = 0.19

For the main effect of age we get:


ω²_age = σ̂²_age / σ̂²_total = −4.30 / 4728.84 ≈ 0.00


For the interaction of music and age we get:

ω²_music×age = σ̂²_music×age / σ̂²_total = 3444.61 / 4728.84 = 0.73
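These variance components can be scripted as a check (a sketch: the mean squares are read off the ANOVA table above, with a = 3 types of music, b = 2 age groups and n = 15 people per cell):

```python
# Omega-squared for a two-way independent ANOVA, from the mean squares.
# Variance component estimates (a = levels of factor A, b = of B, n = per cell):
#   var_A  = (a - 1)(MS_A  - MS_R) / (n * a * b)
#   var_B  = (b - 1)(MS_B  - MS_R) / (n * a * b)
#   var_AB = (a - 1)(b - 1)(MS_AB - MS_R) / (n * a * b)
#   var_total = var_A + var_B + var_AB + MS_R

def omega_sq_two_way(ms_a, ms_b, ms_ab, ms_r, a, b, n):
    nab = n * a * b
    var_a = (a - 1) * (ms_a - ms_r) / nab
    var_b = (b - 1) * (ms_b - ms_r) / nab
    var_ab = (a - 1) * (b - 1) * (ms_ab - ms_r) / nab
    var_total = var_a + var_b + var_ab + ms_r
    return var_a / var_total, var_b / var_total, var_ab / var_total

# Mean squares from the Fugazi.sav ANOVA table.
w2_music, w2_age, w2_interaction = omega_sq_two_way(
    ms_a=40932.033, ms_b=0.711, ms_ab=155395.078, ms_r=387.541,
    a=3, b=2, n=15)

print(w2_music, w2_age, w2_interaction)
```

Note that the age component comes out slightly negative (its mean square is smaller than the residual mean square), which is why it is reported as effectively zero.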

Interpreting and Writing the Result

As with the other ANOVAs we've encountered we have to report the details of the F-ratio and the degrees of freedom from which it was calculated. For the various effects in these data the F-ratios will be based on different degrees of freedom: each F was derived from dividing the mean squares for the effect by the mean squares for the residual. For the effects of music and the music × age interaction, the model degrees of freedom were 2 (dfM = 2), but for the effect of age the degrees of freedom were only 1 (dfM = 1). For all effects, the degrees of freedom for the residuals were 84 (dfR = 84). We can, therefore, report the three effects from this analysis as follows: The results show that the main effect of the type of music listened to significantly affected the ratings of that music (F(2, 84) = 105.62, p < .001, r = .94). The Games-Howell post hoc test revealed that ABBA were rated significantly higher than both Fugazi and Barf Grooks (both ps < .01). The main effect of age on the ratings of the music was non-significant (F(1, 84) < 1, r = .00). The music × age interaction was significant (F(2, 84) = 400.98, p < .001, r = .98), indicating that different types of music were rated differently by the two age groups. Specifically, Fugazi were rated more positively by the young group (M =


66.20, SD = 19.90) than the old (M = −75.87, SD = 14.37); ABBA were rated fairly equally in the young (M = 64.13, SD = 16.99) and old groups (M = 59.93, SD = 19.98); Barf Grooks was rated less positively by the young group (M = −71.47, SD = 23.17) compared to the old (M = 74.27, SD = 22.29). These findings indicate that there is no hope for me: the minute I hit 40 I will suddenly start to love country and western music and will burn all of my Fugazi CDs (it will never happen... arghhhh!!!).
Task 2

In Chapter 3 we used some data that related to men's and women's arousal levels when watching either Bridget Jones Diary or Memento (ChickFlick.sav). Analyse these data to see whether men and women differ in their reactions to different types of films.

The following output shows Levene's test. For these data the significance value is 0.456, which is greater than the criterion of 0.05. This means that the variances in the different experimental groups are roughly equal (i.e. not significantly different), and that the assumption has been met.

The next output shows the main ANOVA summary table.


The main effect of gender is shown by the F-ratio in the row labelled gender; in this case the significance is 0.153, which is greater than the usual cut-off point of 0.05. Hence, we can say that there was not a significant effect of gender on arousal during the films. To understand what this actually means, we need to look at the mean arousal levels for men and women (when we ignore which film they watched):


What this graph shows is that arousal levels were quite similar for men and women in general; this is why the main effect of gender was non-significant. The main effect of film is shown by the F-ratio in the row labelled film; the probability associated with this F-ratio is 0.000, which is less than the critical value of 0.05, hence we can say that arousal levels were significantly different in the two films. Again, to interpret the effect we need to look at the mean arousal levels but this time comparing the two films (and ignoring whether the person was male or female). This graph shows that when you ignore the gender of the person, arousal levels were significantly higher for Memento than Bridget Jones Diary.


The interaction effect is shown by the F-ratio in the row labelled gender * film; the associated significance value is 0.366, which is greater than the criterion of 0.05. Therefore, we can say that there is not a significant interaction between gender and the type of film watched. To interpret this effect we need to look at the mean arousal in all conditions.


This graph shows the non-significant interaction: arousal levels are higher for Memento compared to Bridget Jones Diary in both men and women (i.e. the difference between the green and blue bars is more or less the same for men and women).

Calculating Effect Sizes


σ̂²_film = (2 − 1)(1092.03 − 40.77) / (10 × 2 × 2) = 1091.01

σ̂²_gender×film = (2 − 1)(2 − 1)(34.23 − 40.77) / (10 × 2 × 2) = −0.16

σ̂²_gender = (2 − 1)(87.03 − 40.77) / (10 × 2 × 2) = 1.16

We also need to estimate the total variability and this is just the sum of these other variables plus the residual mean squares:
σ̂²_total = σ̂²_film + σ̂²_gender + σ̂²_gender×film + MS_R

= 1091.01 + 1.16 − 0.16 + 40.77 = 1132.78


The effect size is then simply the variance estimate for the effect in which youre interested divided by the total variance estimate:
ω²_effect = σ̂²_effect / σ̂²_total

As such, for the main effect of gender we get:


ω²_gender = σ̂²_gender / σ̂²_total = 1.16 / 1132.78 = 0.01

For the main effect of film we get:


ω²_film = σ̂²_film / σ̂²_total = 1091.01 / 1132.78 = 0.96

For the interaction we get:



ω²_gender×film = σ̂²_gender×film / σ̂²_total = −0.16 / 1132.78 ≈ 0.00

Interpreting and Writing the Result

We can report the three effects from this analysis as follows: The results show that the main effect of the type of film significantly affected arousal during that film, F(1, 36) = 26.79, p < .001, ω² = .96. Arousal levels were significantly higher during Memento compared to Bridget Jones Diary. The main effect of gender on arousal levels during the films was non-significant, F(1, 36) = 2.14, ω² = .01. The gender × film interaction was not significant, F(1, 36) < 1, ω² = .00. This showed that arousal levels were higher for Memento compared to Bridget Jones Diary in both men and women.

Task 3

At the start of this chapter I described a way of empirically researching whether I wrote better songs than my old band mate Malcolm, and whether this depended on the type of song (a symphony or song about flies). The outcome variable would be the number of screams elicited by audience members during the songs. These data are in the file Escape From Inside.sav. Draw an error bar graph (lines), analyse and interpret these data.


To do a multiple line chart for means that are independent (i.e. have come from different groups) we need to double-click on the multiple line chart icon in the Chart Builder (see the book chapter). All we need to do is to drag our variables into the appropriate drop zones. Select Screams from the variable list and drag it into the drop zone for the y-axis; select Song_Type from the variable list and drag it into the drop zone for the x-axis; finally, select the Songwriter variable and drag it into the drop zone that defines different-coloured lines. This will mean that lines representing Andy's and Malcolm's songs will be displayed in different colours. Select error bars in the Properties dialog box and apply them to the Chart Builder, then click OK to produce the graph.

The resulting graph looks like this:


The following output shows Levene's test. For these data the significance value is 0.817, which is greater than the criterion of 0.05. This means that the variances in the different experimental groups are roughly equal (i.e. not significantly different), and that the assumption has been met.


The next output shows the main ANOVA summary table. The main effect of the type of song is shown by the F-ratio in the row labelled Song_Type; in this case the significance is 0.000, which is smaller than the usual cut-off point of 0.05. Hence, we can say that there was a significant effect of the type of song on the number of screams elicited while it was played. The graph shows that the two symphonies elicited significantly more screams of agony than the two songs about flies.


The main effect of the songwriter was significant because the significance of the F-ratio for this effect is 0.002, which is less than the critical value of 0.05, hence we can say that Andy and Malcolm differed in the reactions to their songs. The graph tells us that Andys songs elicited significantly more screams of torment from the audience than Malcolms songs.

The interaction effect was significant too because the associated significance value is 0.028, which is less than the criterion of 0.05. Therefore, we can say that there is a significant interaction between the type of song and who wrote it on people's appreciation of the song. The line graph that you drew earlier tells us that although reactions to Malcolm's and Andy's songs were fairly similar for the Flies song, they differed quite a bit for the symphony: Andy's symphony elicited more screams of torment than


Malcolm's. We can conclude that in general Malcolm was a better songwriter than Andy, but the interaction tells us that this effect is true mainly for symphonies.

Calculating Effect Sizes

σ̂²_songwriter = (2 − 1)(35.31 − 3.55) / (17 × 2 × 2) = 0.47

σ̂²_songtype×songwriter = (2 − 1)(2 − 1)(18.02 − 3.77) / (17 × 2 × 2) = 0.21

σ̂²_songtype = (2 − 1)(74.13 − 3.55) / (17 × 2 × 2) = 1.04

We also need to estimate the total variability and this is just the sum of these other variables plus the residual mean squares:

σ̂²_total = σ̂²_songtype + σ̂²_songwriter + σ̂²_songtype×songwriter + MS_R

= 1.04 + 0.47 + 0.21 + 3.77 = 5.49


The effect size is then simply the variance estimate for the effect in which youre interested divided by the total variance estimate:
ω²_effect = σ̂²_effect / σ̂²_total

As such, for the main effect of song type we get:


ω²_songtype = σ̂²_songtype / σ̂²_total = 1.04 / 5.49 = 0.19

For the main effect of songwriter we get:


ω²_songwriter = σ̂²_songwriter / σ̂²_total = 0.47 / 5.49 = 0.09

For the interaction we get:


ω²_songtype×songwriter = σ̂²_songtype×songwriter / σ̂²_total = 0.21 / 5.49 = 0.04

Interpreting and Writing the Result

We can report the three effects from this analysis as follows: The results show that the main effect of the type of song significantly affected the screams elicited during that song, F(1, 64) = 20.87, p < .001, ω² = .19; the two symphonies elicited significantly more screams of agony than the two songs about flies. The main effect of the songwriter also significantly affected the screams elicited, F(1, 64) = 9.94, p < .01, ω² = .09; Andy's songs elicited significantly more screams of torment from the audience than Malcolm's songs. The song type × songwriter interaction was significant, F(1, 64) = 5.07, p < .05, ω² = .04. Although reactions to Malcolm's and Andy's songs were fairly similar for the Flies song, they differed quite a bit for the symphony: Andy's symphony elicited more screams of torment than Malcolm's.

Task 4

Change the syntax in GogglesSimpleEffects.sps to look at the effect of alcohol at different levels of gender.

The correct syntax to use is:

MANOVA Attractiveness BY gender(0 1) alcohol(1 3)
 /DESIGN = alcohol WITHIN gender(1) alcohol WITHIN gender(2)
 /PRINT CELLINFO SIGNIF( UNIV MULT AVERF HF GG ).

The main part of the analysis is:

* * * * * * Analysis of Variance -- design 1 * * * * * *

Tests of Significance for ATTRACT using UNIQUE sums of squares

Source of Variation            SS   DF        MS       F   Sig of F
WITHIN+RESIDUAL           3656.25   43     85.03
ALCOHOL WITHIN GENDER(1)  5208.33    2   2604.17   30.63       .000
ALCOHOL WITHIN GENDER(2)   102.08    2     51.04     .60       .553
(Model)                   5310.42    4   1327.60   15.61       .000
(Total)                   8966.67   47    190.78

R-Squared = .592
Adjusted R-Squared = .554

What this shows is a significant effect of alcohol at level 1 of gender. Because we coded gender as 0 = male, 1 = female, this means there's a significant effect of alcohol for men. Think back to the chapter: this reflects the fact that men choose very unattractive


dates after 4 pints. However, there is no significant effect of alcohol at level 2 of gender. This tells us that women are not affected by the beer-goggles effect: the attractiveness of their dates does not change as they drink more.

Calculating the Effect Size

These effects have df = 2 in the model so we can't calculate an effect size (well, technically we can calculate ω² but I'm not entirely sure how useful that is).
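The logic of the simple effects analysis above (a one-way ANOVA on alcohol within each gender separately) can be sketched in Python. The data below are made up for illustration, because the Goggles data file is not reproduced here:

```python
# Sketch of a simple effects test: a one-way ANOVA F-ratio computed on the
# alcohol groups within a single level of gender. The scores below are
# hypothetical, not the Goggles data.

def one_way_F(groups):
    """F-ratio for a one-way independent ANOVA on a list of score groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Model SS: group sizes times squared deviations of group means from grand mean.
    ss_m = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Residual SS: squared deviations of scores from their own group mean.
    ss_r = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_m = len(groups) - 1
    df_r = len(all_scores) - len(groups)
    return (ss_m / df_m) / (ss_r / df_r)

# Effect of alcohol (none, 2 pints, 4 pints) within one gender (made-up data).
f = one_way_F([[60, 65, 70], [55, 60, 65], [30, 35, 40]])
print(f)
```

Running this once per gender subset reproduces the two "ALCOHOL WITHIN GENDER" F-ratios that the MANOVA syntax produces from the real data.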
Chapter 13 Task 1

There is often concern among students as to the consistency of marking between lecturers. It is common for lecturers to acquire reputations for being hard or light markers (or, to use the students' terminology, 'evil manifestations from Beelzebub's bowels' and 'nice people'), but there is often little to substantiate these reputations. A group of students investigated the consistency of marking by submitting the same essays to four different lecturers. The mark given by each lecturer was recorded for each of the eight essays. It was important that the same essays were used for all lecturers because this eliminated any individual differences in the standard of work that each lecturer marked. This design is repeated measures because every lecturer marked every essay. The independent variable was the lecturer who marked the report and the dependent variable was the percentage mark given. The data are in the file Tutor.sav. Conduct a one-way ANOVA on these data by hand.

Data for essay marks example:



Essay   Tutor 1      Tutor 2      Tutor 3       Tutor 4      Mean    S²
        (Dr Field)   (Dr Smith)   (Dr Scrote)   (Dr Death)
1       62           58           63            64           61.75   6.92
2       63           60           68            65           64.00   11.33
3       65           61           72            65           65.75   20.92
4       68           64           58            61           62.75   18.25
5       69           65           54            59           61.75   43.58
6       71           67           65            50           63.25   84.25
7       78           66           67            50           65.25   132.92
8       75           73           75            45           67.00   216.00
Mean    68.875       64.25        65.25         57.375

There were 8 essays, each marked by four different lecturers. Their marks are shown in the table, along with the mean mark given by each lecturer, the mean mark that each essay received, and the variance of marks for each essay. Now, the total variance within essays will in part be caused by the fact that different lecturers are harder or softer markers (the manipulation), and will in part be caused by the fact that the essays themselves differ in quality (individual differences).

The Total Sum of Squares (SST)

Remember from one-way independent ANOVA that SST is calculated using the following equation:


SS_T = s²_grand(N − 1)

Well, in repeated-measures designs the total sum of squares is calculated in exactly the same way. The grand variance in the equation is simply the variance of all scores when we ignore the group to which they belong. So if we treated the data as one big group it would look as follows:

62 63 65 68 69 71 78 75
58 60 61 64 65 67 66 73
63 68 72 58 54 65 67 75
64 65 65 61 59 50 50 45

Grand Mean = 63.9375

The variance of these scores is 55.028 (try this on your calculators). We used 32 scores to generate this value, and so N is 32. As such the equation becomes:


SS_T = s²_grand(N − 1) = 55.028 × (32 − 1) = 1705.868
The degrees of freedom for this sum of squares, as with the independent ANOVA, will be N − 1, or 31.

The Within-Participant Sum of Squares (SSW)

The crucial variation in this design is that there is a variance component called the within-participant variance (this arises because we've manipulated our independent variable within each participant). This is calculated using a sum of squares. Generally speaking, when we calculate any sum of squares we look at the squared difference between the mean and individual scores. This can be expressed in terms of the variance across a number of scores and the number of scores on which the variance is based. For example, when we calculated the residual sum of squares in independent ANOVA (SSR) we used the following equation:

SS_R = Σ(x_i − x̄_i)²
SS_R = s²(n − 1)

This equation gave us the variance between individuals within a particular group, and so is an estimate of individual differences within a particular group. Therefore, to get the total value of individual differences we have to calculate the sum of squares within each group and then add them up:

SS_R = s²_group1(n₁ − 1) + s²_group2(n₂ − 1) + s²_group3(n₃ − 1)



This is all well and good when we have different people in each group, but in repeated-measures designs we've subjected people to more than one experimental condition, and therefore we're interested in the variation not within a group of people (as in independent ANOVA) but within an actual person. That is, how much variability is there within an individual? To find this out we actually use the same equation, but we adapt it to look at people rather than groups. So, if we call this sum of squares SSW (for within-participant SS) we could write it as:
SSW = s²person1(n1 - 1) + s²person2(n2 - 1) + s²person3(n3 - 1) + ... + s²person n(nn - 1)

This equation simply means that we're looking at the variation in an individual's scores and then adding these variances for all the people in the study. Some of you may have noticed that, in our example, we're using essays rather than people, and so to be pedantic we'd write this as:
SSW = s²essay1(n1 - 1) + s²essay2(n2 - 1) + s²essay3(n3 - 1) + ... + s²essay n(nn - 1)

The ns simply represent the number of scores on which the variances are based (i.e. the number of experimental conditions, or in this case the number of lecturers). All of the variances we need are in the table, so we can calculate SSW as:
SSW = (6.92)(4 - 1) + (11.33)(4 - 1) + (20.92)(4 - 1) + (18.25)(4 - 1) + (43.58)(4 - 1) + (84.25)(4 - 1) + (132.92)(4 - 1) + (216)(4 - 1)
    = 20.76 + 34 + 62.75 + 54.75 + 130.75 + 252.75 + 398.75 + 648
    = 1602.5
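This "variance per essay, weighted by n - 1, then summed" recipe is easy to script. Here is an illustrative Python sketch (the function name is mine, not SPSS's):

```python
from statistics import variance

def ss_within(cases):
    """Within-participant SS: for each case (here, an essay) take
    the variance of its scores across conditions, multiply by
    n - 1, and sum across cases."""
    return sum(variance(scores) * (len(scores) - 1) for scores in cases)
```

Feeding it the eight essays' marks (each essay being a list of the four lecturers' marks) reproduces the 1602.5 above.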


The degrees of freedom for each person are n - 1 (i.e. the number of conditions minus 1). To get the total degrees of freedom we add the df for all participants. So, with eight participants (essays) and four conditions (i.e. n = 4) we get 8 × 3 = 24 degrees of freedom.

The Model Sum of Squares (SSM)

So far, we know that the total amount of variation within the data is 1705.868 units. We also know that 1602.5 of those units are explained by the variance created by individuals' (essays') performances under different conditions. Some of this variation is the result of our experimental manipulation and some of it is simply random fluctuation. The next step is to work out how much variance is explained by our manipulation and how much is not. In independent ANOVA, we worked out how much variation could be explained by our experiment (the model SS) by looking at the means for each group and comparing these to the overall mean. So, we measured the variance resulting from the differences between group means and the overall mean. We do exactly the same thing with a repeated-measures design. First we calculate the mean for each level of the independent variable (in this case the mean mark given by each lecturer) and compare these values to the overall mean of all marks. So, we calculate this SS in the same way as for independent ANOVA: calculate the difference between the mean of each group and the grand mean; square each of these differences; multiply each result by the number of subjects within that group (ni).

Add the values for each group together:

SSM = Σ ni(x̄i - x̄grand)²

Using the means from the essay data, we can calculate SSM as follows:

SSM = 8(68.875 - 63.9375)² + 8(64.25 - 63.9375)² + 8(65.25 - 63.9375)² + 8(57.375 - 63.9375)²
    = 8(4.9375)² + 8(0.3125)² + 8(1.3125)² + 8(-6.5625)²
    = 554.125

For SSM, the degrees of freedom (dfM) are again one less than the number of things used to calculate the sum of squares. For the model sum of squares we calculated the sum of squared errors between the four means and the grand mean. Hence, we used four things to calculate this sum of squares, so the degrees of freedom will be 3. As with independent ANOVA, the model degrees of freedom are always the number of groups (k) minus 1:

dfM = k - 1 = 3
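The three steps just described reduce to a one-liner in code; plugging in the four lecturer means from above reproduces 554.125 (this helper is my own sketch, not SPSS output):

```python
def ss_model(group_means, grand_mean, n_per_group):
    """Model SS: squared distance of each condition mean from the
    grand mean, weighted by the number of scores per condition."""
    return sum(n_per_group * (m - grand_mean) ** 2 for m in group_means)

# The four lecturers' mean marks, the grand mean, and n = 8 essays:
ssm = ss_model([68.875, 64.25, 65.25, 57.375], 63.9375, 8)
# ssm is 554.125, matching the hand calculation
```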
The Residual Sum of Squares (SSR)

We now know that there are 1706 units of variation to be explained in our data, and that the variation across our conditions accounts for 1602 units. Of these 1602 units, our experimental manipulation can explain 554 units. The final sum of squares is the residual sum of squares (SSR), which tells us how much of the variation cannot be explained by the model. This value is the amount of variation caused by extraneous factors outside of experimental control (such as natural variation in the quality of the essays). Knowing


SSW and SSM already, the simplest way to calculate SSR is to subtract SSM from SSW:

SSR = SSW - SSM = 1602.5 - 554.125 = 1048.375

The degrees of freedom are calculated in a similar way:

dfR = dfW - dfM = 24 - 3 = 21
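The subtraction route is trivially checked in code (values taken from the calculations above; variable names are mine):

```python
# Within-participant and model sums of squares and df, from above
ssw, df_w = 1602.5, 24
ssm, df_m = 554.125, 3

ss_resid = ssw - ssm    # residual SS: 1048.375
df_resid = df_w - df_m  # residual df: 21
```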


The Mean Squares

SSM tells us how much variation the model (e.g. the experimental manipulation) explains and SSR tells us how much variation is due to extraneous factors. However, because both of these values are summed values the number of scores that were summed influences them. As with independent ANOVA, we eliminate this bias by calculating the average sum of squares (known as the mean squares, MS), which is simply the sum of squares divided by the degrees of freedom:
MSM = SSM/dfM = 554.125/3 = 184.708
MSR = SSR/dfR = 1048.375/21 = 49.923

MSM represents the average amount of variation explained by the model (e.g. the systematic variation), whereas MSR is a gauge of the average amount of variation explained by extraneous variables (the unsystematic variation).

The F-Ratio

The F-ratio is a measure of the ratio of the variation explained by the model and the variation explained by unsystematic factors. It can be calculated by dividing the model mean squares by the residual mean squares. You should recall that this is exactly the same as for independent ANOVA:

F = MSM/MSR

So, as with the independent ANOVA, the F-ratio is still the ratio of systematic variation to unsystematic variation. As such, it is the ratio of the experimental effect to the effect on performance of unexplained factors. For the marking data, the F-ratio is:

F = MSM/MSR = 184.708/49.923 = 3.70

This value is greater than 1, which indicates that the experimental manipulation had some effect above and beyond the effect of extraneous factors. As with independent ANOVA this value can be compared against a critical value based on its degrees of freedom (which are dfM and dfR, which are 3 and 21 in this case).
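Putting the last few steps together, the mean squares and F-ratio boil down to three divisions. This short sketch (mine, not the book's) reproduces the values in the text:

```python
ssm, df_m = 554.125, 3       # model SS and df from above
ssr, df_r = 1048.375, 21     # residual SS and df from above

ms_m = ssm / df_m            # model mean square, ~184.708
ms_r = ssr / df_r            # residual mean square, ~49.923
f_ratio = ms_m / ms_r        # ~3.70
```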
Task 2

Repeat the analysis above on SPSS and interpret the results.

Initial Output for One-Way Repeated-Measures ANOVA

The first table lists the variables that represent each level of the independent variable. This box is useful to check that the variables were entered in the correct order. The next table provides basic descriptive statistics for the four levels of the independent variable. From this table we can see that, on average, Dr Field gave the highest marks to the essays (that's because I'm so nice, you


see, or it could be because I'm stupid and so have low academic standards?). Dr Death, on the other hand, gave very low grades. These mean values are useful for interpreting any effects that may emerge from the main analysis.

Within-Subjects Factors (Measure: MEASURE_1)

TUTOR   Dependent Variable
1       TUTOR1
2       TUTOR2
3       TUTOR3
4       TUTOR4

Descriptive Statistics

             Mean      Std. Deviation   N
Dr. Field    68.8750   5.6426           8
Dr. Smith    64.2500   4.7132           8
Dr. Scrote   65.2500   6.9230           8
Dr. Death    57.3750   7.9091           8

The next part of the output contains information about Mauchly's test. This test should be non-significant if we are to assume that the condition of sphericity has been met. The output shows Mauchly's test for the tutor data, and the important column is the one containing the significance value. The significance value (.043) is less than the critical value of .05, so we conclude that the variances of the differences between levels are significantly different. In other words, the assumption of sphericity has been violated. Knowing that we have violated this assumption, a pertinent question is: how should we proceed?
Mauchly's Test of Sphericity (Measure: MEASURE_1)

Within Subjects Effect: TUTOR
Mauchly's W = .131, Approx. Chi-Square = 11.628, df = 5, Sig. = .043
Epsilon: Greenhouse-Geisser = .558, Huynh-Feldt = .712, Lower-bound = .333

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. Design: Intercept. Within Subjects Design: TUTOR.
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.


SPSS produces three corrections based upon the estimates of sphericity advocated by Greenhouse and Geisser (1959) and Huynh and Feldt (1976). Both of these estimates give rise to a correction factor that is applied to the degrees of freedom used to assess the observed F-ratio. The Greenhouse-Geisser correction, ε, varies between 1/(k - 1) (where k is the number of repeated-measures conditions) and 1. The closer ε is to 1.00, the more homogeneous the variances of differences, and hence the closer the data are to being spherical. In a situation in which there are four conditions (as with our data) the lower limit of ε will be 1/(4 - 1), or 0.33 (known as the lower-bound estimate of sphericity). The calculated value of ε in the output is 0.558. This is closer to the lower limit of 0.33 than it is to the upper limit of 1 and it therefore represents a substantial deviation from sphericity. We will see how these values are used in the next section.

The Main ANOVA

The next table in the output shows the results of the ANOVA for the within-subjects variable. This table can be read much the same as for one-way between-group ANOVA. There is a sum of squares for the repeated-measures effect of tutor, which tells us how much of the total variability is explained by the experimental effect. Note the value is 554.125, which is the model sum of squares (SSM) that we calculated in the previous task. There is also an error term, which is the amount of unexplained variation across the conditions of the repeated-measures variable. This is the residual sum of squares (SSR) that we calculated earlier, and note the value is 1048.375 (the same value as calculated by hand). As I explained earlier, these sums of squares are converted into mean squares by dividing by the degrees of freedom. As we saw before, the df for the effect of
tutor are simply k - 1, where k is the number of levels of the independent variable. The


error df are (n - 1)(k - 1), where n is the number of participants (or in this case, the number of essays) and k is as before. The F-ratio is obtained by dividing the mean squares for the experimental effect (184.708) by the error mean squares (49.923). As with between-group ANOVA, this test statistic represents the ratio of systematic variance to unsystematic variance. The value of F (184.71/49.92 = 3.70) is then compared against a critical value for 3 and 21 degrees of freedom. SPSS displays the exact significance level for the F-ratio. The significance of F is .028, which is significant because it is less than the criterion value of .05. We can, therefore, conclude that there was a significant difference between the marks awarded by the four lecturers. However, this main test does not tell us which lecturers differed from each other in their marking.
Tests of Within-Subjects Effects (Measure: MEASURE_1)

Source         Correction           Type III SS   df       Mean Square   F       Sig.
TUTOR          Sphericity Assumed   554.125       3        184.708       3.700   .028
               Greenhouse-Geisser   554.125       1.673    331.245       3.700   .063
               Huynh-Feldt          554.125       2.137    259.329       3.700   .047
               Lower-bound          554.125       1.000    554.125       3.700   .096
Error(TUTOR)   Sphericity Assumed   1048.375      21       49.923
               Greenhouse-Geisser   1048.375      11.710   89.528
               Huynh-Feldt          1048.375      14.957   70.091
               Lower-bound          1048.375      7.000    149.768

a. Computed using alpha = .05

Although this result seems very plausible, we have learnt that the violation of the sphericity assumption makes the F-test inaccurate. We know from Mauchly's test that these data were non-spherical and so we need to make allowances for this violation. The SPSS output shows the F-ratio and associated degrees of freedom when sphericity is assumed, and the significant F-statistic indicated some difference(s) between the mean marks given by the four lecturers. In versions of SPSS after version 8, this table also

contains several additional rows giving the corrected values of F for the three different types of adjustment (Greenhouse-Geisser, Huynh-Feldt and lower-bound). Notice that in all cases the F-ratios remain the same; it is the degrees of freedom that change (and hence the critical value against which the obtained F-statistic is compared). The degrees of freedom have been adjusted using the estimates of sphericity calculated by SPSS. The adjustment is made by multiplying the degrees of freedom by the estimate of sphericity.1 The new degrees of freedom are then used to ascertain the significance of F.

For these data the corrections result in the observed F being non-significant when using the Greenhouse-Geisser correction (because p > .05). However, it was noted earlier that this correction is quite conservative, and so can miss effects that genuinely exist. It is, therefore, useful to consult the Huynh-Feldt-corrected F-statistic. Using this correction, the F-value is significant because the probability value of .047 is just below the criterion value of .05. So, by this correction we would accept the hypothesis that the lecturers differed in their marking. However, it was also noted earlier that this correction is quite liberal and so tends to accept values as significant when, in reality, they are not. This leaves us with the puzzling dilemma of whether or not to accept this F-statistic as significant. I mentioned earlier that Stevens (2002) recommends taking an average of the two estimates, and certainly when the two corrections give different results (as is the case here) this is wise advice. If the two corrections give rise to the same conclusion it makes little difference which you choose to report (although if you accept the F-statistic as significant it is best to report the conservative Greenhouse-Geisser estimate to avoid criticism!). Although it is easy to calculate the average of the two correction factors and to correct the degrees of freedom accordingly, it is not so easy to then calculate an exact probability for those degrees of freedom. Therefore, should you ever be faced with this perplexing situation (and to be honest that's fairly unlikely) I recommend taking an average of the two significance values to give you a rough idea of which correction is giving the most accurate answer. In this case, the average of the two p-values is (.063 + .047)/2 = .055, so we should probably go with the Greenhouse-Geisser correction and conclude that the F-ratio is non-significant.

These data illustrate how important it is to use a valid critical value of F: it can mean the difference between a statistically significant result and a non-significant one. More importantly, it can mean the difference between making a Type I error and not. Had we not used the corrections for sphericity we would have concluded erroneously that the markers gave significantly different marks. However, I should qualify this statement by saying that this example also highlights how arbitrary it is that we use a .05 level of significance: the two corrections produce significance values only marginally less than or greater than .05, and yet they lead to completely opposite conclusions. So, we might be well advised to look at an effect size to see whether the effect is substantive regardless of its significance.

We also saw earlier that a final option, when you have data that violate sphericity, is to use multivariate test statistics (MANOVA), because they do not make this assumption (see O'Brien & Kaiser, 1985). The repeated-measures procedure in SPSS automatically produces multivariate test statistics. The next output shows the multivariate test statistics for this example. The column displaying the significance values clearly shows that the multivariate tests are non-significant (because p is .063, which is greater than the criterion value of .05). Bearing in mind the loss of power in these tests, this result supports the decision to accept the null hypothesis and conclude that there are no significant differences between the marks given by different lecturers. The interpretation of these results should stop now because the main effect is non-significant. However, we will look at the output for contrasts to illustrate how these tests are displayed in the SPSS Viewer.

1 For example, the Greenhouse-Geisser estimate of sphericity was 0.558. The original degrees of freedom for the model were 3; this value is corrected by multiplying by the estimate of sphericity (3 × 0.558 = 1.674). Likewise the error df were 21, corrected in the same way (21 × 0.558 = 11.718). The F-ratio is then tested against a critical value with these new degrees of freedom (1.674, 11.718). The other corrections are applied in the same way.
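As an aside, the sphericity adjustment described in the footnote above is easy to reproduce yourself: just multiply both degrees of freedom by ε. A two-line sketch of mine (not SPSS output):

```python
def corrected_df(df_model, df_resid, epsilon):
    """Sphericity correction: scale both df by the estimate epsilon.
    The F-ratio itself is unchanged; only the df (and hence the
    critical value / p-value) move."""
    return df_model * epsilon, df_resid * epsilon

# Greenhouse-Geisser estimate for the tutor data:
gg_df = corrected_df(3, 21, 0.558)   # ~(1.674, 11.718)
```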
Multivariate Tests (Measure: MEASURE_1)

Effect: TUTOR        Value   F        Hypothesis df   Error df   Sig.
Pillai's Trace       .741    4.760c   3.000           5.000      .063
Wilks' Lambda        .259    4.760c   3.000           5.000      .063
Hotelling's Trace    2.856   4.760c   3.000           5.000      .063
Roy's Largest Root   2.856   4.760c   3.000           5.000      .063

a. Design: Intercept. Within Subjects Design: TUTOR.
b. Computed using alpha = .05
c. Exact statistic.

Contrasts

The transformation matrix requested in the options is shown in the next SPSS output and we have to draw on our knowledge of contrast coding to interpret this table. The first thing to remember is that a code of 0 means that the group is not included in a contrast. Therefore, contrast 1 (labelled Level 1 vs. Level 2 in the table) ignores Dr Scrote and Dr Death. The next thing to remember is that groups with a negative weight are compared to groups with a positive weight. In this case this means that the first contrast compares Dr Field against Dr Smith. Using the same logic, contrast 2 (labelled Level 2 vs. Level 3) ignores Dr Field and Dr Death and compares Dr Smith and Dr Scrote. Finally, contrast 3 (Level 3 vs. Level 4) compares Dr Death with Dr Scrote. This pattern of contrasts is consistent with what we expect to get from a repeated contrast (i.e. all groups except the

first are compared to the preceding category). The transformation matrix, which appears at the bottom of the output, is used primarily to confirm what each contrast represents.

TUTOR Transformation Matrix (Measure: MEASURE_1)

Dependent Variable   Level 1 vs. Level 2   Level 2 vs. Level 3   Level 3 vs. Level 4
Dr. Field             1                     0                     0
Dr. Smith            -1                     1                     0
Dr. Scrote            0                    -1                     1
Dr. Death             0                     0                    -1

a. The contrasts for the within-subjects factors are: TUTOR: Repeated contrast.

Above the transformation matrix, we should find a summary table of the contrasts. Each contrast is listed in turn, and as with between-group contrasts, an F-test is performed that compares the two chunks of variation. So, looking at the significance values from the table, we could say that Dr Field marked significantly more highly than Dr Smith (Level 1 vs. Level 2), but that Dr Smith's marks were roughly equal to Dr Scrote's (Level 2 vs. Level 3) and Dr Scrote's marks were roughly equal to Dr Death's (Level 3 vs. Level 4). However, the significant contrast should be ignored because of the non-significant main effect (remember that the data did not obey sphericity). The important point to note is that the lack of sphericity in our data has led to some important issues being raised about correction factors, and about applying discretion to your data (it's comforting to know that the computer does not have all of the answers, but it's slightly alarming to realize that this means we have to actually know some of the answers ourselves). In this example we would have to conclude that no significant differences existed between the marks given by different lecturers. However, the ambiguity of our data might make us consider running a similar study with a greater number of essays being marked.


Tests of Within-Subjects Contrasts (Measure: MEASURE_1)

Source         Contrast              Type III SS   df   Mean Square   F        Sig.
TUTOR          Level 1 vs. Level 2   171.125       1    171.125       18.184   .004
               Level 2 vs. Level 3   8.000         1    8.000         .152     .708
               Level 3 vs. Level 4   496.125       1    496.125       3.436    .106
Error(TUTOR)   Level 1 vs. Level 2   65.875        7    9.411
               Level 2 vs. Level 3   368.000       7    52.571
               Level 3 vs. Level 4   1010.875      7    144.411

Post Hoc Tests

If you selected post hoc tests for the repeated-measures variable in the options dialog box, then the table below will be produced in the SPSS Viewer.
Pairwise Comparisons (Measure: MEASURE_1)

(I) TUTOR   (J) TUTOR   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower   95% CI Upper
1           2            4.625*                 1.085        .022     .682           8.568
1           3            3.625                  2.841        1.000   -6.703          13.953
1           4            11.500                 4.675        .261    -5.498          28.498
2           1           -4.625*                 1.085        .022    -8.568          -.682
2           3           -1.000                  2.563        1.000   -10.320         8.320
2           4            6.875                  4.377        .961    -9.039          22.789
3           1           -3.625                  2.841        1.000   -13.953         6.703
3           2            1.000                  2.563        1.000   -8.320          10.320
3           4            7.875                  4.249        .637    -7.572          23.322
4           1           -11.500                 4.675        .261    -28.498         5.498
4           2           -6.875                  4.377        .961    -22.789         9.039
4           3           -7.875                  4.249        .637    -23.322         7.572

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Bonferroni.

The difference between group means is displayed, and also the standard error, the significance value and a confidence interval for the difference between means. By looking at the significance values we can see that the only difference between group means is between Dr Field and Dr Smith. Looking at the means of these groups we can see that I give significantly higher marks than Dr Smith. However, there is a rather


anomalous result in that there is no significant difference between the marks given by Dr Death and myself even though the mean difference between our marks is larger (11.5) than the mean difference between Dr Smith and myself (4.6). The reason for this result is the lack of sphericity in the data. The interested reader might like to run some correlations between the four tutors' grades. You will find that there is a very high positive correlation between the marks given by Dr Smith and myself (indicating a low level of variability in our data). However, there is a very low correlation between the marks given by Dr Death and myself (indicating a high level of variability between our marks). It is this large variability between Dr Death and myself that has produced the non-significant result despite the average marks being very different (this observation is also evident from the standard errors).

Effect Sizes for Repeated-Measures ANOVA

In repeated-measures ANOVA, the equation for ω² is (hang onto your hats):

ω² = [((k - 1)/nk)(MSM - MSR)] / [MSR + (MSBG - MSR)/k + ((k - 1)/nk)(MSM - MSR)]

SPSS doesn't give us SSBG in the output, but we know that the total sum of squares is made up of SSBG, SSM and SSR (because SSW = SSM + SSR). By substituting these terms and rearranging the equation, we get:

SST = SSBG + SSM + SSR
SSBG = SST - SSM - SSR


The next problem is that SPSS, which is clearly trying to hinder us at every step, doesn't give us SST, and I'm afraid (unless I've missed something in the output) you're just going to have to calculate it by hand. From the values we calculated earlier, you should get:

SSBG = 1705.868 - 554.125 - 1048.375 = 103.37

The next step is to convert this to a mean squares by dividing by the degrees of freedom, which in this case are the number of people in the experiment minus 1 (n - 1):

MSBG = SSBG/dfBG = SSBG/(n - 1) = 103.37/(8 - 1) = 14.77

Having done all this (and probably died of boredom in the process) we must now resurrect ourselves with renewed vigour for the effect size equation, which becomes:
ω² = [((4 - 1)/(8 × 4))(184.71 - 49.92)] / [49.92 + (14.77 - 49.92)/4 + ((4 - 1)/(8 × 4))(184.71 - 49.92)]
   = 12.64/53.77
   = 0.24

So, we get ω² = .24. If you calculate it the same way as for the independent ANOVA you should get a slightly bigger answer (.25 in fact). I've mentioned at various other points that it's actually more useful to have effect size measures for focused comparisons anyway (rather than the main ANOVA), and so a slightly easier approach to calculating effect sizes is to calculate them for the contrasts we did. For these we can use the equation that we've seen before to convert the F-values (because they all have 1 degree of freedom for the model) to r:

r = √[F(1, dfR)/(F(1, dfR) + dfR)]

For the three comparisons we did, we would get:


rField vs. Smith = √(18.18/(18.18 + 7)) = 0.85
rSmith vs. Scrote = √(0.15/(0.15 + 7)) = 0.14
rScrote vs. Death = √(3.44/(3.44 + 7)) = 0.57
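Both effect size conversions can be scripted so you don't have to crank the equations by hand. The sketch below (function names are mine, not SPSS's) reproduces the values just calculated:

```python
from math import sqrt

def omega_squared_rm(ms_model, ms_resid, ms_between, k, n):
    """Omega-squared for one-way repeated-measures ANOVA, following
    the equation above (k = number of conditions, n = number of
    participants, or essays in this example)."""
    effect = ((k - 1) / (n * k)) * (ms_model - ms_resid)
    return effect / (ms_resid + (ms_between - ms_resid) / k + effect)

def f_to_r(f_value, df_resid):
    """Convert an F-ratio with 1 model df into the effect size r."""
    return sqrt(f_value / (f_value + df_resid))

w2 = omega_squared_rm(184.71, 49.92, 14.77, k=4, n=8)   # ~.24
r_field_smith = f_to_r(18.18, 7)                         # ~.85
```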

Therefore, the differences between Drs Field and Smith and between Drs Scrote and Death were both large effects, but the difference between Drs Smith and Scrote was small.

Reporting One-Way Repeated-Measures ANOVA

We could report the main finding as:

The results show that the mark of an essay was not significantly affected by the lecturer that marked it, F(1.67, 11.71) = 3.70, p > .05.

If you choose to report the sphericity test as well, you should report the chi-square approximation, its degrees of freedom and the significance value. It's also nice to report the degree of sphericity by reporting the epsilon value. We'll also report the effect size in this improved version:

Mauchly's test indicated that the assumption of sphericity had been violated (χ²(5) = 11.63, p < .05), therefore degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = .56). The results show that the mark of an essay was not significantly affected by the lecturer that marked it, F(1.67, 11.71) = 3.70, p > .05, ω² = .24.

Remember that because the main ANOVA was not significant we shouldn't report any further analysis.
Task 3

Imagine I wanted to look at the effect alcohol has on the roving eye. The roving eye effect is the propensity of people in relationships to eye up members of the opposite sex. I took 20 men and fitted them with incredibly sophisticated glasses that could track their eye movements and record both the movement and the object being observed (this is the point at which it should be apparent that I'm making it up as I go along). Over four different nights I plied these poor souls with 1, 2, 3 or 4 pints of strong lager in a night-club. Each night I measured how many different women they eyed up (a woman was categorized as having been eyed up if the man's eye moved from her head to toe and back up again). To validate this measure we also collected the amount of dribble on the man's chin while looking at a woman. The data are in the file RovingEye.sav. Analyse them with a one-way ANOVA.

SPSS Output


This error bar chart of the roving eye data shows the mean number of women that were eyed up after different doses of alcohol. It's clear from this chart that the mean number of women is pretty similar between 1 and 2 pints and between 3 and 4 pints, but that there is a jump after 2 pints.
Within-Subjects Factors (Measure: MEASURE_1)

ALCOHOL   Dependent Variable
1         PINT1
2         PINT2
3         PINT3
4         PINT4

Descriptive Statistics

          Mean      Std. Deviation   N
1 Pint    11.7500   4.31491          20
2 Pints   11.7000   4.65776          20
3 Pints   15.2000   5.80018          20
4 Pints   14.9500   4.67327          20

These outputs show the initial diagnostic statistics. First, we are told the variables that represent each level of the independent variable. This box is useful to check that the variables were entered in the correct order. The next table provides basic descriptive statistics for the four levels of the independent variable. This table confirms what we saw in the graph.


Mauchly's Test of Sphericity (Measure: MEASURE_1)

Within Subjects Effect: ALCOHOL
Mauchly's W = .477, Approx. Chi-Square = 13.122, df = 5, Sig. = .022
Epsilon: Greenhouse-Geisser = .745, Huynh-Feldt = .849, Lower-bound = .333

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept. Within Subjects Design: ALCOHOL.

The next part of the output contains Mauchly's test, and we hope to find that it is non-significant if we are to assume that the condition of sphericity has been met. However, the significance value (.022) is less than the critical value of .05, so we conclude that the assumption of sphericity has been violated.
Tests of Within-Subjects Effects (Measure: MEASURE_1)

Source           Correction           Type III SS   df       Mean Square   F       Sig.
ALCOHOL          Sphericity Assumed   225.100       3        75.033        4.729   .005
                 Greenhouse-Geisser   225.100       2.235    100.706       4.729   .011
                 Huynh-Feldt          225.100       2.547    88.370        4.729   .008
                 Lower-bound          225.100       1.000    225.100       4.729   .042
Error(ALCOHOL)   Sphericity Assumed   904.400       57       15.867
                 Greenhouse-Geisser   904.400       42.469   21.296
                 Huynh-Feldt          904.400       48.398   18.687
                 Lower-bound          904.400       19.000   47.600

This output shows the main result of the ANOVA. The significance of F is .005, which is significant because it is less than the criterion value of .05. We can, therefore, conclude that alcohol had a significant effect on the average number of women that were eyed up. However, this main test does not tell us which quantities of alcohol made a difference to the number of women eyed up. This result is all very nice, but as yet we haven't done anything about our violation of the sphericity assumption. The table contains several additional rows giving the corrected values of F for the three different types of adjustment (Greenhouse-Geisser, Huynh-Feldt and lower-bound). First we decide which correction to apply, and to do this


we need to look at the estimates of sphericity: if the Greenhouse-Geisser and Huynh-Feldt estimates are less than .75 we should use the Greenhouse-Geisser correction, and if they are above .75 we use the Huynh-Feldt correction. We discovered in the book that, based on these criteria, we should use Huynh-Feldt here. Using this corrected value we still find a significant result because the observed p (.008) is still less than the criterion of .05.
Pairwise Comparisons (Measure: MEASURE_1)

(I) ALCOHOL   (J) ALCOHOL   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower   95% CI Upper
1             2              .050                   .742         1.000   -2.133          2.233
1             3             -3.450                  1.391        .136    -7.544          .644
1             4             -3.200                  1.454        .242    -7.480          1.080
2             1             -.050                   .742         1.000   -2.233          2.133
2             3             -3.500*                 1.139        .038    -6.853          -.147
2             4             -3.250                  1.420        .202    -7.429          .929
3             1              3.450                  1.391        .136    -.644           7.544
3             2              3.500*                 1.139        .038     .147           6.853
3             4              .250                   1.269        1.000   -3.485          3.985
4             1              3.200                  1.454        .242    -1.080          7.480
4             2              3.250                  1.420        .202    -.929           7.429
4             3             -.250                   1.269        1.000   -3.985          3.485

Based on estimated marginal means.
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Bonferroni.

The main effect of alcohol doesn't tell us anything about which doses of alcohol produced different results from other doses. So, we might do some post hoc tests as well. The output above shows the table from SPSS that contains these tests. We read down the column labelled Sig. and look for values less than .05. By looking at the significance values we can see that the only difference between condition means is between 2 and 3 pints of alcohol.

Interpreting and Writing the Result

We could report the main finding as:

Mauchly's test indicated that the assumption of sphericity had been violated (χ²(5) = 13.12, p < .05), therefore degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ε = .85). The results show that the number of women eyed up was significantly affected by the amount of alcohol drunk, F(2.55, 48.40) = 4.73, p < .05, r = .40. Bonferroni post hoc tests revealed a significant difference in the number of women eyed up only between 2 and 3 pints (95% CI of the difference: -6.85 (lower) to -.15 (upper), p < .05). No other comparisons were significant (all ps > .05).
Task 4

In the previous chapter we came across the beer-goggles effect: a severe perceptual distortion after imbibing vast quantities of alcohol. The specific visual distortion is that previously unattractive people suddenly become the hottest thing since Spicy Gonzalez extra-hot Tabasco-marinated chillies. In short, one minute you're standing in a zoo admiring the orang-utans, and the next you're wondering why someone would put Gail Porter (or whatever her surname is now) into a cage. Anyway, in that chapter, a blatantly fabricated data set demonstrated that the beer-goggles effect was much stronger for men than women, and took effect only after 2 pints. Imagine we wanted to follow this finding up to look at what factors mediate the beer-goggles effect. Specifically, we thought that the effect might be made worse by the fact that it usually occurs in clubs, which have dim lighting. We took a sample of 26 men (because the effect is stronger in men) and gave them various doses of alcohol over four different weeks (0 pints, 2 pints, 4 pints and 6 pints of lager). This is our first independent variable, which we'll call alcohol consumption, and it has four levels. Each week (and, therefore, in each state of drunkenness) participants were asked to select a mate in a normal club (that had dim lighting) and then select a second mate in a


specially designed club that had bright lighting. As such, the second independent variable was whether the club had dim or bright lighting. The outcome measure was the attractiveness of each mate as assessed by a panel of independent judges. To recap, all participants took part in all levels of the alcohol consumption variable, and selected mates in both brightly and dimly lit clubs. The data are in the file BeerGogglesLighting.sav. Analyse them with a two-way repeated-measures ANOVA.

SPSS Output

[Figure: line chart of mean attractiveness (%), with error bars, for dim and bright lighting after 0, 2, 4 and 6 pints of alcohol.]

This chart displays the mean attractiveness of the partner selected (with error bars) in dim and brightly lit clubs after the different doses of alcohol. The chart shows that in both dim and brightly lit clubs there is a tendency for men to select less attractive mates as they consume more and more alcohol.
Descriptive Statistics

Condition                    Mean      Std. Deviation   N
0 Pints (Dim Lighting)       65.0000   10.30728         26
2 Pints (Dim Lighting)       65.4615    8.76005         26
4 Pints (Dim Lighting)       37.2308   10.86391         26
6 Pints (Dim Lighting)       21.3077   10.67247         26
0 Pints (Bright Lighting)    61.5769    9.70432         26
2 Pints (Bright Lighting)    60.6538   10.65060         26
4 Pints (Bright Lighting)    50.7692   10.34334         26
6 Pints (Bright Lighting)    40.7692   10.77519         26


This shows the means for all conditions in a table. These means correspond to those plotted in the graph.
Mauchly's Test of Sphericity (b)

Measure: MEASURE_1
                                                                   Epsilon (a)
Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
LIGHTING                1.000        .000                0   .     1.000               1.000        1.000
ALCOHOL                 .820         4.700               5   .454  .873                .984         .333
LIGHTING * ALCOHOL      .898         2.557               5   .768  .936                1.000        .333

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept. Within Subjects Design: LIGHTING+ALCOHOL+LIGHTING*ALCOHOL

The lighting variable had only two levels (dim or bright) and so the assumption of sphericity doesn't apply and SPSS doesn't produce a significance value. However, for the effects of alcohol consumption and the interaction of alcohol consumption and lighting, we do have to look at Mauchly's test. The significance values are both above 0.05 (they are 0.454 and 0.768 respectively) and so we know that the assumption of sphericity has been met for both alcohol consumption and the interaction of alcohol consumption and lighting.


Tests of Within-Subjects Effects

Measure: MEASURE_1 (sphericity-assumed rows shown; the Greenhouse-Geisser, Huynh-Feldt and lower-bound rows give the same F-ratios and significance values)

Source                   Type III SS  df  Mean Square  F        Sig.
LIGHTING                 1993.923     1   1993.923     23.421   .000
Error(LIGHTING)          2128.327     25    85.133
ALCOHOL                  38591.654    3   12863.885    104.385  .000
Error(ALCOHOL)           9242.596     75   123.235
LIGHTING * ALCOHOL       5765.423     3   1921.808     22.218   .000
Error(LIGHTING*ALCOHOL)  6487.327     75    86.498

(Corrected degrees of freedom: ALCOHOL, Greenhouse-Geisser 2.619, Huynh-Feldt 2.953; LIGHTING * ALCOHOL, Greenhouse-Geisser 2.809, Huynh-Feldt 3.000.)
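Each F-ratio in this table is simply the effect mean square divided by its error mean square. As a quick check, using the interaction sums of squares from the table:

```python
# Recomputing the LIGHTING * ALCOHOL F-ratio from its sums of squares
# and degrees of freedom (values taken from the ANOVA table).
ss_effect, df_effect = 5765.423, 3
ss_error, df_error = 6487.327, 75
ms_effect = ss_effect / df_effect    # effect mean square (about 1921.81)
ms_error = ss_error / df_error       # error mean square (about 86.50)
F = ms_effect / ms_error
print(round(F, 2))  # 22.22
```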

This output shows the main ANOVA summary table. The main effect of lighting is shown by the F-ratio in the row labelled LIGHTING. The significance of this value is 0.000, which is well below the usual cut-off point of 0.05. We can conclude that average attractiveness ratings were significantly affected by whether mates were selected in a dim or well-lit club. We can easily interpret this result further because there were only two levels: attractiveness ratings were higher in the well-lit clubs, so we could conclude that when we ignore how much alcohol was consumed, the mates selected in well-lit clubs were significantly more attractive than those chosen in dim clubs.

[Figure: bar chart of mean attractiveness (%) for dim and bright lighting, collapsed over alcohol consumption.]

The main effect of alcohol consumption is shown by the F-ratio in the row labelled ALCOHOL. The probability associated with this F-ratio is reported as 0.000 (i.e. p < 0.001), which is well below the critical value of 0.05. We can conclude that there was a significant main effect of the amount of alcohol consumed on the attractiveness of the mate selected. We know that generally there was an effect, but without further tests (e.g. post hoc comparisons) we can't say exactly which doses of alcohol had the most effect. I've plotted the means for the four doses.

[Figure: line chart of mean attractiveness (%) after 0, 2, 4 and 6 pints, collapsed over lighting.]

This graph shows that when you ignore the lighting in the club, the attractiveness of mates is similar after no alcohol and 2 pints of lager but starts to rapidly decline at 4 pints and continues to decline after 6 pints.
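The dose means being described here can be checked by averaging the dim and bright cell means at each dose (cell means taken from the descriptives table earlier):

```python
# Marginal (dose) means, collapsing over lighting, from the cell means
# in the descriptives table above.
dim    = [65.0000, 65.4615, 37.2308, 21.3077]  # 0, 2, 4, 6 pints
bright = [61.5769, 60.6538, 50.7692, 40.7692]
marginal = [round((d + b) / 2, 2) for d, b in zip(dim, bright)]
print(marginal)  # [63.29, 63.06, 44.0, 31.04]
```

The pattern matches the prose: 0 and 2 pints are almost identical, then the means fall away at 4 and 6 pints.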
Pairwise Comparisons

Measure: MEASURE_1
                                                                 95% Confidence Interval for Difference (a)
(I) ALCOHOL  (J) ALCOHOL  Mean Difference (I-J)  Std. Error  Sig.   Lower Bound  Upper Bound
1            2            .231                   2.006       1.000  -5.517       5.978
1            3            19.288*                2.576       .000   11.909       26.668
1            4            32.250*                1.901       .000   26.804       37.696
2            1            -.231                  2.006       1.000  -5.978       5.517
2            3            19.058*                2.075       .000   13.112       25.003
2            4            32.019*                1.963       .000   26.395       37.644
3            1            -19.288*               2.576       .000   -26.668      -11.909
3            2            -19.058*               2.075       .000   -25.003      -13.112
3            4            12.962*                2.450       .000   5.942        19.981
4            1            -32.250*               1.901       .000   -37.696      -26.804
4            2            -32.019*               1.963       .000   -37.644      -26.395
4            3            -12.962*               2.450       .000   -19.981      -5.942

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Bonferroni.
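The Bonferroni adjustment noted in footnote (a) multiplies each raw p-value by the number of comparisons made, capping the result at 1. A sketch of the idea; the raw p-values below are hypothetical, chosen for illustration rather than taken from the output:

```python
# Sketch of the Bonferroni adjustment applied to pairwise tests:
# each raw p-value is multiplied by the number of comparisons (capped at 1).
from itertools import combinations

conditions = ["0 pints", "2 pints", "4 pints", "6 pints"]
pairs = list(combinations(conditions, 2))
m = len(pairs)                       # 6 pairwise comparisons among 4 doses
raw_p = [0.91, 0.0001, 0.0001, 0.0002, 0.0001, 0.001]  # hypothetical values
adjusted = [min(1.0, p * m) for p in raw_p]
for (a, b), p in zip(pairs, adjusted):
    print(f"{a} vs {b}: adjusted p = {p:.4f}")
```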

This output shows some post hoc tests for the main effect of alcohol. In this example I've chosen a Bonferroni correction. The main column of interest is the one labelled Sig., but the confidence intervals also tell us the likely difference between means if we were to take other samples. The mean attractiveness was significantly higher after no pints than it was after 4 pints and 6 pints (both ps are less than 0.001). We can also see that the mean attractiveness after 2 pints was significantly higher than after 4 pints and 6 pints (again, both ps are less than 0.001). Finally, the mean attractiveness after 4 pints was significantly higher than after 6 pints (p is less than 0.001). So, we can conclude that the


beer-goggles effect doesn't kick in until after 2 pints, and that it has an ever-increasing effect (well, up to 6 pints at any rate!). The interaction effect is shown by the F-ratio in the row labelled LIGHTING*ALCOHOL. The resulting F-ratio is 22.22 (1921.81/86.50), which has an associated probability value of 0.000 (i.e. p < 0.001). As such, there is a significant interaction between the amount of alcohol consumed and the lighting in the club on the attractiveness of the mate selected.
Tests of Within-Subjects Contrasts

Measure: MEASURE_1

Source                   LIGHTING             ALCOHOL              Type III SS  df  Mean Square  F       Sig.
LIGHTING                 Level 1 vs. Level 2                       996.962      1   996.962      23.421  .000
Error(LIGHTING)          Level 1 vs. Level 2                       1064.163     25  42.567
ALCOHOL                                       Level 1 vs. Level 2  1.385        1   1.385        .013    .909
                                              Level 2 vs. Level 3  9443.087     1   9443.087     84.323  .000
                                              Level 3 vs. Level 4  4368.038     1   4368.038     27.983  .000
Error(ALCOHOL)                                Level 1 vs. Level 2  2616.115     25  104.645
                                              Level 2 vs. Level 3  2799.663     25  111.987
                                              Level 3 vs. Level 4  3902.462     25  156.098
LIGHTING * ALCOHOL       Level 1 vs. Level 2  Level 1 vs. Level 2  49.846       1   49.846       .144    .708
                                              Level 2 vs. Level 3  8751.115     1   8751.115     24.749  .000
                                              Level 3 vs. Level 4  912.154      1   912.154      2.157   .154
Error(LIGHTING*ALCOHOL)  Level 1 vs. Level 2  Level 1 vs. Level 2  8680.154     25  347.206
                                              Level 2 vs. Level 3  8839.885     25  353.595
                                              Level 3 vs. Level 4  10569.846    25  422.794

This output shows the results of a set of contrasts that compare each level of the alcohol variable to the previous level of that variable (this is called a repeated contrast in SPSS). So, it compares no pints with 2 pints (Level 1 vs. Level 2), 2 pints with 4 pints (Level 2 vs. Level 3) and 4 pints with 6 pints (Level 3 vs. Level 4). As you can see from the output, if we just look at the main effect of group these contrasts tell us what we already know from the post hoc tests: that is, the attractiveness after no alcohol doesn't differ from the attractiveness after 2 pints, F(1, 25) < 1, the attractiveness after 4 pints does differ from that after 2 pints, F(1, 25) = 84.32, p < 0.001, and the attractiveness after 6 pints does differ from that after 4 pints, F(1, 25) = 27.98, p < 0.001. More interesting is to look at the interaction term in the table. This compares the same levels of the alcohol variable, but for each comparison it is also comparing the difference between the means


for the dim and brightly lit clubs. One way to think of this is to look at the interaction graph and note the vertical differences between the means for dim and bright clubs at each level of alcohol.

[Figure: line chart of mean attractiveness (%) for dim and bright lighting after 0, 2, 4 and 6 pints of alcohol.]

When nothing was drunk, the distance between the bright and dim means is quite small (it's actually 3.42 units on the attractiveness scale); when 2 pints of alcohol are drunk the difference between the dim and well-lit club is still quite small (4.81 units to be precise). The first contrast compares the difference between dim and bright clubs when nothing was drunk with the difference between dim and bright clubs when 2 pints were drunk. So, it is asking: is 3.42 significantly different from 4.81? The answer is no, because the F-ratio is non-significant; in fact, it's less than 1 (F(1, 25) < 1). The second contrast for the interaction compares the difference between dim and bright clubs when 2 pints were drunk (4.81) with the difference between dim and bright clubs when 4 pints were drunk (this difference is 13.54; note that the direction of the difference has changed, as indicated by the lines crossing in the graph). This difference is significant (F(1, 25) = 24.75, p < 0.001). The final contrast for the interaction compares the difference between dim and bright clubs when 4 pints were drunk (13.54) with the difference between dim and bright clubs when 6 pints were drunk (this difference is 19.46). This contrast is not significant (F(1, 25) = 2.16, ns). So, we could conclude that there was a significant interaction between the amount of alcohol drunk and the lighting in the club. Specifically, the effect of alcohol after 2 pints on the attractiveness of the mate was much more pronounced when the lights were dim.

Writing the Result


We can report the three effects from this analysis as follows: The results show that the attractiveness of the mates selected was significantly lower when the lighting in the club was dim compared to when the lighting was bright, F(1, 25) = 23.42, p < .001. The main effect of alcohol on the attractiveness of mates selected was significant, F(3, 75) = 104.39, p < .001. This indicated that when the lighting in the club was ignored, the attractiveness of the mates selected differed according to how much alcohol was drunk before the selection was made. Specifically, post hoc tests revealed that compared to a baseline of when no alcohol had been consumed, the attractiveness of selected mates was not different after 2 pints (p > .05), but was significantly lower after 4 and 6 pints (both ps < .001). The mean attractiveness after 2 pints was also significantly higher than after 4 pints and 6 pints (both ps < .001), and the mean attractiveness after 4 pints was significantly higher than after 6 pints (p < .001). To sum up, the beer-goggles effect seems to take effect after 2 pints have been consumed and has an increasing impact until 6 pints are consumed. The lighting × alcohol interaction was significant, F(3, 75) = 22.22, p < .001, indicating that the effect of alcohol on the attractiveness of the mates selected differed when lighting was dim compared to when it was bright. Contrasts on this interaction term revealed that when the difference in attractiveness ratings between dim and bright clubs was compared after no alcohol and after 2 pints had been drunk there was no significant difference, F(1, 25) < 1. However, when comparing the difference between dim and bright clubs when 2 pints were drunk

with the difference after 4 pints were drunk, a significant difference emerged, F(1, 25) = 24.75, p < .001. A final contrast revealed that the difference between dim and bright clubs after 4 pints were drunk compared to after 6 pints was not significant, F(1, 25) = 2.16, ns. To sum up, there was a significant interaction between the amount of alcohol drunk and the lighting in the club: the decline in the attractiveness of the selected mate seen after 2 pints (compared to after 4) was significantly more pronounced when the lights were dim.
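The 'differences of differences' that these interaction contrasts compare can be reproduced directly from the cell means in the descriptives table:

```python
# Gap between the dim and bright cell means at each dose, using the
# condition means from the descriptives table; the interaction contrasts
# test whether adjacent gaps differ from one another.
dim    = [65.0000, 65.4615, 37.2308, 21.3077]  # 0, 2, 4, 6 pints
bright = [61.5769, 60.6538, 50.7692, 40.7692]
gaps = [round(d - b, 2) for d, b in zip(dim, bright)]
print(gaps)  # [3.42, 4.81, -13.54, -19.46]
```

The sign change at 4 pints is the crossing of the lines described in the interpretation above.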
Task 5

Change the syntax in SimpleEffectsAttitude.sps to look at the effect of drink at different levels of imagery.

The correct syntax to use is:

MANOVA beerpos beerneg beerneut winepos wineneg wineneut waterpos waterneg waterneu /WSFACTORS drink(3) imagery(3) /WSDESIGN = MWITHIN imagery(1) MWITHIN imagery(2) MWITHIN imagery(3) /PRINT SIGNIF( UNIV MULT AVERF HF GG ).

SPSS Output
The main part of the analysis is:

* * * * * * Analysis of Variance -- design 1 * * * * * *


Tests involving 'MWITHIN IMAGERY(1)' Within-Subject Effect.

Tests of Significance for T1 using UNIQUE sums of squares
Source of Variation    SS        DF   MS        F        Sig of F
WITHIN+RESIDUAL        1088.40   19   57.28
MWITHIN IMAGERY(1)     27136.27  1    27136.27  473.71   .000

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
* * * * * * Analysis of Variance -- design 1 * * * * * *

Tests involving 'MWITHIN IMAGERY(2)' Within-Subject Effect.

Tests of Significance for T2 using UNIQUE sums of squares
Source of Variation    SS        DF   MS        F        Sig of F
WITHIN+RESIDUAL        3113.92   19   163.89
MWITHIN IMAGERY(2)     1870.42   1    1870.42   11.41    .003

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
* * * * * * Analysis of Variance -- design 1 * * * * * *

Tests involving 'MWITHIN IMAGERY(3)' Within-Subject Effect.

Tests of Significance for T3 using UNIQUE sums of squares
Source of Variation    SS        DF   MS        F        Sig of F
WITHIN+RESIDUAL        1070.67   19   56.35
MWITHIN IMAGERY(3)     3840.00   1    3840.00   68.14    .000

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

What this shows is a significant effect of drink at level 1 of imagery. So, the ratings of the three drinks significantly differed when positive imagery was used. Because there are three levels of drink, though, this isn't that helpful in untangling what's going on. There is also a significant effect of drink at level 2 of imagery. So, the ratings of the three drinks significantly differed when negative imagery was used. Finally, there is also a significant effect of drink at level 3 of imagery. So, the ratings of the three drinks significantly differed when neutral imagery was used.
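Each simple-effects F in the MANOVA output above is the effect mean square divided by the WITHIN+RESIDUAL mean square. For example, for drink when positive imagery was used:

```python
# Reproducing F for the simple effect of drink at level 1 of imagery,
# from the sums of squares in the MANOVA output above.
ss_resid, df_resid = 1088.40, 19
ms_effect = 27136.27                 # MWITHIN IMAGERY(1) mean square
ms_resid = ss_resid / df_resid       # about 57.28
F = ms_effect / ms_resid
print(round(F, 2))  # 473.71
```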
Chapter 14

Task 1


I am going to extend the example from the previous chapter (advertising and different imagery) by adding a between-group variable into the design.2 To recap, in case you haven't read the previous chapter, participants viewed a total of nine mock adverts over three sessions. In these adverts there were three products (a brand of beer, Brain Death, a brand of wine, Dangleberry, and a brand of water, Puritan). These could be presented alongside positive, negative or neutral imagery. Over the three sessions and nine adverts each type of product was paired with each type of imagery (read the previous chapter if you need more detail). After each advert participants rated the drinks on a scale ranging from −100 (dislike very much) through 0 (neutral) to 100 (like very much). The design, thus far, has two independent variables: the type of drink (beer, wine or water) and the type of imagery used (positive, negative or neutral). These two variables completely cross over, producing nine experimental conditions. Now imagine that I also took note of each person's gender. Subsequent to the previous analysis it occurred to me that men and women might respond differently to the products (because, in keeping with stereotypes, men might mostly drink lager whereas women might drink wine). Therefore, I wanted to reanalyse the data taking this additional variable into account. Now, gender is a between-group variable because a participant can be only male or female: they cannot participate as a male and then change into a female and participate again! The data are the same

2 Previously the example contained two repeated-measures variables (drink type and imagery type); now it will include three variables (two repeated-measures and one between-group).

as in the previous chapter and can be found in the file MixedAttitude.sav. Run a mixed ANOVA on these data. To carry out the analysis on SPSS follow the same instructions that we did before, so first of all access the define factors dialog box by using the file path

. We are using the same repeated-measures variables as in Chapter 13 of the book, so complete this dialog box exactly as shown there, and then click on to access the main dialog box. This box should be

completed exactly as before except that we must specify gender as a between-group variable by selecting it in the variables list and transferring it to the box labelled Between-Subjects Factors.


Gender has only two levels (male or female) so there is no need to specify contrasts for

this variable; however, you should select simple contrasts for both drink and imagery. The addition of a between-group factor means that we can select post hoc tests for this variable by clicking on . This action brings up the post hoc test dialog box, which

can be used as previously explained. However, we need not specify any post hoc tests here because the between-group factor has only two levels. The addition of an extra variable makes it necessary to choose a different graph to the one in the previous example. Click on to access the dialog box and place drink and imagery in the

same slots as for the previous example but also place gender in the slot labelled Separate Plots. When all three variables have been specified, don't forget to click on to add

this combination to the list of plots. By asking SPSS to plot the drink × imagery × gender interaction, we should get the same interaction graph as before, except that a separate version of this graph will be produced for male and female subjects. As far as other options are concerned, you should select the same ones that were chosen in Chapter 13. It is worth selecting estimated marginal means for all effects (because these values will help you to understand any significant effects), but to save space I did not ask for confidence intervals for these effects because we have considered this part of the output in some detail already. When all of the appropriate options have been selected, run the analysis.


Main Analysis

The initial output is the same as the two-way ANOVA example: there is a table listing the repeated-measures variables from the data editor and the level of each independent variable that they represent. The second table contains descriptive statistics (mean and standard deviation) for each of the nine conditions split according to whether participants were male or female. The names in this table are the names I gave the variables in the data editor (therefore, your output may differ slightly). These descriptive statistics are interesting because they show us the pattern of means across all experimental conditions (so, we use these means to produce the graphs of the three-way interaction). We can see that the variability among scores was greatest when beer was used as a product, and that when a corpse image was used the ratings given to the products were negative (as expected) for all conditions except the men in the beer condition. Likewise, ratings of products were very positive when a sexy person was used as the imagery irrespective of the gender of the participant, or the product being advertised.

Descriptive Statistics

                             Gender   Mean      Std. Deviation   N
Beer + Sexy                  Male     24.8000   14.0063          10
                             Female   17.3000   11.3925          10
                             Total    21.0500   13.0080          20
Beer + Corpse                Male     20.1000    7.8379          10
                             Female   -11.2000   5.1381          10
                             Total    4.4500    17.3037          20
Beer + Person in Armchair    Male     16.9000    8.5434          10
                             Female   3.1000     6.7074          10
                             Total    10.0000   10.2956          20
Wine + Sexy                  Male     22.3000    7.6311          10
                             Female   28.4000    4.1150          10
                             Total    25.3500    6.7378          20
Wine + Corpse                Male     -7.8000    4.9396          10
                             Female   -16.2000   4.1312          10
                             Total    -12.0000   6.1815          20
Wine + Person in Armchair    Male     7.5000     4.9721          10
                             Female   15.8000    4.3919          10
                             Total    11.6500    6.2431          20
Water + Sexy                 Male     14.5000    6.7864          10
                             Female   20.3000    6.3953          10
                             Total    17.4000    7.0740          20
Water + Corpse               Male     -9.8000    6.7791          10
                             Female   -8.6000    7.1368          10
                             Total    -9.2000    6.8025          20
Water + Person in Armchair   Male     -2.1000    6.2973          10
                             Female   6.8000     3.8816          10
                             Total    2.3500     6.8386          20

The results of Mauchly's sphericity test are different to the example in Chapter 13, because the between-group factor is now being accounted for by the test. The main effect of drink still significantly violates the sphericity assumption (W = 0.572, p < 0.01) but the main effect of imagery no longer does. Therefore, the F-value for the main effect of drink (and its interaction with the between-group variable gender) needs to be corrected for this violation.
Mauchly's Test of Sphericity (b)

Measure: MEASURE_1
                                                                   Epsilon (a)
Within Subjects Effect  Mauchly's W  Approx. Chi-Square  df  Sig.  Greenhouse-Geisser  Huynh-Feldt  Lower-bound
DRINK                   .572         9.486               2   .009  .700                .784         .500
IMAGERY                 .965         .612                2   .736  .966                1.000        .500
DRINK * IMAGERY         .609         8.153               9   .521  .813                1.000        .250

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept+GENDER. Within Subjects Design: DRINK+IMAGERY+DRINK*IMAGERY


The summary table of the repeated-measures effects in the ANOVA is split into sections for each of the effects in the model and their associated error terms. The table format is the same as for the previous example, except that the interactions between gender and the repeated-measures effects are included also. We would expect to still find the effects that were previously present (in a balanced design, the inclusion of an extra variable should not affect these effects). By looking at the significance values it is clear that this prediction is true: there are still significant effects of the type of drink used, the type of imagery used, and the interaction of these two variables. In addition to the effects already described we find that gender interacts significantly with the type of drink used (so, men and women respond differently to beer, wine and water regardless of the context of the advert). There is also a significant interaction of gender and imagery (so, men and women respond differently to positive, negative and neutral imagery regardless of the drink being advertised). Finally, the three-way interaction between gender, imagery and drink is significant, indicating that the way in which imagery affects responses to different types of drinks depends on whether the subject is male or female. The effects of the repeated-measures variables have been outlined in Chapter 13 and the pattern of these responses will not have changed, so rather than repeat myself, I will concentrate on the new effects and the forgetful reader should look back at Chapter 13!


Tests of Within-Subjects Effects

Measure: MEASURE_1 (sphericity-assumed rows shown; the Greenhouse-Geisser, Huynh-Feldt and lower-bound rows give the same F-ratios)

Source                    Type III SS  df  Mean Square  F        Sig.
DRINK                     2092.344     2   1046.172     11.708   .000
DRINK * GENDER            4569.011     2   2284.506     25.566   .000
Error(DRINK)              3216.867     36  89.357
IMAGERY                   21628.678    2   10814.339    287.417  .000
IMAGERY * GENDER          1998.344     2   999.172      26.555   .000
Error(IMAGERY)            1354.533     36  37.626
DRINK * IMAGERY           2624.422     4   656.106      19.593   .000
DRINK * IMAGERY * GENDER  495.689      4   123.922      3.701    .009
Error(DRINK*IMAGERY)      2411.000     72  33.486

(Greenhouse-Geisser corrected df: DRINK effects 1.401, error 25.216; IMAGERY effects 1.932, error 34.770; DRINK * IMAGERY effects 3.251, error 58.524. The three-way interaction remains significant with the Greenhouse-Geisser correction, Sig. = .014.)

The Effect of Gender

The main effect of gender is listed separately from the repeated-measures effects in a table labelled Tests of Between-Subjects Effects. Before looking at this table it is important to check the assumption of homogeneity of variance using Levene's test. SPSS produces a table listing Levene's test for each of the repeated-measures variables in the data editor, and we need to look for any variable that has a significant value. The table showing Levene's test indicates that variances are homogeneous for all levels of the repeated-measures variables (because all significance values are greater than 0.05). If any values were significant, then this would compromise the accuracy of the F-test for


gender, and we would have to consider transforming all of our data to stabilize the variances between groups (one popular transformation is to take the square root of all values). Fortunately, in this example a transformation is unnecessary. The second table shows the ANOVA summary table for the main effect of gender, and this reveals a significant effect (because the significance of 0.018 is less than the standard cut-off point of 0.05).
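To illustrate the square-root transformation just mentioned: because these ratings run from −100 to 100, the scores would first need to be shifted so that none is negative. The +100 shift and the example scores below are assumptions for illustration, not a step taken in the chapter:

```python
# Sketch of a square-root transform to stabilize variances, applied to
# hypothetical rating scores on the -100..100 scale.
import math

ratings = [24.8, -11.2, 16.9, 3.1]           # hypothetical raw scores
shifted = [r + 100 for r in ratings]          # shift so minimum possible is 0
transformed = [round(math.sqrt(s), 2) for s in shifted]
print(transformed)  # [11.17, 9.42, 10.81, 10.15]
```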
Levene's Test of Equality of Error Variances (a)

                             F      df1  df2  Sig.
Beer + Sexy                  1.009  1    18   .328
Beer + Corpse                1.305  1    18   .268
Beer + Person in Armchair    1.813  1    18   .195
Wine + Sexy                  2.017  1    18   .173
Wine + Corpse                1.048  1    18   .320
Wine + Person in Armchair    .071   1    18   .793
Water + Sexy                 .317   1    18   .580
Water + Corpse               .804   1    18   .382
Water + Person in Armchair   1.813  1    18   .195

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+GENDER. Within Subjects Design: DRINK+IMAGERY+DRINK*IMAGERY

Tests of Between-Subjects Effects

Measure: MEASURE_1; Transformed Variable: Average

Source     Type III SS  df  Mean Square  F        Sig.
Intercept  1246.445     1   1246.445     144.593  .000
GENDER     58.178       1   58.178       6.749    .018
Error      155.167      18  8.620

We can report that there was a significant main effect of gender (F(1, 18) = 6.75, p < 0.05). This effect tells us that if we ignore all other variables, male subjects' ratings were significantly different from females'. If you requested that SPSS display means for the gender effect you should scan through your output and find the table in a section headed Estimated Marginal Means. The table of means for the main effect of gender with the associated standard errors is plotted alongside. It is clear from this graph that men's ratings were generally significantly more positive than women's. Therefore, men gave

more positive ratings than women regardless of the drink being advertised and the type of imagery used in the advert.
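The gender F-ratio reported above can be reproduced from the between-subjects table (each mean square is a sum of squares divided by its df):

```python
# F for the main effect of gender: MS(GENDER) / MS(Error),
# using the values in the Tests of Between-Subjects Effects table.
ss_gender, df_gender = 58.178, 1
ss_error, df_error = 155.167, 18
F = (ss_gender / df_gender) / (ss_error / df_error)
print(round(F, 2))  # 6.75
```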

Estimates

Measure: MEASURE_1
                            95% Confidence Interval
Gender  Mean   Std. Error  Lower Bound  Upper Bound
Male    9.600  .928        7.649        11.551
Female  6.189  .928        4.238        8.140

The Interaction between Gender and Drink

Gender interacted in some way with the type of drink used as a stimulus. Remembering that the effect of drink violated sphericity, we must report Greenhouse-Geisser-corrected values for this interaction with the between-group factor. From the summary table we should report that there was a significant interaction between the type of drink used and the gender of the subject (F(1.40, 25.22) = 25.57, p < 0.001). This effect tells us that the type of drink being advertised had a different effect on men and women. We can use the estimated marginal means to determine the nature of this interaction (or we could have asked SPSS for a plot of gender × drink). The means and interaction graph show the meaning of this result. The graph shows the average male ratings of each drink ignoring the type of imagery with which it was presented (circles). The women's scores are shown as squares. The graph clearly shows that male and female ratings are very similar for wine and water, but men seem to rate beer more highly than women, regardless of the type of imagery used. We could interpret this interaction as meaning that the type of drink being advertised influenced ratings differently in men and women. Specifically, ratings were similar for wine and water but males rated beer higher than women. This interaction can be clarified using the contrasts specified before the analysis.

2. Gender * DRINK

Measure: MEASURE_1
                                           95% Confidence Interval
Gender  DRINK  Mean    Std. Error  Lower Bound  Upper Bound
Male    1      20.600  2.441       15.471       25.729
        2      7.333   .765        5.726        8.940
        3      .867    1.414       -2.103       3.836
Female  1      3.067   2.441       -2.062       8.196
        2      9.333   .765        7.726        10.940
        3      6.167   1.414       3.197        9.136
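The pattern just described (similar ratings for wine and water, a large gender gap for beer) can be seen by differencing the estimated marginal means above:

```python
# Male minus female estimated marginal means for each drink
# (drink 1 = beer, 2 = wine, 3 = water), from the table above.
male   = [20.600, 7.333, 0.867]
female = [3.067, 9.333, 6.167]
gaps = [round(m - f, 2) for m, f in zip(male, female)]
print(gaps)  # [17.53, -2.0, -5.3]
```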

The Interaction between Gender and Imagery

Gender interacted in some way with the type of imagery used as a stimulus. The effect of imagery did not violate sphericity, so we can report the uncorrected F-value. From the summary table we should report that there was a significant interaction between the type of imagery used and the gender of the subject (F(2, 36) = 26.55, p < 0.001). This effect tells us that the type of imagery used in the advert had a different effect on men and women. We can use the estimated marginal means to determine the nature of this interaction. The means and interaction graph show the meaning of this result. The graph shows the average male rating in each imagery condition ignoring the type of drink that was rated (circles). The women's scores are shown as squares. The graph clearly shows that male and female ratings are very similar for positive and neutral imagery, but men seem to be less affected by negative imagery than women, regardless of the drink in the advert. To interpret this finding more fully, we should consult the contrasts for this interaction.

3. Gender * IMAGERY
Measure: MEASURE_1

                                              95% Confidence Interval
Gender    IMAGERY    Mean      Std. Error    Lower Bound    Upper Bound
Male      1          20.533    1.399          17.595         23.471
          2            .833    1.092          -1.460          3.127
          3           7.433    1.395           4.502         10.365
Female    1          22.000    1.399          19.062         24.938
          2         -12.000    1.092         -14.293         -9.707
          3           8.567    1.395           5.635         11.498

The Interaction between Drink and Imagery

The interpretation of this interaction is the same as for the two-way ANOVA (see Chapter 13). You may remember that the interaction reflected the fact that negative imagery has a different effect to both positive and neutral imagery (because it decreased ratings rather than increasing them).

The Interaction between Gender, Drink and Imagery

The three-way interaction tells us whether the drink × imagery interaction is the same for men and women (i.e. whether the combined effect of the type of drink and the imagery used is the same for male subjects as for female subjects). We can conclude that there is a significant three-way drink × imagery × gender interaction (F(4, 72) = 3.70, p < 0.01). The nature of this interaction is shown in the graph, which shows the imagery by drink interaction for men and women separately. The male graph shows that when positive imagery is used, men generally rated all three drinks positively (the line with circles is higher than the other lines for all drinks). This pattern is true of women also (the line representing positive imagery is above the other two lines). When neutral imagery is used, men rate beer very highly, but rate wine and water fairly neutrally. Women, on the other hand, rate beer and water neutrally, but rate wine more positively (in fact, the pattern of the positive and neutral imagery lines shows that women generally rate wine slightly more positively than water and beer). So, for neutral imagery men still rate beer positively, and women still rate wine positively. For the negative imagery, the men still rate beer very highly, but give low ratings to the other two types of drink. So, regardless of the type of imagery used, men rate beer very positively (if you look at the graph you'll note that ratings for beer are virtually identical for the three types of imagery). Women, however, rate all three drinks very negatively when negative imagery is used. The three-way interaction is, therefore, likely to reflect these sex differences in the interaction between drink and imagery. Specifically, men seem fairly immune to the effects of imagery when beer is used as a stimulus, whereas women are not. The contrasts will show exactly what this interaction represents.
4. Gender * DRINK * IMAGERY
Measure: MEASURE_1

                                                       95% Confidence Interval
Gender    DRINK    IMAGERY    Mean      Std. Error    Lower Bound    Upper Bound
Male      1        1          24.800    4.037          16.318         33.282
                   2          20.100    2.096          15.697         24.503
                   3          16.900    2.429          11.797         22.003
          2        1          22.300    1.939          18.227         26.373
                   2          -7.800    1.440         -10.825         -4.775
                   3           7.500    1.483           4.383         10.617
          3        1          14.500    2.085          10.119         18.881
                   2          -9.800    2.201         -14.424         -5.176
                   3          -2.100    1.654          -5.575          1.375
Female    1        1          17.300    4.037           8.818         25.782
                   2         -11.200    2.096         -15.603         -6.797
                   3           3.100    2.429          -2.003          8.203
          2        1          28.400    1.939          24.327         32.473
                   2         -16.200    1.440         -19.225        -13.175
                   3          15.800    1.483          12.683         18.917
          3        1          20.300    2.085          15.919         24.681
                   2          -8.600    2.201         -13.224         -3.976
                   3           6.800    1.654           3.325         10.275

Graphs showing the drink by imagery interaction for men and women. Lines represent positive imagery (circles), negative imagery (squares) and neutral imagery (triangles)

Contrasts for Repeated-Measures Variables

We requested simple contrasts for the drink variable (for which water was used as the control category) and for the imagery variable (for which neutral imagery was used as the control category). The table is the same as for the previous example, except that the added effects of gender and its interactions with other variables are now included. So, for the main effect of drink, the first contrast compares level 1 (beer) against the base category (in this case, the last category: water). This result is significant (F(1, 18) = 15.37, p < 0.01). The next contrast compares level 2 (wine) with the base category (water) and confirms the significant difference found when gender was not included as a variable in the analysis (F(1, 18) = 19.92, p < 0.001). For the imagery main effect, the first contrast compares level 1 (positive) to the base category (neutral) and verifies the significant effect found by the post hoc tests (F(1, 18) = 134.87, p < 0.001). The second contrast confirms the significant difference found for the negative imagery condition compared to the neutral (F(1, 18) = 129.18, p < 0.001). No contrast was specified for gender.

Tests of Within-Subjects Contrasts
Measure: MEASURE_1

Source                     DRINK              IMAGERY            Type III SS    df    Mean Square    F          Sig.
DRINK                      Level 1 vs. 3                          1383.339       1     1383.339      15.371     .001
                           Level 2 vs. 3                           464.006       1      464.006      19.923     .000
DRINK * GENDER             Level 1 vs. 3                          2606.806       1     2606.806      28.965     .000
                           Level 2 vs. 3                            54.450       1       54.450       2.338     .144
Error(DRINK)               Level 1 vs. 3                          1619.967      18       89.998
                           Level 2 vs. 3                           419.211      18       23.290
IMAGERY                                       Level 1 vs. 3       3520.089       1     3520.089     134.869     .000
                                              Level 2 vs. 3       3690.139       1     3690.139     129.179     .000
IMAGERY * GENDER                              Level 1 vs. 3           .556       1         .556        .021     .886
                                              Level 2 vs. 3        975.339       1      975.339      34.143     .000
Error(IMAGERY)                                Level 1 vs. 3        469.800      18       26.100
                                              Level 2 vs. 3        514.189      18       28.566
DRINK * IMAGERY            Level 1 vs. 3      Level 1 vs. 3        320.000       1      320.000       1.686     .211
                           Level 1 vs. 3      Level 2 vs. 3        720.000       1      720.000       8.384     .010
                           Level 2 vs. 3      Level 1 vs. 3         36.450       1       36.450        .223     .642
                           Level 2 vs. 3      Level 2 vs. 3       2928.200       1     2928.200      31.698     .000
DRINK * IMAGERY * GENDER   Level 1 vs. 3      Level 1 vs. 3        441.800       1      441.800       2.328     .144
                           Level 1 vs. 3      Level 2 vs. 3        480.200       1      480.200       5.592     .029
                           Level 2 vs. 3      Level 1 vs. 3          4.050       1        4.050        .025     .877
                           Level 2 vs. 3      Level 2 vs. 3        405.000       1      405.000       4.384     .051
Error(DRINK*IMAGERY)       Level 1 vs. 3      Level 1 vs. 3       3416.200      18      189.789
                           Level 1 vs. 3      Level 2 vs. 3       1545.800      18       85.878
                           Level 2 vs. 3      Level 1 vs. 3       2942.100      18      163.450
                           Level 2 vs. 3      Level 2 vs. 3       1662.800      18       92.378

Drink × Gender Interaction 1: Beer vs. Water, Male vs. Female

The first interaction term looks at level 1 of drink (beer) compared to level 3 (water), comparing male and female scores. This contrast is highly significant (F(1, 18) = 28.97, p < 0.001). This result tells us that the increased ratings of beer compared to water found for men are not found for women. So, in the graph the squares representing female ratings of beer and water are roughly level; however, the circle representing male ratings of beer is much higher than the circle representing water. The positive contrast represents this difference, and so we can conclude that male ratings of beer (compared to water) were significantly greater than women's ratings of beer (compared to water).

Drink × Gender Interaction 2: Wine vs. Water, Male vs. Female

The second interaction term compares level 2 of drink (wine) to level 3 (water), contrasting male and female scores. There is no significant difference for this contrast (F(1, 18) = 2.34, p = 0.14), which tells us that the difference between ratings of wine

compared to water in males is roughly the same as in females. Therefore, overall, the drink × gender interaction has revealed a difference between males and females in how they rate beer (regardless of the type of imagery used).

Imagery × Gender Interaction 1: Positive vs. Neutral, Male vs. Female

The first interaction term looks at level 1 of imagery (positive) compared to level 3 (neutral), comparing male and female scores. This contrast is not significant (F < 1). This result tells us that ratings of drinks presented with positive imagery (relative to those presented with neutral imagery) were equivalent for males and females. This finding reflects the fact that in the earlier graph of this interaction the squares and circles for both the positive and neutral conditions overlap (therefore male and female responses were the same).

Imagery × Gender Interaction 2: Negative vs. Neutral, Male vs. Female

The second interaction term looks at level 2 of imagery (negative) compared to level 3 (neutral), comparing male and female scores. This contrast is highly significant (F(1, 18) = 34.13, p < 0.001). This result tells us that the difference between ratings of drinks paired with negative imagery compared to neutral was different for men and women. Looking at the earlier graph of this interaction, this finding reflects the fact that for men, ratings of drinks paired with negative imagery were relatively similar to ratings of drinks paired with neutral imagery (the circles have a fairly similar vertical position). However, if you look at the female ratings, then drinks were rated much less favourably when presented with negative imagery than when presented with neutral imagery (the square in the negative condition is much lower than the neutral condition). Therefore,

overall, the imagery × gender interaction has revealed a difference between males and females in their ratings of drinks presented with negative imagery compared to neutral; specifically, men seem less affected by negative imagery.

Drink × Imagery × Gender Interaction 1: Beer vs. Water, Positive vs. Neutral Imagery, Male vs. Female

The first interaction term compares level 1 of drink (beer) to level 3 (water), when positive imagery (level 1) is used compared to neutral (level 3), in males compared to females (F(1, 18) = 2.33, p = 0.144). The non-significance of this contrast tells us that the difference in ratings when positive imagery is used compared to neutral imagery is roughly equal when beer is used as a stimulus as when water is used, and these differences are equivalent in male and female subjects. In terms of the interaction graph, it means that the distance between the circle and the triangle in the beer condition is the same as the distance between the circle and the triangle in the water condition, and that these distances are equivalent in men and women.

Drink × Imagery × Gender Interaction 2: Beer vs. Water, Negative vs. Neutral Imagery, Male vs. Female

The second interaction term looks at level 1 of drink (beer) compared to level 3 (water), when negative imagery (level 2) is used compared to neutral (level 3). This contrast is significant (F(1, 18) = 5.59, p < 0.05). This

result tells us that the difference in ratings between beer and water when negative imagery is used (compared to neutral

imagery) is different between men and women. If we plot ratings of beer and water across the negative and neutral conditions, for males (circles) and females (squares) separately, we see that ratings after negative imagery are always lower than ratings after neutral imagery except for men's ratings of beer, which are actually higher after negative imagery. As such, this contrast tells us that the interaction effect reflects a difference in the way in which males rate beer compared to females when negative imagery is used compared to neutral. Males and females are similar in their pattern of ratings for water but different in the way in which they rate beer.

Drink × Imagery × Gender Interaction 3: Wine vs. Water, Positive vs. Neutral Imagery, Male vs. Female

The third interaction term looks at level 2 of drink (wine) compared to level 3 (water), when positive imagery (level 1) is used compared to neutral (level 3), in males compared to females. This contrast is non-significant (F(1, 18) < 1). This result tells us that the difference in ratings when positive imagery is used compared to neutral imagery is roughly equal when wine is used as a stimulus as when water is used, and these differences are equivalent in male and female subjects. In terms of the interaction graph, it means that the distance between the circle and the triangle in the wine condition is the same as the distance between the circle and the triangle in the water condition, and that these distances are equivalent in men and women.

Drink × Imagery × Gender Interaction 4: Wine vs. Water, Negative vs. Neutral Imagery, Male vs. Female

The final interaction term looks at level 2 of drink (wine) compared to level 3 (water), when negative imagery (level 2) is used compared to neutral (level 3). This contrast is very close to significance (F(1, 18) = 4.38, p = 0.051). This result tells us that the difference in ratings between wine and water when negative imagery is used (compared to neutral imagery) differs between men and women (although this difference has not quite reached significance). If we plot ratings of wine and water across the negative and neutral conditions, for males (circles) and females (squares), we see that ratings after negative imagery are always lower than ratings after neutral imagery, but for women rating wine the change is much more dramatic (the line is steeper). As such, this contrast tells us that the interaction effect reflects a difference in the way in which females, relative to males, rate wine when negative imagery is used compared to neutral. Males and females are similar in their pattern of ratings for water but differ in the way in which they rate wine. It is noteworthy that this contrast was not significant at the usual 0.05 level; however, it is worth remembering that this cut-off point was set in a fairly arbitrary way, so it is worth reporting such borderline effects and letting your reader decide whether they are meaningful. There is also a growing trend towards reporting effect sizes in preference to relying solely on significance levels.

Summary

These contrasts again tell us nothing about the differences between the beer and wine conditions (or the positive and negative conditions), and different contrasts would have to

be run to find out more. However, what is clear so far is that differences exist between men and women in terms of their ratings of beer and wine. It seems as though men are relatively unaffected by negative imagery when it comes to beer. Likewise, women seem more willing than men to rate wine positively when neutral imagery is used. What should be clear from this is that a complex ANOVA in which several independent variables are used results in complex interaction effects that require a great deal of concentration to interpret (imagine interpreting a four-way interaction!). Therefore, it is essential to take a systematic approach to interpretation, and plotting graphs is a particularly useful way to proceed. It is also advisable to think carefully about the appropriate contrasts to use to answer the questions you have about your data. It is these contrasts that will help you to interpret interactions, so make sure you select sensible ones!
Task 2

Text messaging is very popular among mobile phone owners, to the point that books have been published on how to write in text speak (BTW, hope u kno wat I mean by txt spk). One concern is that children may use this form of communication so much that it will hinder their ability to learn correct written English. One concerned researcher conducted an experiment in which one group of children was encouraged to send text messages on their mobile phones over a six-month period. A second group was forbidden from sending text messages for the same period. To ensure that kids in this latter group didn't use their phones, this group was given armbands that administered painful shocks in the presence of

microwaves (like those emitted from phones).3 There were 50 different participants: 25 were encouraged to send text messages and 25 were forbidden. The outcome was a score on a grammatical test (as a percentage) that was measured both before and after the experiment. The first independent variable was, therefore, text message use (text messagers versus controls) and the second independent variable was the time at which grammatical ability was assessed (before or after the experiment). The data are in the file TextMessages.sav.

The line chart (with error bars) shows the grammar data. The circles show the mean grammar score before and after the experiment for the text message group and the controls. The means before and after are connected by a line for the two groups separately. It's clear from this chart that in the text message group grammar scores went down dramatically over the six-month period in which they used their mobile phones. For the controls, grammar scores also fell, but much less dramatically.

Line chart (with error bars showing the standard error of the mean) of the mean grammar scores before and after the experiment for text messagers and controls
Descriptive Statistics

                    Group            Mean      Std. Deviation   N
Grammar at Time 1   Text Messagers   64.8400   10.67973         25
                    Controls         65.6000   10.83590         25
                    Total            65.2200   10.65467         50
Grammar at Time 2   Text Messagers   52.9600   16.33116         25
                    Controls         61.8400    9.41046         25
                    Total            57.4000   13.93278         50

The output above shows the table of descriptive statistics from the two-way mixed ANOVA; the table has means at time 1 split according to whether the people were in the text messaging group or the control group, then below we have the means for the two groups at time 2. These means correspond to those plotted above.
Mauchly's Test of Sphericity(b)
Measure: MEASURE_1

                                                                Epsilon(a)
Within Subjects   Mauchly's   Approx.                  Greenhouse-   Huynh-   Lower-
Effect            W           Chi-Square   df    Sig.  Geisser       Feldt    bound
TIME              1.000       .000         0     .     1.000         1.000    1.000

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
b. Design: Intercept+GROUP; Within Subjects Design: TIME

Levene's Test of Equality of Error Variances(a)

                     F       df1   df2   Sig.
Grammar at Time 1     .089   1     48    .767
Grammar at Time 2    3.458   1     48    .069

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+GROUP; Within Subjects Design: TIME

We know that when we use repeated measures we have to check the assumption of sphericity. We also know that for independent designs we need to check the homogeneity of variance assumption. If the design is a mixed design then we have both repeated and independent measures, so we have to check both assumptions. In this case, we have only

two levels of the repeated measure, so the assumption of sphericity does not apply. Levene's test produces a different test for each level of the repeated-measures variable. In mixed designs, the homogeneity assumption has to hold for every level of the repeated-measures variable. At both levels of time, Levene's test is non-significant (p = .767 before the experiment and p = .069 after the experiment). This means the assumption has not been broken (although it came quite close to being a problem after the experiment).

The output above shows the main ANOVA summary tables. Like any two-way ANOVA, we still have three effects to find: two main effects (one for
each independent variable) and one interaction term. The main effect of time is significant, so we can conclude that grammar scores were significantly affected by the time at which they were measured. The exact nature of this effect is easily
determined because there were only two points in time (and so this main effect compares only two means). The graph shows that grammar scores were higher before the experiment than after, meaning that the manipulation had the net effect of significantly reducing grammar scores. This main effect seems rather interesting until you consider that these means include both text messagers and controls. There are three possible reasons for the drop in grammar scores: (1) the text messagers got worse and are dragging down the mean after the experiment, (2) the controls somehow got worse, or (3) the whole group
just got worse and it had nothing to do with whether the children text messaged or not. Until we examine the interaction, we won't see which of these is true. The main effect of group is shown by the F-ratio in the second table above. The probability associated with this F-ratio is 0.09, which is just above the critical value of 0.05. Therefore, we must conclude that there was no significant main effect of whether children text messaged or not on grammar scores. Again, this effect seems interesting enough
and mobile phone companies might certainly choose to cite it as evidence that text messaging does not affect your grammatical ability. However, remember that this main effect ignores the time at which grammar ability is measured. It just means that if we took the average grammar score for text messagers (that's including their scores both before and after they started using their phones), and compared this to the mean of the controls (again including scores before and after), then these means would not be significantly different. The graph shows that when you ignore the time at which grammar was measured, the controls have slightly better grammar than the text messagers, but not significantly so. Main effects are not always that interesting and should certainly be viewed in the context of any interaction effects. The interaction effect in this example is shown by the F-ratio in the row labeled Time*Group, and because the probability of obtaining a value this big by chance is 0.047, which is just less than the criterion of 0.05, we can say that there is a significant interaction between the time at which grammar was measured and whether or not children were allowed to text message within that time. The mean ratings in all

conditions help us to interpret this effect. The significant interaction tells us that the change in grammar scores was significantly different in text messagers compared to controls. Looking at the interaction graph, we can see that although grammar scores fell in controls, the drop was much more marked in the text messagers; so, text messaging does seem to ruin your grammatical ability relative to controls.4

Writing the Result

We can report the three effects from this analysis as follows: The results show that the grammar ratings at the end of the experiment were significantly lower than those at the beginning of the experiment, F(1, 48) = 15.46, p < .001, r = .61. The main effect of group on the grammar scores was non-significant, F(1, 48) = 2.99, ns, r = .27. This indicated that when the time at which grammar was measured is ignored, the grammar ability in the text message group was not significantly different to the controls. The time × group interaction was significant, F(1, 48) = 4.17, p < .05, r = .34, indicating that the change in grammar ability in the text message group was significantly different to the change in the control group. These findings indicate that although there was a natural decay of grammatical ability over time (as shown by the controls), there was a much stronger effect when participants were

4 It's interesting that the control group means dropped too. This could be because the control group were undisciplined and still used their mobile phones, or it could just be that the education system in this country is so underfunded that there is no one to teach English anymore!

encouraged to use text messages. This shows that using text messages accelerates the inevitable decline in grammatical ability.
Task 3

A researcher was interested in the effects on people's mental health of participating in Big Brother (see Chapter 1 if you don't know what Big Brother is). The researcher hypothesized that contestants start off with personality disorders that are exacerbated by being forced to live with people as attention-seeking as themselves. To test this hypothesis, she gave eight contestants a questionnaire measuring personality disorders before they entered the house, and again when they left the house. A second group of eight people acted as a waiting list control. These were people shortlisted to go into the house, but who never actually made it. They too were given the questionnaire at the same points in time as the contestants. The data are in BigBrother.sav. Conduct a mixed ANOVA on the data.

Running the analysis

SPSS output


Error bar chart of the mean personality disorder score before entering and after leaving the Big Brother house

The output above shows the table of descriptive statistics from the two-way mixed ANOVA; the table has mean borderline personality disorder (BPD) scores before entering the Big Brother (BB) house split according to whether the people were contestants or not, then below we have the means for the two groups after leaving the house. These means correspond to those plotted above.

We know that when we use repeated measures we have to check the assumption of sphericity. However, we also know that for sphericity to be an issue we need at least three conditions. We have only two conditions here, so sphericity does not need to be tested (and, therefore, SPSS produces a blank in the column labeled Sig.). We also need to check the homogeneity of variance assumption. Levene's test produces a different test for each level of the repeated-measures variable. In mixed designs, the homogeneity assumption has to hold for every level of the repeated-measures variable. At both levels of time, Levene's test is non-significant (p = .061 before entering the BB house and p = .088 after leaving). This means the assumption has not been significantly broken (but it was quite close to being a problem).
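For readers curious what Levene's test actually computes, it can be sketched with the standard library: replace each score with its absolute deviation from its group mean, then run a one-way ANOVA on those deviations. This is a sketch of the textbook formula (with the mean as the centre, matching the default SPSS output), not SPSS's own code, and the data below are invented:

```python
from statistics import mean

def levene_W(*groups):
    """Levene's homogeneity-of-variance statistic (center = mean): a
    one-way ANOVA on absolute deviations from each group's mean.
    Returns the W statistic and its (df1, df2)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    z = [[abs(x - mean(g)) for x in g] for g in groups]      # deviations
    zbar_i = [mean(zi) for zi in z]                          # group means
    zbar = mean(x for zi in z for x in zi)                   # grand mean
    between = sum(len(zi) * (zb - zbar) ** 2 for zi, zb in zip(z, zbar_i))
    within = sum((x - zb) ** 2 for zi, zb in zip(z, zbar_i) for x in zi)
    W = ((N - k) * between) / ((k - 1) * within)
    return W, (k - 1, N - k)

# Made-up groups with identical spread -> W = 0 (variances look equal)
print(levene_W([1, 2, 3], [4, 5, 6]))  # -> (0.0, (1, 4))
```

A large W (compared against an F distribution with df1, df2) would indicate that the spread of scores differs across groups, i.e. the homogeneity assumption is in trouble.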
The output above shows the main ANOVA summary tables. Like any two-way ANOVA, we still have three effects to find: two main effects (one for each independent variable) and one interaction term. The main effect of time is not significant, so we can conclude that BPD scores were not significantly affected by the time at which they were measured. The exact nature of this effect is easily determined because there were only two points in time (and so this main effect compares only two means). The graph shows that BPD scores were not significantly different after leaving the BB house compared to before entering it.

The main effect of group (bb) is shown by the F-ratio in the second table above. The probability associated with this F-ratio is .43, which is above the critical value of .05. Therefore, we must conclude that there was no significant main effect on BPD scores of whether the person was a BB contestant or not. The graph shows that when you ignore the time at which BPD was measured, the contestants and controls are not significantly different.

The interaction effect in this example is shown by the F-ratio in the row labelled
Time*bb, and because the probability of obtaining a value this big is .018, which is less

than the criterion of .05, we can say that there is a significant interaction between the time at which BPD was measured and whether or not the person was a contestant. The mean ratings in all conditions (and the interaction graph) help us to interpret this effect. The significant interaction seems to indicate that for controls BPD scores went down (slightly) from before entering the house to after leaving it, but for contestants the opposite is true: BPD scores increased over time.
Writing the results

We can report the three effects from this analysis as follows: The main effect of group was not significant, F(1, 14) = 0.67, p = .43, indicating that across both time points borderline personality disorder scores were similar in BB contestants and controls. The main effect of time was not significant, F(1, 14) = 0.09, p = .77, indicating that across all participants borderline personality disorder scores were similar before entering the house and after leaving it. The time group interaction was significant, F(1, 14) = 7.15, p < .05, indicating that although borderline personality disorder scores decreased for controls from before entering the house to after leaving it, scores increased for the contestants.

Chapter 15

Task 1

A psychologist was interested in the cross-species differences between men and dogs. She observed a group of dogs and a group of men in a naturalistic setting (20 of each). She classified several behaviours as being dog-like (urinating against trees and lamp posts, attempts to copulate with anything that moved, and attempts to lick their own genitals). For each man and dog she counted the number of dog-like behaviours displayed in a 24-hour period. It was hypothesized that dogs would display more dog-like behaviours than men. The data are in the file MenLikeDogs.sav. Analyse them with a Mann–Whitney test.

SPSS Output
Ranks

Dog-Like Behaviour   Species   N    Mean Rank   Sum of Ranks
                     Dog       20   20.77       415.50
                     Man       20   20.23       404.50
                     Total     40

Test Statistics(b)

                                  Dog-Like Behaviour
Mann-Whitney U                    194.500
Wilcoxon W                        404.500
Z                                 -.150
Asymp. Sig. (2-tailed)            .881
Exact Sig. [2*(1-tailed Sig.)]    .883(a)

a. Not corrected for ties.
b. Grouping Variable: Species
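The U statistic in this output can be reproduced from the rank sums: pool the scores, rank them (tied scores share the average of the rank positions they occupy), and compute U = R − n(n + 1)/2 for each group, reporting the smaller value. A minimal standard-library sketch (the example data are made up, not the MenLikeDogs.sav scores):

```python
def tied_ranks(values):
    """Rank scores 1..N, giving tied scores the average of the rank
    positions they occupy; returns a value -> rank mapping."""
    s = sorted(values)
    rank_of, i = {}, 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        rank_of[s[i]] = (i + 1 + j) / 2  # mean of rank positions i+1 .. j
        i = j
    return rank_of

def mann_whitney_U(group1, group2):
    """U from the rank sums: U = R - n(n + 1)/2 per group; the reported
    statistic is the smaller of the two."""
    rank_of = tied_ranks(group1 + group2)
    n1, n2 = len(group1), len(group2)
    U1 = sum(rank_of[x] for x in group1) - n1 * (n1 + 1) / 2
    U2 = sum(rank_of[x] for x in group2) - n2 * (n2 + 1) / 2
    return min(U1, U2)

# Tiny made-up example: complete separation of the groups gives U = 0
print(mann_whitney_U([1, 2, 3], [4, 5, 6]))  # -> 0.0
```

As a check against the output above, the dog group's rank sum of 415.50 gives U = 415.50 − 20(21)/2 = 205.5, and the man group's 404.50 gives U = 194.5; the smaller value, 194.5, is what SPSS reports.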

Calculating an Effect Size

The output tells us that z is −.15, and we had 20 men and 20 dogs, so the total number of observations was 40. The effect size is, therefore:

r = z / √N = −0.15 / √40 = −0.02
This represents a tiny effect (it is close to zero), which tells us that there truly isn't much difference between dogs and men.

Writing and Interpreting the Result

We could report something like: Men (Mdn = 27) did not seem to differ from dogs (Mdn = 24) in the amount of dog-like behaviour they displayed (U = 194.5, ns).

Note that I've reported the median for each condition. Of course, we really ought to include the effect size as well. We could do two things. The first is to report the z-score associated with the test statistic. This value would enable the reader both to determine the exact significance of the test and to calculate the effect size r: Men (Mdn = 27) and dogs (Mdn = 24) did not significantly differ in the extent to which they displayed dog-like behaviours (U = 194.5, ns, z = −.15). The alternative is to just report the effect size (because readers can convert back to the z-score if they need to for any reason). This approach is better because the effect size will probably be most useful to the reader: Men (Mdn = 27) and dogs (Mdn = 24) did not significantly differ in the extent to which they displayed dog-like behaviours (U = 194.5, ns, r = −.02).

Task 2

There's been much speculation over the years about the influence of subliminal messages on records. To name a few cases, both Ozzy Osbourne and Judas Priest have been accused of putting backward masked messages on their albums that subliminally influence poor unsuspecting teenagers into doing things like blowing their heads off with shotguns. A psychologist was interested in whether backward masked messages really did have an effect. He took the master tapes of Britney Spears' Baby One More Time and created a second version that had the masked message deliver your soul to the dark lord repeated in the chorus. He took this version, and the original, and played one version (randomly) to a group of 32 people. He took the same group of people six months later and played them

223

whatever version they hadnt heard the time before. Thus each person heard both the original and the version with the masked message, but at different points in time. The psychologist measured the number of goats that were sacrificed in the week after listening to each version. It was hypothesized that the backward message would lead to more goats being sacrificed. The data are in the file
DarkLord.sav. Analyse them with a Wilcoxon signed-rank test.

Ranks

No Message - Message    N        Mean Rank    Sum of Ranks
  Negative Ranks        11(a)    10.14        111.50
  Positive Ranks        17(b)    17.32        294.50
  Ties                  4(c)
  Total                 32

a. No Message < Message
b. No Message > Message
c. Message = No Message

Test Statistics(b)

                          No Message - Message
Z                         -2.094(a)
Asymp. Sig. (2-tailed)    .036

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Calculating an Effect Size

The output tells us that z is −2.094, and we had 64 observations (although we only used 32 people and tested them twice, it is the number of observations, not the number of people, that is important here). The effect size is, therefore:

r = z/√N = −2.094/√64 = −0.26

This represents a medium effect (it is close to Cohen's benchmark of 0.3), which tells us that whether or not a subliminal message was present had a substantive effect.

Writing and Interpreting the Result

We could report something like: The number of goats sacrificed after hearing the message (Mdn = 9) was significantly less than after hearing the normal version of the song (Mdn = 11), T = 111.50, p < .05. As with the Mann–Whitney test, we should report either the z-score or the effect size. The effect size is most useful: The number of goats sacrificed after hearing the message (Mdn = 9) was significantly less than after hearing the normal version of the song (Mdn = 11), T = 111.50, p < .05, r = −.26.
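If you want to see the mechanics behind the Ranks table (zero differences dropped, absolute differences ranked with mean ranks for ties, positive and negative rank sums compared), here is a simplified Python sketch of the normal approximation. It omits the tie and continuity corrections SPSS applies, so its z will differ slightly from SPSS when there are tied differences; the function name is mine:

```python
from math import sqrt

def wilcoxon_signed_rank(before, after):
    """Wilcoxon signed-rank test via the normal approximation.
    Mirrors the SPSS Ranks table: zero differences are dropped, absolute
    differences are ranked (mean ranks for ties), and T is the smaller of
    the positive/negative rank sums. Simplification: no tie or continuity
    correction on the variance."""
    diffs = [a - b for a, b in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign mean ranks to runs of tied |difference|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    t_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    T = min(t_pos, t_neg)
    mu = n * (n + 1) / 4                              # mean of T under H0
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)      # SD of T under H0
    return T, (T - mu) / sigma
```

Feeding the resulting z into r = z/√N then gives the effect size exactly as in the hand calculation above.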
Task 3

A psychologist was interested in the effects of television programmes on domestic life. She hypothesized that through 'learning by watching', certain programmes might actually encourage people to behave like the characters within them. This in turn could affect the viewers' own relationships (depending on whether the programme depicted harmonious or dysfunctional relationships). She took episodes of three popular TV shows and showed them to 54 couples, after which the couple were left alone in the room for an hour. The experimenter measured the number of times the couple argued. Each couple viewed all three of the TV programmes at different points in time (a week apart) and the order in which the programmes were viewed was counterbalanced over couples. The TV programmes selected were Eastenders (which typically portrays the lives of extremely miserable, argumentative, London folk who like nothing more than to beat each other up, lie to each other, sleep with each other's wives and generally show no evidence of any consideration to their fellow humans!), Friends (which portrays a group of unrealistically considerate and nice people who love each other oh so very much, but for some reason I love it anyway!), and a National Geographic programme about whales (this was supposed to act as a control). The data are in the file Eastenders.sav. Access them and conduct Friedman's ANOVA on the data.
Ranks

                       Mean Rank
Eastenders             2.29
Friends                1.81
National Geographic    1.91
The first table shows the mean rank in each condition. These mean ranks are important later for interpreting any effects; they show that the ranks were highest after watching Eastenders.
Test Statistics(a)

N              54
Chi-Square     7.586
df             2
Asymp. Sig.    .023

a. Friedman Test

The next table shows the chi-square test statistic and its associated degrees of freedom (in this case we had three groups, so the degrees of freedom are 3 − 1, or 2), and the significance. Therefore, we could conclude that the type of programme watched significantly affected the subsequent number of arguments (because the significance value is less than 0.05). However, this result doesn't tell us exactly where the differences lie. A nice succinct set of comparisons would be to compare each group against the control:

Test 1: Eastenders compared to control
Test 2: Friends compared to control

This gives rise to only two tests, so rather than use 0.05 as our critical level of significance, we'd use 0.05/2 = 0.025.
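The correction used here is easy to generalize: divide the critical level by the number of planned tests and compare each p-value against the result. A small Python sketch (function name mine):

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni-corrected critical level and, for each
    p-value, whether it beats that corrected criterion."""
    critical = alpha / len(p_values)
    return critical, [p < critical for p in p_values]

# Two follow-up Wilcoxon tests, as above: corrected level = 0.05/2 = 0.025
critical, significant = bonferroni([0.005, 0.530])
```

With the two follow-up p-values from the tests below, only the first comparison survives the corrected criterion.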
Ranks

                                    N        Mean Rank    Sum of Ranks
National Geographic - Eastenders
  Negative Ranks                    31(a)    28.85        894.50
  Positive Ranks                    18(b)    18.36        330.50
  Ties                              5(c)
  Total                             54
National Geographic - Friends
  Negative Ranks                    21(d)    22.00        462.00
  Positive Ranks                    24(e)    23.88        573.00
  Ties                              9(f)
  Total                             54

a. National Geographic < Eastenders
b. National Geographic > Eastenders
c. Eastenders = National Geographic
d. National Geographic < Friends
e. National Geographic > Friends
f. Friends = National Geographic

Test Statistics(c)

                          National Geographic    National Geographic
                          - Eastenders           - Friends
Z                         -2.813(a)              -.629(b)
Asymp. Sig. (2-tailed)    .005                   .530

a. Based on positive ranks.
b. Based on negative ranks.
c. Wilcoxon Signed Ranks Test


The next tables show the test statistics from doing Wilcoxon tests on the two comparisons that I suggested. Remember that we are now using a critical value of 0.025, so we compare the significance of both test statistics against this critical value. The test comparing Eastenders to the National Geographic programme about whales has a significance value of 0.005, which is well below our criterion of 0.025; therefore we can conclude that Eastenders led to significantly more arguments than the programme about whales. The second comparison compares the number of arguments after Friends with the number after the programme about whales. This contrast is non-significant (the significance of the test statistic is 0.530, which is bigger than our critical value of 0.025), so we can conclude that there was no difference in the number of arguments after watching Friends compared to after watching the whales. The effect we got seems mainly to reflect the fact that Eastenders makes people argue more.

Calculating an Effect Size

We can calculate effect sizes for the Wilcoxon tests that we used to follow up the main analysis. For the first comparison (Eastenders vs. control) z is −2.813, and because this is based on comparing two groups each containing 54 observations, we have 108 observations in total (remember that it isn't important that the observations come from the same people). The effect size is, therefore:

r(Eastenders−Control) = z/√N = −2.813/√108 = −0.27


This represents a medium effect (it is close to Cohen's benchmark of 0.3), which tells us that the effect of Eastenders relative to the control was a substantive effect: Eastenders produced substantially more arguments. For the second comparison (Friends vs. control) z is −0.629, and this was again based on 108 observations. The effect size is, therefore:

r(Friends−Control) = z/√N = −0.629/√108 = −0.06

This represents virtually no effect (it is close to zero). Therefore, Friends had very little effect in creating arguments compared to the control.

Writing and Interpreting the Result

For Friedman's ANOVA we need only report the test statistic (which we saw earlier is denoted by χ²), its degrees of freedom and its significance. So, we could report something like: The number of arguments that couples had was significantly affected by the programme they had just watched, χ²(2) = 7.59, p < .05. We need to report the follow-up tests as well (including their effect sizes): The number of arguments that couples had was significantly affected by the programme they had just watched, χ²(2) = 7.59, p < .05. Wilcoxon tests were used to follow up this finding. A Bonferroni correction was applied and so all effects are reported at a .025 level of significance. It appeared that watching Eastenders significantly affected the number of arguments compared to the programme about whales (T = 330.50, r = −.27). However, the number of arguments was not significantly different after Friends compared to after the programme about whales (T = 462.00, ns, r = −.06). We can conclude that watching Eastenders did produce significantly more arguments compared to watching a programme about whales, and this effect was medium in size. However, Friends didn't produce any substantial reduction in the number of arguments relative to the control programme.
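For the curious, Friedman's statistic has a simple closed form: rank the k conditions within each of the N subjects, sum the ranks per condition, and compute χ² = [12 / (Nk(k + 1))] Σ Rj² − 3N(k + 1). A minimal Python sketch (function name mine; it uses mean ranks for within-row ties but applies no tie correction to χ², so it can differ slightly from SPSS):

```python
def friedman_chi_square(data):
    """Friedman's chi-square from a list of rows: one row per subject,
    one column per condition. No tie correction (a simplification)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:  # mean ranks for tied scores within the row
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            for m in range(i, j + 1):
                ranks[order[m]] = (i + j) / 2 + 1
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3 * n * (k + 1)
```

Dividing each rank sum by N recovers the mean ranks reported in the first table of this task.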
Task 4

A researcher was interested in trying to prevent coulrophobia (fear of clowns) in children. She decided to do an experiment in which different groups of children (15 in each) were exposed to different forms of positive information about clowns. The first group watched some adverts for McDonald's in which its mascot Ronald McDonald is seen cavorting about with children, going on about how they should love their mums. A second group was told a story about a clown who helped some children when they got lost in a forest (although what on earth a clown was doing in a forest remains a mystery). A third group was entertained by a real clown, who came into the classroom and made balloon animals for the children. A final group acted as a control condition and had nothing done to them at all. The researcher took self-report ratings of how much the children liked clowns (rather like the fear-beliefs questionnaire in Chapter 2), resulting in a score for each child that could range from 0 (not scared of clowns at all) to 5 (very scared of clowns). The data are in the file coulrophobia.sav. Access the data and conduct a Kruskal–Wallis test.


Ranks

Fear beliefs    Format of Information    N     Mean Rank
                Advert                   15    45.03
                Story                    15    21.87
                Exposure                 15    23.77
                None                     15    31.33
                Total                    60
This table tells us the mean rank in each condition. These mean ranks are important later for interpreting any effects.
Test Statistics(a,b)

               Fear beliefs
Chi-Square     17.058
df             3
Asymp. Sig.    .001

a. Kruskal Wallis Test
b. Grouping Variable: Format of Information

This table shows the test statistic (SPSS labels it chi-square rather than H) and its associated degrees of freedom (in this case we had four groups, so the degrees of freedom are 4 − 1, or 3), and the significance (which is less than the critical value of 0.05). Therefore, we could conclude that the type of information presented to the children about clowns significantly affected their fear ratings of clowns. A nice succinct set of comparisons would be to compare each group against the control:

1. Test 1: Advert compared to control
2. Test 2: Story compared to control
3. Test 3: Exposure compared to control


This results in three tests, so rather than use 0.05 as our critical level of significance, we'd use 0.05/3 = 0.0167. The following tables show the test statistics from doing Mann–Whitney tests on the three focused comparisons that I suggested.

Advert vs. control:
Test Statistics(b)

                                  Fear beliefs
Mann-Whitney U                    37.500
Wilcoxon W                        157.500
Z                                 -3.261
Asymp. Sig. (2-tailed)            .001
Exact Sig. [2*(1-tailed Sig.)]    .001(a)

a. Not corrected for ties.
b. Grouping Variable: Format of Information

Story vs. control:

Test Statistics(b)

                                  Fear beliefs
Mann-Whitney U                    65.000
Wilcoxon W                        185.000
Z                                 -2.091
Asymp. Sig. (2-tailed)            .037
Exact Sig. [2*(1-tailed Sig.)]    .050(a)

a. Not corrected for ties.
b. Grouping Variable: Format of Information

Exposure vs. control:

Test Statistics(b)

                                  Fear beliefs
Mann-Whitney U                    72.500
Wilcoxon W                        192.500
Z                                 -1.743
Asymp. Sig. (2-tailed)            .081
Exact Sig. [2*(1-tailed Sig.)]    .098(a)

a. Not corrected for ties.
b. Grouping Variable: Format of Information

Remember that we are now using a critical value of 0.0167, so the only comparison that is significant is the advert compared to the control group (because the observed significance value of 0.001 is less than 0.0167). The other two comparisons produce significance values that are greater than 0.0167, so we'd have to say they're non-significant. So the effect we got seems mainly to reflect the fact that McDonald's adverts significantly increased fear beliefs about clowns relative to controls (which is no surprise given what a creepy weirdo Ronald McDonald is!).

Calculating an Effect Size

We can calculate effect sizes for the Mann–Whitney tests that we used to follow up the main analysis. For the first comparison (adverts vs. control) z is −3.261, and because this is based on comparing two groups each containing 15 observations, we have 30 observations in total. The effect size is, therefore:

r(Advert−Control) = z/√N = −3.261/√30 = −0.60

This represents a large effect, which tells us that the effect of adverts relative to the control was a substantive effect. For the second comparison (story vs. control) z is −2.091, and this was again based on 30 observations. The effect size is, therefore:

r(Story−Control) = z/√N = −2.091/√30 = −0.38

This represents a medium to large effect. Therefore, although non-significant, the effect of stories relative to the control was a substantive effect. For the final comparison (exposure vs. control) z is −1.743, and this was again based on 30 observations. The effect size is, therefore:

r(Exposure−Control) = z/√N = −1.743/√30 = −0.32

This represents a medium effect. Therefore, although non-significant, the effect of exposure relative to the control was a substantive effect.

Writing and Interpreting the Result


For the Kruskal–Wallis test, we need only report the test statistic (which we saw earlier is denoted by H), its degrees of freedom and its significance. So, we could report something like: Children's fear beliefs about clowns were significantly affected by the format of information given to them, H(3) = 17.06, p < .01. However, we need to report the follow-up tests as well (including their effect sizes): Children's fear beliefs about clowns were significantly affected by the format of information given to them, H(3) = 17.06, p < .01. Mann–Whitney tests were used to follow up this finding. A Bonferroni correction was applied and so all effects are reported at a .0167 level of significance. It appeared that fear beliefs were significantly higher after the adverts compared to the control (U = 37.50, r = −.60). However, fear beliefs were not significantly different after the stories (U = 65.00, ns, r = −.38) or exposure (U = 72.50, ns, r = −.32) relative to the control. We can conclude that clown information through stories and exposure did produce medium-sized effects in reducing fear beliefs about clowns, but not significantly so (future work with larger samples might be appropriate), but that Ronald McDonald was sufficient to significantly increase fear beliefs about clowns.
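Like Friedman's statistic, H has a simple closed form: rank all N scores regardless of group, sum the ranks Rj within each group of size nj, and compute H = [12 / (N(N + 1))] Σ Rj²/nj − 3(N + 1). A minimal Python sketch (function name mine; no tie correction, so it can differ slightly from SPSS when scores tie across groups):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from a list of groups (each a list of scores).
    All scores are pooled and ranked together (mean ranks for ties)."""
    pooled = [(x, g) for g, grp in enumerate(groups) for x in grp]
    pooled.sort(key=lambda t: t[0])
    N = len(pooled)
    ranks = [0.0] * N
    i = 0
    while i < N:  # mean ranks for runs of tied scores
        j = i
        while j + 1 < N and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for r, (_, g) in zip(ranks, pooled):
        rank_sums[g] += r
    return 12 / (N * (N + 1)) * sum(
        rs * rs / len(grp) for rs, grp in zip(rank_sums, groups)
    ) - 3 * (N + 1)
```

Dividing each group's rank sum by its group size recovers the mean ranks shown in the Ranks table for this task.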
Chapter 16

Task 1

A clinical psychologist noticed that several of his manic psychotic patients did chicken impersonations in public. He wondered whether this behaviour could be used to diagnose this disorder and so decided to compare his patients against a normal sample. He observed 10 of his patients as they went through a normal day. He also needed to observe 10 of the most normal people he could find: naturally he chose to observe lecturers at the University of Sussex. He observed all participants using two dependent variables: first, how many chicken impersonations they did in the streets of Brighton over the course of a day, and second, how good their impersonations were (as scored out of 10 by an independent farmyard-noise expert). The data are in the file chicken.sav. Use MANOVA and DFA to find out whether these variables could be used to distinguish manic psychotic patients from those without the disorder.

This output shows an initial table of descriptive statistics that is produced by clicking on the descriptive statistics option in the options dialog box. This table contains the overall and group means and standard deviations for each dependent variable in turn. It seems that manic psychotics and Sussex lecturers do pretty similar amounts of chicken impersonations (lecturers do slightly less actually, but they are of a higher quality).
Descriptive Statistics

            GROUP               Mean       Std. Deviation    N
QUALITY     Manic Psychosis     6.7000     1.05935           10
            Sussex Lecturers    7.6000     2.98887           10
            Total               7.1500     2.23077           20
QUANTITY    Manic Psychosis     12.1000    4.22821           10
            Sussex Lecturers    10.7000    4.37290           10
            Total               11.4000    4.24760           20
The next output shows Box's test of the assumption of equality of covariance matrices. This statistic tests the null hypothesis that the variance-covariance matrices are the same in both groups. Therefore, if the matrices are equal (and therefore the assumption of homogeneity is met) this statistic should be non-significant. For these data p = 0.000 (which is less than 0.05); hence, the covariance matrices are not equal and the assumption is broken. However, because group sizes are equal we can ignore this test because Pillai's trace should be robust to this violation (fingers crossed!).
Box's Test of Equality of Covariance Matrices(a)

Box's M    20.926
F          6.135
df1        3
df2        58320.000
Sig.       .000

Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
a. Design: Intercept+GROUP

The next table shows the main table of results. For our purposes, the group effects are of interest because they tell us whether or not the manic psychotics and Sussex lecturers differ along the two dimensions of quality and quantity of chicken impersonations. The column of real interest is the one containing the significance values of these F-ratios. For these data, all test statistics are significant with p = 0.032 (which is less than 0.05). From this result we should probably conclude that the groups do indeed differ in terms of the quality and quantity of their chicken impersonations; however, this effect needs to be broken down to find out exactly what's going on.
Multivariate Tests(b)

Effect                             Value     F            Hypothesis df    Error df    Sig.
Intercept   Pillai's Trace         .919      96.201(a)    2.000            17.000      .000
            Wilks' Lambda          .081      96.201(a)    2.000            17.000      .000
            Hotelling's Trace      11.318    96.201(a)    2.000            17.000      .000
            Roy's Largest Root     11.318    96.201(a)    2.000            17.000      .000
GROUP       Pillai's Trace         .333      4.250(a)     2.000            17.000      .032
            Wilks' Lambda          .667      4.250(a)     2.000            17.000      .032
            Hotelling's Trace      .500      4.250(a)     2.000            17.000      .032
            Roy's Largest Root     .500      4.250(a)     2.000            17.000      .032

a. Exact statistic
b. Design: Intercept+GROUP

The next table shows a summary table of Levene's test of equality of variances for each of the dependent variables. These tests are the same as would be found if a one-way ANOVA had been conducted on each dependent variable in turn. Levene's test should be non-significant for all dependent variables if the assumption of homogeneity of variance has been met. The results for these data clearly show that the assumption has been met for the quantity of chicken impersonations but has been broken for the quality of impersonations. This should dent our confidence in the reliability of the univariate tests to follow.
Levene's Test of Equality of Error Variances(a)

            F         df1    df2    Sig.
QUALITY     11.135    1      18     .004
QUANTITY    .256      1      18     .619

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+GROUP

The next part of the output contains the ANOVA summary table for the dependent variables. The row of interest is that labelled GROUP (you'll notice that the values in this row are the same as for the row labelled Corrected Model: this is because the model fitted to the data contains only one independent variable: group). The row labelled GROUP contains an ANOVA summary table for the quality and quantity of chicken impersonations respectively. The values of p indicate that there was a non-significant difference between the groups in terms of both (both ps are greater than 0.05). The multivariate test statistics led us to conclude that the groups did differ significantly across the two dependent variables, yet the univariate results contradict this!


Tests of Between-Subjects Effects

Source             Dependent    Type III Sum    df    Mean Square    F          Sig.
                   Variable     of Squares
Corrected Model    QUALITY      4.050(a)        1     4.050          .806       .381
                   QUANTITY     9.800(b)        1     9.800          .530       .476
Intercept          QUALITY      1022.450        1     1022.450       203.360    .000
                   QUANTITY     2599.200        1     2599.200       140.497    .000
GROUP              QUALITY      4.050           1     4.050          .806       .381
                   QUANTITY     9.800           1     9.800          .530       .476
Error              QUALITY      90.500          18    5.028
                   QUANTITY     333.000         18    18.500
Total              QUALITY      1117.000        20
                   QUANTITY     2942.000        20
Corrected Total    QUALITY      94.550          19
                   QUANTITY     342.800         19

a. R Squared = .043 (Adjusted R Squared = -.010)
b. R Squared = .029 (Adjusted R Squared = -.025)

We don't need to look at contrasts because the univariate tests were non-significant (and in any case there were only two groups and so no further comparisons would be necessary); instead, to see how the dependent variables interact, we need to carry out a discriminant function analysis (DFA).
Wilks' Lambda

Test of Function(s)    Wilks' Lambda    Chi-square    df    Sig.
1                      .667             6.893         2     .032

The initial statistics from the DFA tell us that there was only one variate (because there are only two groups) and this variate is significant. Therefore, the group differences shown by the MANOVA can be explained in terms of one underlying dimension.
Standardized Canonical Discriminant Function Coefficients

            Function 1
QUALITY     1.859
QUANTITY    -1.829

The standardized discriminant function coefficients tell us the relative contribution of each variable to the variates. Both quality and quantity of impersonations have similar-sized coefficients, indicating that they have equally strong influence in discriminating the groups. However, they have the opposite sign, which suggests that the group differences are explained by the difference between the quality and quantity of impersonations.
Functions at Group Centroids

GROUP               Function 1
Manic Psychosis     -.671
Sussex Lecturers    .671

Unstandardized canonical discriminant functions evaluated at group means

The variate centroids for each group confirm that variate 1 discriminates the two groups because the manic psychotics have a negative coefficient and the Sussex lecturers have a positive one. There won't be a combined-groups plot because there is only one variate. Overall we could conclude that manic psychotics are distinguished from Sussex lecturers in terms of the difference between the pattern of results for the quantity of impersonations compared to their quality. If we look at the means we can see that manic psychotics produce slightly more impersonations than Sussex lecturers (but remember from the non-significant univariate tests that this isn't sufficient, alone, to differentiate the groups), but the lecturers produce impersonations of a higher quality (but again remember that quality alone is not enough to differentiate the groups). Therefore, although the manic psychotics and Sussex lecturers produce similar numbers of impersonations of similar quality (see univariate tests), if we combine the quality and quantity we can differentiate the groups.
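To make the quality-versus-quantity interpretation concrete: a person's discriminant score is just a weighted sum of their standardized scores on the dependent variables. The sketch below uses the standardized coefficients from the output above; the helper functions and their names are my own simplification, not SPSS's exact computation:

```python
def zscores(xs):
    """Standardize a list of scores using the sample standard deviation."""
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return [(x - m) / sd for x in xs]

def discriminant_scores(quality, quantity, w_qual=1.859, w_quant=-1.829):
    """Variate = 1.859*z(quality) - 1.829*z(quantity): scores are high when
    quality is high relative to quantity, and low for the reverse pattern."""
    return [w_qual * zq + w_quant * zn
            for zq, zn in zip(zscores(quality), zscores(quantity))]
```

Because the two weights are near-equal and opposite in sign, the variate is essentially a standardized quality-minus-quantity difference, which is exactly the interpretation given above.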
Task 2

I was interested in whether students' knowledge of different aspects of psychology improved throughout their degree. I took a sample of first years, second years and third years and gave them five tests (scored out of 15) representing different aspects of psychology: Exper (experimental psychology, such as cognitive and neuropsychology); Stats (statistics); Social (social psychology); Develop (developmental psychology); Person (personality). Your task is to: (1) carry out an appropriate general analysis to determine whether there are overall group differences along these five measures; (2) look at the scale-by-scale analyses of group differences produced in the output and interpret the results accordingly; (3) select contrasts that test the hypothesis that second and third years will score higher than first years on all scales; (4) select tests that compare all groups to each other, and briefly compare these results with the contrasts; and (5) carry out a separate analysis in which you test whether a combination of the measures can successfully discriminate the groups (comment only briefly on this analysis). Include only those scales that revealed group differences for the contrasts. How do the results help you to explain the findings of your initial analysis? The data are in the file psychology.sav.

This output shows an initial table of descriptive statistics that is produced by clicking on the descriptive statistics option in the options dialog box. This table contains the overall and group means and standard deviations for each dependent variable in turn.


Descriptive Statistics

                           Group       Mean       Std. Deviation    N
Experimental Psychology    1st Year    5.6364     2.1574            11
                           2nd Year    5.5000     1.5916            16
                           3rd Year    7.0000     2.1213            13
                           Total       6.0250     2.0062            40
Statistics                 1st Year    7.5455     3.5599            11
                           2nd Year    8.6875     2.3866            16
                           3rd Year    10.4615    3.0988            13
                           Total       8.9500     3.1211            40
Social Psychology          1st Year    10.3636    2.7303            11
                           2nd Year    8.5625     2.8040            16
                           3rd Year    8.7692     1.6408            13
                           Total       9.1250     2.5236            40
Personality                1st Year    10.6364    3.3248            11
                           2nd Year    8.4375     1.9990            16
                           3rd Year    8.3846     2.3993            13
                           Total       9.0250     2.6745            40
Developmental              1st Year    11.0000    2.6458            11
                           2nd Year    8.8750     1.7078            16
                           3rd Year    8.7692     3.0319            13
                           Total       9.4250     2.5908            40

The next output shows Boxs test of the assumption of equality of covariance matrices. This statistic tests the null hypothesis that the variancecovariance matrices are the same in all three groups. Therefore, if the matrices are equal (and therefore the assumption of homogeneity is met) this statistic should be non-significant. For these data p = 0.06 (which is greater than 0.05); hence, the covariance matrices are roughly equal and the assumption is tenable.
Box's Test of Equality of Covariance Matrices(a)

Box's M    54.241
F          1.435
df1        30
df2        3587
Sig.       .059

Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
a. Design: Intercept+GROUP

The next table shows the main table of results. For our purposes, the group effects are of interest because they tell us whether or not the scores from different areas of psychology differ across the three years of the degree programme. The column of real interest is the one containing the significance values of these F-ratios. For these data, Pillai's trace (p = .02), Wilks's lambda (p = .012), Hotelling's trace (p = .007) and Roy's largest root (p = .01) all reach the criterion for significance of .05. From this result we should probably conclude that the profile of knowledge across different areas of psychology does indeed change across the three years of the degree. The nature of this effect is not clear from the multivariate test statistic.
Multivariate Tests(c)

Effect                             Value     F             Hypothesis df    Error df    Sig.
Intercept   Pillai's Trace         .960      159.166(a)    5.000            33.000      .000
            Wilks' Lambda          .040      159.166(a)    5.000            33.000      .000
            Hotelling's Trace      24.116    159.166(a)    5.000            33.000      .000
            Roy's Largest Root     24.116    159.166(a)    5.000            33.000      .000
GROUP       Pillai's Trace         .510      2.330         10.000           68.000      .020
            Wilks' Lambda          .522      2.534(a)      10.000           66.000      .012
            Hotelling's Trace      .853      2.730         10.000           64.000      .007
            Roy's Largest Root     .773      5.255(b)      5.000            34.000      .001

a. Exact statistic
b. The statistic is an upper bound on F that yields a lower bound on the significance level.
c. Design: Intercept+GROUP

The next table shows a summary table of Levenes test of equality of variances for each of the dependent variables. These tests are the same as would be found if a one-way ANOVA had been conducted on each dependent variable in turn. Levenes test should be non-significant for all dependent variables if the assumption of homogeneity of variance has been met. The results for these data clearly show that the assumption has been met. This finding not only gives us confidence in the reliability of the univariate tests to follow, but also strengthens the case for assuming that the multivariate test statistics are robust.


Levene's Test of Equality of Error Variances(a)

                           F        df1    df2    Sig.
Experimental Psychology    1.311    2      37     .282
Statistics                 .746     2      37     .481
Social Psychology          2.852    2      37     .071
Personality                2.440    2      37     .101
Developmental              2.751    2      37     .077

Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+GROUP

The next part of the output contains the ANOVA summary table for the dependent variables. The row of interest is that labelled GROUP, which contains an ANOVA summary table for each of the areas of psychology. The values of p indicate that there was a non-significant difference between student groups in terms of all areas of psychology (all ps are greater than 0.05). The multivariate test statistics led us to conclude that the student groups did differ significantly across the types of psychology yet the univariate results contradict this (again ... I really should stop making up data sets that do this!).


Tests of Between-Subjects Effects

Source             Dependent Variable         Type III Sum    df    Mean Square    F          Sig.
                                              of Squares
Corrected Model    Experimental Psychology    18.430(a)       2     9.215          2.461      .099
                   Statistics                 52.504(b)       2     26.252         2.967      .064
                   Social Psychology          23.584(c)       2     11.792         1.941      .158
                   Personality                39.415(d)       2     19.708         3.044      .060
                   Developmental              37.717(e)       2     18.859         3.114      .056
Intercept          Experimental Psychology    1428.058        1     1428.058       381.378    .000
                   Statistics                 3093.775        1     3093.775       349.637    .000
                   Social Psychology          3330.118        1     3330.118       548.129    .000
                   Personality                3273.395        1     3273.395       505.575    .000
                   Developmental              3562.212        1     3562.212       588.250    .000
GROUP              Experimental Psychology    18.430          2     9.215          2.461      .099
                   Statistics                 52.504          2     26.252         2.967      .064
                   Social Psychology          23.584          2     11.792         1.941      .158
                   Personality                39.415          2     19.708         3.044      .060
                   Developmental              37.717          2     18.859         3.114      .056
Error              Experimental Psychology    138.545         37    3.744
                   Statistics                 327.396         37    8.849
                   Social Psychology          224.791         37    6.075
                   Personality                239.560         37    6.475
                   Developmental              224.058         37    6.056
Total              Experimental Psychology    1609.000        40
                   Statistics                 3584.000        40
                   Social Psychology          3579.000        40
                   Personality                3537.000        40
                   Developmental              3815.000        40
Corrected Total    Experimental Psychology    156.975         39
                   Statistics                 379.900         39
                   Social Psychology          248.375         39
                   Personality                278.975         39
                   Developmental              261.775         39

a. R Squared = .117 (Adjusted R Squared = .070)
b. R Squared = .138 (Adjusted R Squared = .092)
c. R Squared = .095 (Adjusted R Squared = .046)
d. R Squared = .141 (Adjusted R Squared = .095)
e. R Squared = .144 (Adjusted R Squared = .098)

We don't need to look at contrasts because the univariate tests were non-significant; instead, to see how the dependent variables interact, we need to carry out a DFA.
Wilks' Lambda

Test of Function(s)    Wilks' Lambda    Chi-square    df    Sig.
1 through 2            .522             22.748        10    .012
2                      .926             2.710         4     .608

The initial statistics from the DFA tell us that only one of the variates is significant (the second variate is non-significant, p = 0.608). Therefore, the group differences shown by the MANOVA can be explained in terms of one underlying dimension.


Standardized Canonical Discriminant Function Coefficients

                           Function 1    Function 2
Experimental Psychology    .367          .789
Statistics                 .921          -.081
Social Psychology          -.353         .319
Personality                -.260         .216
Developmental              -.618         .013

The standardized discriminant function coefficients tell us the relative contribution of each variable to the variates. Looking at the first variate, it's clear that statistics has the greatest contribution. Most interesting is that on the first variate, statistics and experimental psychology have positive weights, whereas social, developmental and personality have negative weights. This suggests that the group differences are explained by the difference between experimental psychology and statistics compared to the other areas of psychology.
Functions at Group Centroids

Group       Function 1    Function 2
1st Year    -1.246        .186
2nd Year    9.789E-02     -.333
3rd Year    .934          .252

Unstandardized canonical discriminant functions evaluated at group means

The variate centroids for each group tell us that variate 1 discriminates the first years from the second and third years because the first years have a negative value whereas the second and third years have positive values on the first variate. The relationship between the variates and the groups is best illuminated using a combined-groups plot. This graph plots the variate scores for each person, grouped according to the year of their degree. In addition, the group centroids are indicated, which are the average variate scores for each group. The plot for these data confirms that variate 1 discriminates the first years from subsequent years (look at the horizontal distance between these centroids).

[Combined-groups plot of the canonical discriminant functions: each participant's scores on Function 1 (horizontal axis) and Function 2 (vertical axis), grouped by year of degree (1st, 2nd, 3rd year), with group centroids marked.]

Overall we could conclude that different years are discriminated by different areas of psychology. In particular, it seems as though statistics and aspects of experimentation (compared to other areas of psychology) discriminate between first-year undergraduates and subsequent years. From the means, we could interpret this as first years struggling with statistics and experimental psychology (compared to other areas of psychology) but their ability improves across the three years. However, for other areas of psychology, first years are relatively good but their abilities decline over the three years. Put another way, psychology degrees improve only your knowledge of statistics and experimentation.
Chapter 17


Task 1

The University of Sussex is constantly seeking to employ the best people possible as lecturers (no, really, it is). Anyway, the university wanted to revise a questionnaire based on Bland's theory of research methods lecturers. This theory predicts that good research methods lecturers should have four characteristics: (1) a profound love of statistics; (2) an enthusiasm for experimental design; (3) a love of teaching; and (4) a complete absence of normal interpersonal skills. These characteristics should be related (i.e. correlated). The Teaching of Statistics for Scientific Experiments (TOSSE) already existed, but the university revised this questionnaire and it became the Teaching of Statistics for Scientific Experiments Revised (TOSSER). The university gave this questionnaire to 239 research methods lecturers around the world to see if it supported Bland's theory. The questionnaire is below and the data are in TOSSE-R.sav. Conduct a factor analysis (with appropriate rotation) to see the factor structure of the data.

SD = Strongly Disagree, D = Disagree, N = Neither, A = Agree, SA = Strongly Agree


1. I once woke up in the middle of a vegetable patch hugging a turnip that I'd mistakenly dug up thinking it was Roy's largest root
2. If I had a big gun I'd shoot all the students I have to teach
3. I memorize probability values for the F-distribution
4. I worship at the shrine of Pearson
5. I still live with my mother and have little personal hygiene
6. Teaching others makes me want to swallow a large bottle of bleach because the pain of my burning oesophagus would be light relief in comparison
7. Helping others to understand sums of squares is a great feeling
8. I like control conditions
9. I calculate three ANOVAs in my head before getting out of bed every morning
10. I could spend all day explaining statistics to people
11. I like it when people tell me I've helped them to understand factor rotation
12. People fall asleep as soon as I open my mouth to speak
13. Designing experiments is fun
14. I'd rather think about appropriate dependent variables than go to the pub
15. I soil my pants with excitement at the mere mention of factor analysis
16. Thinking about whether to use repeated or independent measures thrills me
17. I enjoy sitting in the park contemplating whether to use participant observation in my next experiment
18. Standing in front of 300 people in no way makes me lose control of my bowels
19. I like to help students
20. Passing on knowledge is the greatest gift you can bestow an individual
21. Thinking about Bonferroni corrections gives me a tingly feeling in my groin
22. I quiver with excitement when thinking about designing my next experiment
23. I often spend my spare time talking to the pigeons ... and even they die of boredom
24. I tried to build myself a time machine so that I could go back to the 1930s and follow Fisher around on my hands and knees licking the floor on which he'd just trodden
25. I love teaching
26. I spend lots of time helping students
27. I love teaching because students have to pretend to like me or they'll get bad marks
28. My cat is my only friend

Multicollinearity: The determinant of the correlation matrix was 0.00000124, which is smaller than 0.00001 and, therefore, indicates that multicollinearity could be a problem in these data (although, strictly speaking, because we're using principal component analysis we don't need to worry).

Sample size: MacCallum et al. (1999) have demonstrated that when communalities after extraction are above .5, a sample size between 100 and 200 can be adequate, and even when communalities are below .5 a sample size of 500 should be sufficient. We have a sample size of 239 with some communalities below .5, and so the sample size may not be adequate. However, the KMO measure of sampling adequacy is .894, which is above Kaiser's (1974) recommendation of .5. This value is also meritorious (and almost marvelous) according to Hutcheson and Sofroniou (1999). As such, the evidence suggests that the sample size is adequate to yield distinct and reliable factors.
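For the curious, the KMO statistic compares the squared correlations between variables to the squared partial correlations (derived from the inverse of the correlation matrix). A minimal sketch in Python with NumPy, applied to a small made-up correlation matrix rather than the actual TOSSE-R data:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy for a
    correlation matrix R (closer to 1 = more adequate sampling)."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(P), np.diag(P)))
    partial = -P / d                       # matrix of partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal mask
    r2 = np.sum(R[off] ** 2)               # sum of squared correlations
    a2 = np.sum(partial[off] ** 2)         # sum of squared partials
    return r2 / (r2 + a2)

# Hypothetical 3-variable correlation matrix (illustration only,
# not the questionnaire data)
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(kmo(R))  # a value between 0 and 1
```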


Bartlett's test: This tests whether the correlations between questions are sufficiently large for factor analysis to be appropriate (it actually tests whether the correlation matrix is sufficiently different from an identity matrix). In this case it is significant (χ²(378) = 2989.77, p < .001), indicating that the correlations within the R-matrix are sufficiently different from zero to warrant factor analysis.
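Bartlett's statistic can be approximated by hand from the determinant of the correlation matrix using chi-square = -(n - 1 - (2p + 5)/6) * ln|R| on p(p - 1)/2 degrees of freedom. A sketch in Python; note that the determinant below is the rounded value SPSS prints, so the resulting statistic only roughly approximates the printed 2989.77, although the degrees of freedom match exactly:

```python
import math

n, p = 239, 28        # sample size and number of variables
det_R = 1.24e-6       # determinant of R (rounded SPSS output)

df = p * (p - 1) // 2                                  # 378, as reported
chi_sq = -(n - 1 - (2 * p + 5) / 6) * math.log(det_R)  # Bartlett approximation

print(df)               # 378
print(round(chi_sq, 1)) # roughly 3100 here; SPSS reports 2989.77
                        # (the gap comes from the rounded determinant
                        # and SPSS's internal precision)
```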


Extraction: SPSS has extracted five factors based on Kaiser's criterion of retaining factors with eigenvalues greater than 1. Is this warranted? Kaiser's criterion is accurate when there are fewer than 30 variables and the communalities after extraction are greater than .7, or when the sample size exceeds 250 and the average communality is greater than .6. For these data the sample size is 239, there are 28 variables, and the mean communality is .579, so extracting five factors is not really warranted. The scree plot shows clear inflexions at 3 and 5 factors and so using the scree plot you could justify extracting 3 or 5 factors.


[Pattern matrix (Extraction Method: Principal Component Analysis; Rotation Method: Oblimin with Kaiser Normalization; rotation converged in 15 iterations). The loadings greater than .4 are interpreted below.]

Rotation: You should choose an oblique rotation because the question says that the constructs we're measuring are related. Looking at the pattern matrix (and using loadings greater than .4 as recommended by Stevens) we see the following pattern:

Factor 1:


1. Q 16. Thinking about whether to use repeated or independent measures thrills me
2. Q 14. I'd rather think about appropriate dependent variables than go to the pub
3. Q 22. I quiver with excitement when thinking about designing my next experiment
4. Q 17. I enjoy sitting in the park contemplating whether to use participant observation in my next experiment
5. Q 13. Designing experiments is fun
6. Q 8. I like control conditions
7. Q 10. I could spend all day explaining statistics to people

Factor 2:

8. Q 19. I like to help students
9. Q 20. Passing on knowledge is the greatest gift you can bestow an individual
10. Q 25. I love teaching
11. Q 27. I love teaching because students have to pretend to like me or they'll get bad marks
12. Q 7. Helping others to understand sums of squares is a great feeling
13. Q 26. I spend lots of time helping students

Factor 3:


14. Q 23. I often spend my spare time talking to the pigeons ... and even they die of boredom
15. Q 28. My cat is my only friend
16. Q 5. I still live with my mother and have little personal hygiene
17. Q 12. People fall asleep as soon as I open my mouth to speak

Factor 4:

18. Q 24. I tried to build myself a time machine so that I could go back to the 1930s and follow Fisher around on my hands and knees licking the floor on which he'd just trodden
19. Q 3. I memorize probability values for the F-distribution
20. Q 4. I worship at the shrine of Pearson
21. Q 15. I soil my pants with excitement at the mere mention of factor analysis
22. Q 21. Thinking about Bonferroni corrections gives me a tingly feeling in my groin
23. Q 1. I once woke up in the middle of a vegetable patch hugging a turnip that I'd mistakenly dug up thinking it was Roy's largest root

Factor 5:

24. Q 6. Teaching others makes me want to swallow a large bottle of bleach because the pain of my burning oesophagus would be light relief in comparison
25. Q 2. If I had a big gun I'd shoot all the students I have to teach

26. Q 18. Standing in front of 300 people in no way makes me lose control of my bowels

No factor:

27. Q 9. I calculate three ANOVAs in my head before getting out of bed every morning
28. Q 11. I like it when people tell me I've helped them to understand factor rotation

Factor 1 seems to relate to research methods, factor 2 to teaching, factor 3 to general social skills, factor 4 to statistics and factor 5 to, well, err, teaching again. All in all, this isn't particularly satisfying and doesn't really support the four-factor model. We saw earlier that the extraction of five factors probably wasn't justified. In fact the scree plot seems to indicate three. Let's rerun the analysis but ask SPSS for three factors. Let's see how this changes the pattern matrix:


Looking at the pattern matrix (and using loadings greater than .4 as recommended by Stevens) we see the following pattern:

Factor 1:

29. Q 22. I quiver with excitement when thinking about designing my next experiment
30. Q 8. I like control conditions
31. Q 17. I enjoy sitting in the park contemplating whether to use participant observation in my next experiment
32. Q 21. Thinking about Bonferroni corrections gives me a tingly feeling in my groin


33. Q 13. Designing experiments is fun
34. Q 9. I calculate three ANOVAs in my head before getting out of bed every morning
35. Q 3. I memorize probability values for the F-distribution
36. Q 1. I once woke up in the middle of a vegetable patch hugging a turnip that I'd mistakenly dug up thinking it was Roy's largest root
37. Q 24. I tried to build myself a time machine so that I could go back to the 1930s and follow Fisher around on my hands and knees licking the floor on which he'd just trodden
38. Q 4. I worship at the shrine of Pearson
39. Q 16. Thinking about whether to use repeated or independent measures thrills me
40. Q 7. Helping others to understand sums of squares is a great feeling
41. Q 15. I soil my pants with excitement at the mere mention of factor analysis
42. Q 11. I like it when people tell me I've helped them to understand factor rotation
43. Q 10. I could spend all day explaining statistics to people
44. Q 14. I'd rather think about appropriate dependent variables than go to the pub

Factor 2:

45. Q 19. I like to help students
46. Q 2. If I had a big gun I'd shoot all the students I have to teach (note negative weight)


47. Q 6. Teaching others makes me want to swallow a large bottle of bleach because the pain of my burning oesophagus would be light relief in comparison (note negative weight)
48. Q 18. Standing in front of 300 people in no way makes me lose control of my bowels (note negative weight)
49. Q 26. I spend lots of time helping students
50. Q 25. I love teaching
51. Q 20. Passing on knowledge is the greatest gift you can bestow an individual
52. Q 27. I love teaching because students have to pretend to like me or they'll get bad marks

Factor 3:

53. Q 5. I still live with my mother and have little personal hygiene
54. Q 23. I often spend my spare time talking to the pigeons ... and even they die of boredom
55. Q 28. My cat is my only friend
56. Q 12. People fall asleep as soon as I open my mouth to speak
57. Q 27. I love teaching because students have to pretend to like me or they'll get bad marks

No factor: none.

This factor structure is a lot clearer cut: factor 1 relates to a love of methods and statistics, factor 2 to a love of teaching, and factor 3 to an absence of normal social skills. This doesn't support the four-factor model originally suggested because the data indicate that love of methods and love of statistics can't be separated (if you love one you love the other).
Task 2

Sian Williams devised a questionnaire to measure organizational ability. She predicted five factors to do with organizational ability: (1) preference for organization; (2) goal achievement; (3) planning approach; (4) acceptance of delays; and (5) preference for routine. These dimensions are theoretically independent. Williams's questionnaire contains 28 items using a 7-point Likert scale (1 = strongly disagree, 4 = neither, 7 = strongly agree). She gave it to 239 people. Run a principal component analysis on the data in Williams.sav.

1. I like to have a plan to work to in everyday life
2. I feel frustrated when things don't go to plan
3. I get most things done in a day that I want to
4. I stick to a plan once I have made it
5. I enjoy spontaneity and uncertainty
6. I feel frustrated if I can't find something I need
7. I find it difficult to follow a plan through
8. I am an organized person
9. I like to know what I have to do in a day
10. Disorganized people annoy me
11. I leave things to the last minute
12. I have many different plans relating to the same goal
13. I like to have my documents filed and in order
14. I find it easy to work in a disorganized environment
15. I make 'to do' lists and achieve most of the things on it
16. My workspace is messy and disorganized
17. I like to be organized
18. Interruptions to my daily routine annoy me
19. I feel that I am wasting my time
20. I forget the plans I have made
21. I prioritize the things I have to do
22. I like to work in an organized environment
23. I feel relaxed when I don't have a routine
24. I set deadlines for myself and achieve them
25. I change rather aimlessly from one activity to another during the day
26. I have trouble organizing the things I have to do
27. I put tasks off to another day
28. I feel restricted by schedules and plans

[Correlation matrix of all 28 items (not reproduced). Determinant = 1.240E-06.]

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy        .894
Bartlett's Test of Sphericity   Approx. Chi-Square     2989.769
                                df                     378
                                Sig.                   .000


Communalities (Extraction Method: Principal Component Analysis): all initial communalities are 1.000. The communalities after extraction range from .297 ('I have many different plans relating to the same goal') to .766 ('I like to work in an organised environment'), with a mean of .579; four items fall below .5.
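The mean communality used in the discussion of extraction can be reproduced from the extraction column. A quick check in Python, with the values transcribed from the SPSS output:

```python
# Communalities after extraction, in questionnaire order (from SPSS output)
extraction = [.646, .624, .591, .589, .545, .621, .486, .683, .638, .417,
              .539, .297, .531, .709, .511, .681, .705, .514, .536, .477,
              .566, .766, .587, .649, .550, .599, .619, .538]

mean_communality = sum(extraction) / len(extraction)
below_half = [h for h in extraction if h < .5]

print(round(mean_communality, 3))  # .579, the value used in the text
print(len(below_half))             # 4 items fall below .5
```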


Total Variance Explained

            Initial Eigenvalues                    Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1           9.064   32.373          32.373         4.558   16.279          16.279
2           2.787   9.954           42.328         3.460   12.356          28.635
3           1.664   5.944           48.272         3.239   11.568          40.203
4           1.515   5.409           53.681         2.631   9.397           49.600
5           1.180   4.215           57.896         2.323   8.296           57.896

Extraction Method: Principal Component Analysis. (The extraction sums of squared loadings for the five retained components equal their initial eigenvalues; components 6-28 have eigenvalues below 1, ranging from .991 down to .164.)
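The extraction decision can be checked directly from the eigenvalues: Kaiser's criterion retains components with eigenvalues greater than 1, and the retained components' share of the total variance (which equals the number of variables, 28) gives the cumulative percentage. A sketch in Python using the eigenvalues from the SPSS output:

```python
# All 28 eigenvalues from the Total Variance Explained table
eigenvalues = [9.064, 2.787, 1.664, 1.515, 1.180, .991, .925, .819, .793,
               .744, .705, .654, .623, .574, .545, .516, .487, .454, .423,
               .382, .341, .334, .309, .293, .260, .248, .207, .164]

retained = [e for e in eigenvalues if e > 1]       # Kaiser's criterion
cum_pct = 100 * sum(retained) / len(eigenvalues)   # total variance = p = 28

print(len(retained))      # 5 components retained
print(round(cum_pct, 1))  # 57.9% of variance, matching the table
```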


[Component matrix (Extraction Method: Principal Component Analysis; five components extracted). The unrotated loadings are not interpreted here; see the rotated solution below.]


[Rotated component matrix (Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser Normalization; rotation converged in 7 iterations). The loadings greater than .4 are interpreted below.]

Component Transformation Matrix

Component   1      2      3      4      5
1           .633   -.118  -.188  -.742  .025
2           .520   .050   -.346  .503   -.595
3           .384   .738   .106   .201   .506
4           .302   -.650  -.053  .393   .574
5           .301   -.129  .911   .038   -.246

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.

Extraction: SPSS has extracted five factors based on Kaiser's criterion of retaining factors with eigenvalues greater than 1. Is this warranted? Kaiser's criterion is accurate when there are fewer than 30 variables and the communalities after extraction are greater than .7, or when the sample size exceeds 250 and the average communality is greater than .6. For these data the sample size is 239 and the mean communality is .579, so extracting five factors is not really warranted. The scree plot shows clear inflexions at 3 and 5 factors and so using the scree plot you could justify extracting 3 or 5 factors.

Looking at the rotated component matrix (and using loadings greater than .4 as recommended by Stevens) we see the following pattern:

Factor 1: preference for organization

1. Q8: I am an organized person
2. Q13: I like to have my documents filed and in order
3. Q14: I find it easy to work in a disorganized environment
4. Q16: My workspace is messy and disorganized
5. Q17: I like to be organized
6. Q22: I like to work in an organized environment

Note: It's odd that none of these have reverse loadings.

Factor 2: planning approach

7. Q1: I like to have a plan to work to in everyday life
8. Q3: I get most things done in a day that I want to
9. Q4: I stick to a plan once I have made it
10. Q9: I like to know what I have to do in a day
11. Q15: I make 'to do' lists and achieve most of the things on it
12. Q21: I prioritize the things I have to do
13. Q24: I set deadlines for myself and achieve them

Factor 3: goal achievement

14. Q7: I find it difficult to follow a plan through
15. Q11: I leave things to the last minute
16. Q19: I feel that I am wasting my time
17. Q20: I forget the plans I have made
18. Q25: I change rather aimlessly from one activity to another during the day
19. Q26: I have trouble organizing the things I have to do
20. Q27: I put tasks off to another day

Factor 4: acceptance of delays

21. Q2: I feel frustrated when things don't go to plan
22. Q6: I feel frustrated if I can't find something I need
23. Q10: Disorganized people annoy me
24. Q18: Interruptions to my daily routine annoy me

Factor 5: preference for routine

25. Q5: I enjoy spontaneity and uncertainty
26. Q12: I have many different plans relating to the same goal
27. Q23: I feel relaxed when I don't have a routine
28. Q28: I feel restricted by schedules and plans

Therefore, it seems as though there is some factorial validity to the structure.


Chapter 18

Task 1

Certain editors at Sage Publications like to think they're a bit of a whiz at football (soccer if you prefer). To see whether they are better than Sussex lecturers and postgraduates we invited various employees of Sage to join in our football matches (oh, sorry, I mean we invited them down for important meetings about books). Every player was allowed to play in only one match. Over many matches, we counted the number of players that scored goals. The data are in the file SageEditorsCantPlayFootball.sav. Do a chi-square test to see whether more publishers or academics scored goals. We predict that Sussex people will score more than Sage people.

Let's run the analysis on the first question. First we must remember to tell SPSS which variable contains the frequencies by using the weight cases command. Select __, then in the resulting dialog box select _ and then select the variable in which the number of cases is specified (in this case Frequency) and drag it to the box labelled Frequency variable (or click on _). This process tells the computer that it should weight each category combination by the number in the column labelled Frequency.


To run the chi-square tests, select ___. First, select one of the variables of interest in the variable list and drag it into the box labelled Row(s) (or click on _). For this example, I selected Job to be the rows of the table. Next, select the other variable of interest (Score) and drag it to the box labelled Column(s) (or click on _). Select the same options as in the book.


The crosstabulation table produced by SPSS contains the number of cases that fall into each combination of categories. We can see that in total 28 people scored goals (36.4% of the total) and of these 5 were from Sage Publications (17.9% of the total that scored) and 23 were from Sussex (82.1% of the total that scored); 49 people didn't score at all (63.6% of the total) and, of those, 19 worked for Sage (38.8% of the total that didn't score) and 30 were from Sussex (61.2% of the total that didn't score).
Job * Did they score a goal? Crosstabulation

                                                    Yes       No        Total
Sage Publications      Count                        5         19        24
                       Expected Count               8.7       15.3      24.0
                       % within Job                 20.8%     79.2%     100.0%
                       % within Did they score?     17.9%     38.8%     31.2%
                       % of Total                   6.5%      24.7%     31.2%
University of Sussex   Count                        23        30        53
                       Expected Count               19.3      33.7      53.0
                       % within Job                 43.4%     56.6%     100.0%
                       % within Did they score?     82.1%     61.2%     68.8%
                       % of Total                   29.9%     39.0%     68.8%
Total                  Count                        28        49        77
                       Expected Count               28.0      49.0      77.0
                       % within Job                 36.4%     63.6%     100.0%
                       % within Did they score?     100.0%    100.0%    100.0%
                       % of Total                   36.4%     63.6%     100.0%
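The percentages in the crosstabulation all derive from the raw counts, and can be checked in a few lines of Python:

```python
# Goal counts: rows = Sage, Sussex; columns = scored, didn't score
counts = [[5, 19], [23, 30]]
n = sum(sum(row) for row in counts)           # 77 players in total

scored = counts[0][0] + counts[1][0]          # 28 scorers
print(round(100 * scored / n, 1))             # 36.4% of everyone scored
print(round(100 * counts[0][0] / scored, 1))  # 17.9% of scorers were from Sage
print(round(100 * counts[1][0] / scored, 1))  # 82.1% were from Sussex
print(round(100 * (n - scored) / n, 1))       # 63.6% didn't score
```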

Before moving on to look at the test statistic itself it is vital that we check that the assumption for chi-square has been met. The assumption is that in 2 × 2 tables (which is what we have here), all expected frequencies should be greater than 5. If you look at the expected counts in the crosstabulation table, it should be clear that the smallest expected count is 8.7 (for Sage editors who scored). This value exceeds 5 and so the assumption has been met.

Pearson's chi-square test examines whether there is an association between two categorical variables (in this case the job and whether the person scored or not). As part of the crosstabs procedure SPSS produces a table that includes the chi-square statistic and its significance value. The Pearson chi-square statistic tests whether the two variables are independent. If the significance value is small enough (conventionally Sig. must be less than 0.05) then we reject the hypothesis that the variables are independent and accept the hypothesis that they are in some way related. The value of the chi-square statistic is given in the table (and the degrees of freedom) as is the significance value. The value of the chi-square statistic is 3.63. This value has a two-tailed significance of 0.057, which is bigger than 0.05 (hence non-significant). However, we made a specific prediction (that Sussex people would score more than Sage people), hence we can halve this value. Therefore, the chi-square is significant (one-tailed) because p = 0.0285, which is less than 0.05. The one-tailed significance values of the other statistics are also less than 0.05 so we have consistent results.
Chi-Square Tests

                               Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             3.634b   1    .057
Continuity Correctiona         2.725    1    .099
Likelihood Ratio               3.834    1    .050
Fisher's Exact Test                                        .075         .047
Linear-by-Linear Association   3.587    1    .058
N of Valid Cases               77

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.73.
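These statistics can be reproduced outside SPSS. As a sketch (assuming SciPy is installed), scipy.stats.chi2_contingency returns the Pearson statistic with or without Yates' continuity correction, together with the expected counts used to check the assumption:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: Sage, Sussex; columns: scored, didn't score
observed = np.array([[5, 19], [23, 30]])

# Pearson chi-square (no continuity correction), first row of the table
stat, p, dof, expected = chi2_contingency(observed, correction=False)
print(round(stat, 3), round(p, 3))      # 3.634, .057

# Yates' continuity correction, second row of the table
stat_c, p_c, _, _ = chi2_contingency(observed, correction=True)
print(round(stat_c, 3), round(p_c, 3))  # 2.725, .099

print(expected.round(2))  # minimum expected count 8.73, so the assumption holds
```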

The significant (one-tailed) result indicates that there is an association between the type of job someone does and whether they score goals. This finding reflects the fact that for Sussex employees there is about a 50-50 split of those that scored and those that didn't, but for Sage employees there is about a 20-80 split with only 20% scoring and 80% not scoring. This supports our hypothesis that people from Sage, despite their delusions, are crap at football!


Calculating an Effect Size

The odds of someone scoring given that they were employed by Sage are 5/19 = 0.26, and the odds of someone scoring given that they were employed by Sussex University are 23/30 = 0.77. Therefore, the odds ratio is 0.26/0.77 = 0.34. In other words, the odds of scoring if you work for Sage are 0.34 times those if you work for Sussex; a better way to express this is that if you work for Sage, the odds of scoring are 1/0.34 = 2.95 times lower than if you work for Sussex!

Reporting the Results of Chi-Square

We could report: There was a significant association between the type of job and whether or not a person scored a goal, χ²(1) = 3.63, p < .05 (one-tailed). This represents the fact that, based on the odds ratio, Sage employees were 2.95 times less likely to score than Sussex employees.
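The odds-ratio arithmetic is easy to script. A sketch in Python (the 2.95 in the text comes from the rounded odds; the unrounded counts give roughly 2.91):

```python
# Scored / didn't score counts for each employer
sage_scored, sage_missed = 5, 19
sussex_scored, sussex_missed = 23, 30

odds_sage = sage_scored / sage_missed        # ~0.26
odds_sussex = sussex_scored / sussex_missed  # ~0.77
odds_ratio = odds_sage / odds_sussex         # ~0.34

print(round(odds_sage, 2), round(odds_sussex, 2), round(odds_ratio, 2))
print(round(1 / odds_ratio, 2))  # ~2.91 from unrounded values
```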
Task 2

I wrote much of this update while on sabbatical in the Netherlands (I have a real soft spot for Holland). However, living there for three months did enable me to notice certain cultural differences to England. The Dutch are famous for travelling by bike; they do it much more than the English. However, I noticed that many more Dutch people cycle while steering with only one hand. I pointed this out to one of my friends, Birgit Mayer, and she said that I was being a crazy English fool and that Dutch people did not cycle one-handed. Several weeks of my pointing at one-handed cyclists and her pointing at two-handed cyclists ensued.


To put it to the test I counted the number of Dutch and English cyclists who ride with one or two hands on the handlebars (Handlebars.sav). Can you work out whether Birgit or I am right?

First, we must remember to tell SPSS which variable contains the frequencies by using the weight cases command. Select __, then in the resulting dialog box select _ and then select the variable in which the number of cases is specified (in this case Frequency) and drag it to the box labelled Frequency variable (or click on _). This process tells the computer that it should weight each category combination by the number in the column labelled Frequency.

To run the chi-square tests, select ___. First, select one of the variables of interest in the variable list and drag it into the box labelled Row(s) (or click on _). For this example, I selected Nationality to be the rows of the table. Next, select the other variable of interest (Hands) and drag it to the box labelled Column(s) (or click on _). Select the same options as in the book.


The crosstabulation table produced by SPSS contains the number of cases that fall into each combination of categories. We can see that in total 137 people rode their bike one-handed, of which 120 (87.6%) were Dutch and only 17 (12.4%) were English; 732 people rode their bike two-handed, of which 578 (79%) were Dutch and only 154 (21%) were English.


Before moving on to look at the test statistic itself it is vital that we check that the assumption for chi-square has been met. The assumption is that in 2 × 2 tables (which is what we have here), all expected frequencies should be greater than 5. If you look at the expected counts in the crosstabulation table, it should be clear that the smallest expected count is 27 (for English people who ride their bike one-handed). This value exceeds 5 and so the assumption has been met. The value of the chi-square statistic is 5.44. This value has a two-tailed significance of 0.020, which is smaller than 0.05 (hence significant). This suggests that the pattern of bike riding (i.e. the relative numbers of one- and two-handed riders) differs significantly between English and Dutch people.
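If you want to verify this output outside SPSS, the same Pearson chi-square (and the expected counts used for the assumption check) can be reproduced with SciPy. A sketch, assuming the counts reported above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = nationality (Dutch, English),
# columns = riding style (one-handed, two-handed)
observed = np.array([[120, 578],
                     [17, 154]])

# correction=False gives the uncorrected Pearson statistic that SPSS reports
chi2, p, dof, expected = chi2_contingency(observed, correction=False)

# Assumption check: all expected frequencies in a 2x2 table should exceed 5
print(expected.min() > 5)   # True; the smallest expected count is about 27
print(round(chi2, 2), dof)  # 5.44 with 1 degree of freedom
print(round(p, 3))          # ≈ 0.020
```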

_ The significant result indicates that there is an association between whether someone is Dutch or English and whether they ride their bike one- or two-handed. Looking at the frequencies, this finding seems to show that the ratio of one- to two-handed riders differs in Dutch and English people. In Dutch people 17.2% ride their bike one-handed compared to 82.8% who ride two-handed. In England, though, only 9.9% rode their bike one-handed (almost half as many as in Holland), and 90.1% rode their bikes two-handed. If we look at the standardized residuals (in the contingency table) we can see that the only cell with a residual approaching significance (a value that lies outside of ±1.96) is the cell for English people riding one-handed (z = −1.9). The fact that this value is negative tells us that fewer people than expected fell into this cell.

Calculating an Effect Size

The odds of someone riding one-handed if they are Dutch are 120/578 = 0.21, and the odds of someone riding one-handed if they are English are 17/154 = 0.11. Therefore, the odds ratio is 0.21/0.11 = 1.9. In other words, the odds of riding one-handed if you are Dutch are 1.9 times higher than if you are English (or the odds of riding one-handed if you are English are about half those of a Dutch person).

Reporting the Results of Chi-Square

We could report: There was a significant association between nationality and whether people rode their bike one- or two-handed, χ²(1) = 5.44, p < .05. This represents the fact that, based on the odds ratio, the odds of riding a bike one-handed were 1.9 times higher for Dutch people than for English people. This supports Field's argument that there are more one-handed bike riders in the Netherlands than in England and utterly refutes Mayer's theory that Field is a complete arse. These data are in no way made up.
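The standardized residuals SPSS reports can also be recomputed by hand from the observed counts. This sketch (variable names are mine) shows where the z ≈ −1.9 for English one-handed riders comes from:

```python
import numpy as np

observed = np.array([[120, 578],
                     [17, 154]], dtype=float)

# Expected counts under independence: row total * column total / grand total
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()

# Standardized residuals; values beyond ±1.96 flag cells driving the association
std_resid = (observed - expected) / np.sqrt(expected)
print(np.round(std_resid, 2))
# The English one-handed cell is about -1.92: fewer one-handed
# English riders than independence would predict
```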
Task 3

I was interested in whether horoscopes are just a figment of people's minds. Therefore, I got 2201 people, made a note of their star sign (this variable, obviously, has 12 categories: Capricorn, Aquarius, Pisces, Aries, Taurus, Gemini, Cancer, Leo, Virgo, Libra, Scorpio and Sagittarius) and whether they believed in

horoscopes (this variable has two categories: believer or unbeliever). I then sent them a horoscope in the post of what would happen over the next month: everybody, regardless of their star sign, received the same horoscope, which read: "August is an exciting month for you. You will make friends with a tramp in the first week of the month and cook him a cheese omelette. Curiosity is your greatest virtue, and in the second week you'll discover knowledge of a subject that you previously thought was boring: statistics, perhaps. You might purchase a book around this time that guides you towards this knowledge. Your new wisdom leads to a change in career around the third week, when you ditch your current job and become an accountant. By the final week you find yourself free from the constraints of having friends, your boy/girlfriend has left you for a Russian ballet dancer with a glass eye, and you now spend your weekends doing loglinear analysis by hand with a pigeon called Hephzibah for company." At the end of August I interviewed all of these people and I classified the horoscope as having come true, or not, based on how closely their lives matched the fictitious horoscope. The data are in the file Horoscope.sav. Conduct a loglinear analysis to see whether there is a relationship between the person's star sign, whether they believe in horoscopes and whether the horoscope came true.

Running the Analysis

Data are entered for this example as frequency values for each combination of categories, so before you begin you must weight the cases by the variable Frequency. If you don't do this the entire output will be wrong! Select __, then in the resulting


dialog box select _ and then select the variable in which the number of cases is specified (in this case Frequency) and drag it to the box labelled Frequency variable (or click on _). This process tells the computer that it should weight each category combination by the number in the column labelled Frequency.

To get a crosstabulation table, select ___. We have three variables in our crosstabulation table: whether someone believes in star signs or not (Believe), the star sign of the person (Star_Sign) and whether the horoscope came true or not (True). Select Believe and drag it into the box labelled Row(s) (or click on _). Next, select True and drag it to the box labelled Column(s) (or click on _). We have a third variable too, and we need to define this variable as a layer. Select Star_Sign and drag it to the box labelled Layer 1 of 1 (or click on _). Then click on _ and select the options required.


The crosstabulation table produced by SPSS contains the number of cases that fall into each combination of categories. Although this table is quite complicated, you should be able to see that there are roughly the same number of believers and non-believers and similar numbers of people whose horoscopes came true or didn't. These proportions are also fairly consistent across the different star signs. Also, there are no expected counts less than 5, so our assumptions are met.


_

The Loglinear Analysis

Then run the main analysis. The way to run a loglinear analysis that is consistent with my section on the theory of the analysis is to select ___ to access the dialog box. Select any variable that you want to include in the analysis by selecting them with the mouse (remember that you can select several at the same time by holding down the Ctrl key) and then dragging them to the box labelled Factor(s) (or click on _). When there is a variable in this box the _ button becomes active. We have to tell SPSS the codes that we've used to define our categorical variables. Select a variable in the Factor(s) box and then click on _ to activate a dialog box that allows you to specify the value of the minimum and maximum code that you've used for that variable. When you've done this click on _ to return to the main dialog box.

Output from Loglinear Analysis

The initial output from the loglinear analysis tells us that we have 2201 cases. SPSS then lists all of the factors in the model and the number of levels they have. To begin with,


SPSS fits the saturated model (all terms are in the model, including the highest-order interaction, in this case the star sign × believer × true interaction). SPSS then gives us the observed and expected counts for each of the combinations of categories in our model. These values should be the same as the original contingency table, except that each cell has 0.5 added to it. The final part of this initial output gives us two goodness-of-fit statistics (Pearson's chi-square and the likelihood-ratio statistic, both of which we came across at the beginning of this chapter). In this context these tests are testing the hypothesis that the frequencies predicted by the model (the expected frequencies) are significantly different from the actual frequencies in our data (the observed frequencies). At this stage the model perfectly fits the data, so both statistics are 0 and the output reports a probability value, p, of '.'.
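For reference, the two goodness-of-fit statistics compare the observed cell frequencies $o_i$ with the frequencies the model expects, $e_i$:

```latex
\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i}, \qquad
G^2 = 2 \sum_i o_i \ln\!\left(\frac{o_i}{e_i}\right)
```

When the model is saturated, $e_i = o_i$ in every cell, so each term is zero (because $\ln 1 = 0$) and both statistics equal 0, which is exactly what this first block of output shows.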



The next part of the output tells us something about which components of the model can be removed. The first part of the output is labelled K-way and higher-order effects, and underneath there is a table showing likelihood-ratio and chi-square statistics when K = 1, 2 and 3 (as we go down the rows of the table). The first row (K = 1) tells us whether removing the one-way effects (i.e. the main effects of star sign, believer and true) and any higher-order effects will significantly affect the fit of the model. There are lots of higher-order effects here (the two-way interactions and the three-way interaction), so this is basically testing whether, if we remove everything from the model, there will be a significant effect on its fit. This is highly significant because the probability value is 0.000, which is less than 0.05. The next row of the table (K = 2) tells us whether removing the two-way interactions (i.e. the star sign × believer, star sign × true and believer × true interactions) and any higher-order effects will affect the model. In this case there is a higher-order effect (the three-way interaction), so this is testing whether removing the two-way interactions and the three-way interaction would affect the fit of the model. This is significant (the probability is 0.03, which is less than 0.05), indicating that if we removed the two-way interactions and the three-way interaction then this would have a significant detrimental effect on the model. The final row (K = 3) tests whether removing the three-way effect and higher-order effects will significantly affect the fit of the model. Of course, the three-way interaction is the highest-order effect that we have, so this is simply testing whether removal of the three-way interaction (i.e. the star sign × believer × true interaction) will significantly affect the fit of the model. If you look at the two columns labelled Prob then you can see that both the chi-square and likelihood-ratio tests agree that removing this interaction will not significantly affect the fit of the model (because the probability value is greater than 0.05).

The next part of the table expresses the same thing but without including the higher-order effects. It's labelled K-way effects and lists tests for when K = 1, 2 and 3. The first row (K = 1), therefore, tests whether removing the main effects (the one-way effects) has a significant detrimental effect on the model. The probability values are less than 0.05, indicating that if we removed the main effects of star sign, believer and true from our model it would significantly affect the fit of the model (in other words, one or more of these effects is a significant predictor of the data). The second row (K = 2) tests whether removing the two-way interactions has a significant detrimental effect on the model. The probability values are less than 0.05, indicating that if we removed the star sign × believer, star sign × true and believer × true interactions then this would significantly reduce how well the model fits the data. In other words, one or more of these two-way interactions is a significant predictor of the data. The final row (K = 3) tests whether removing the three-way interaction has a significant detrimental effect on the model. The

probability values are greater than 0.05, indicating that if we removed the star sign × believer × true interaction then this would not significantly reduce how well the model fits the data. In other words, this three-way interaction is not a significant predictor of the data. This row should be identical to the final row of the upper part of the table (the K-way and higher-order effects) because it is the highest-order effect and so in the previous table there were no higher-order effects to include in the test (look at the output and you'll see the results are identical). What this is actually telling us is that the three-way interaction is not significant: removing it from the model does not have a significant effect on how well the model fits the data. We also know that removing all two-way interactions does have a significant effect on the model, as does removing the main effects, but you have to remember that loglinear analysis should be done hierarchically, and so these two-way interactions are more important than the main effects.

The Partial Association table simply breaks down the table that we've just looked at into its component parts. So, for example, although we know from the previous output that removing all of the two-way interactions significantly affects the model, we don't know which of the two-way interactions is having the effect. This table tells us. We get a Pearson chi-square test for each of the two-way interactions and the main effects, and the column labelled Sig. tells us which of these effects is significant (values less than .05 are significant). We can tell from this that the star sign × believe and believe × true interactions are significant, but the star sign × true interaction is not. Likewise, we saw in the previous output that removing the one-way effects also significantly affects the fit of the model, and these findings are confirmed here because the main effect of star sign is highly significant (although this just means that we collected different amounts of data for each of the star signs!).

The final part of the output deals with the backward elimination. SPSS will begin with the highest-order effect (in this case, the star sign × believe × true interaction): it removes it from the model, sees what effect this has, and, if it doesn't have a significant effect, moves on to the next-highest effects (in this case the two-way interactions). As we've already seen, removing the three-way interaction does not have a significant effect, and this is confirmed at this stage by the table labelled Step Summary, which confirms that removing the three-way interaction has a non-significant effect on the model. At step 1, the three two-way interactions are then assessed in the part of the table labelled Deleted Effect. From the values of Sig. it's clear that the star sign × believe (p = .037) and believe × true (p = .000) interactions are significant, but the star sign × true interaction (p = .465) is not. Therefore, at step 2 the non-significant star sign × true interaction is deleted, leaving the remaining two-way interactions in the model. These two interactions are then re-evaluated and both the star sign × believe (p = .049) and believe × true (p = .001) interactions are still significant and so are retained. Therefore, the final model is the one that retains all main effects and these two interactions. As neither of these interactions can be removed without affecting the model, and these interactions involve all three of the main effects (the variables star sign, true and believe are all involved in at least one of the remaining interactions), the main effects are not examined (because their effect is confounded with the interactions that have been retained). Finally, SPSS evaluates this final model with the likelihood-ratio statistic and we're looking for a non-significant test statistic, which indicates that the expected values generated by the model are not significantly different from the observed data (put another way, the model is a good fit of the data). In this case the result is very non-significant, indicating that the model is a good fit of the data.
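The elimination decisions all rest on chi-square p-values like these. For instance, the final model's fit statistic reported later in this answer (a likelihood ratio of 19.58 on 22 degrees of freedom) can be checked against the chi-square distribution:

```python
from scipy.stats import chi2

# Likelihood-ratio statistic and df for the final model, as reported below
p = chi2.sf(19.58, df=22)  # survival function = upper-tail probability
print(round(p, 2))  # ≈ 0.61, clearly non-significant: the model fits well
```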

The believe × true Interaction

The next step is to try to interpret these interactions. The first useful thing we can do is to collapse the data. Remember from the chapter that there are the following rules for collapsing data: (1) the highest-order interaction should be non-significant; and (2) at least one of the lower-order interaction terms involving the variable to be deleted should be non-significant. We need to look at the star sign × believe and believe × true interactions. Let's take the believe × true interaction first. Ideally we want to collapse the data across the star sign variable. To do this, the three-way interaction must be non-significant (it was) and at least one lower-order interaction involving star sign must be too (the star sign × true interaction was). So, we can look at this interaction by doing a chi-square test on believe and true, ignoring star sign. The results are below:
Did Their Horoscope Come True? * Do They Believe? Crosstabulation

                                                 Do They Believe?
                                             Unbeliever   Believer     Total
Horoscope Didn't     Count                      582          532        1114
Come True            Expected Count             542.1        571.9      1114.0
                     % of Total                 26.4%        24.2%      50.6%
Horoscope Came       Count                      489          598        1087
True                 Expected Count             528.9        558.1      1087.0
                     % of Total                 22.2%        27.2%      49.4%
Total                Count                     1071         1130        2201
                     Expected Count            1071.0       1130.0      2201.0
                     % of Total                 48.7%        51.3%     100.0%

Chi-Square Tests

                               Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square            11.601b    1      .001
Continuity Correctiona        11.312     1      .001
Likelihood Ratio              11.612     1      .001
Fisher's Exact Test                                           .001         .000
Linear-by-Linear Association  11.596     1      .001
N of Valid Cases                2201

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 528.93.
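As a cross-check on the table above, the likelihood-ratio statistic can be computed directly from its formula; a sketch, with the counts taken from the crosstabulation:

```python
import numpy as np

# Believe × true counts collapsed over star sign
observed = np.array([[582., 532.],
                     [489., 598.]])

# Expected counts under independence
expected = (observed.sum(axis=1, keepdims=True)
            * observed.sum(axis=0, keepdims=True) / observed.sum())

# Likelihood-ratio statistic: G^2 = 2 * sum of o * ln(o / e)
g2 = 2 * np.sum(observed * np.log(observed / expected))
print(round(g2, 2))  # ≈ 11.61, matching the Likelihood Ratio row above
```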

This chi-square is highly significant. To interpret this we could consider calculating some odds ratios. First, the odds of the horoscope coming true given that the person was a believer were 598/532 = 1.12. However, the odds of the horoscope coming true given that the person was an unbeliever were 489/582 = 0.84. Therefore, the odds ratio is 1.12/0.84 = 1.33. We can interpret this by saying that the odds that a horoscope would come true were 1.33 times higher in believers than non-believers. Given that the horoscopes were made-up twaddle, this might be evidence that believers behave in ways to make their horoscopes come true!

The star sign × believe Interaction

Next, we can look at the star sign × believe interaction. For this interaction we'd like to collapse across the true variable. To do this: (1) the highest-order interaction should be non-significant (which it is); and (2) at least one of the lower-order interaction terms involving the variable to be deleted should be non-significant (the star sign × true interaction was). So, we can look at this interaction by doing a chi-square test on star sign and believe, ignoring true. The results are below:


Star Sign * Do They Believe? Crosstabulation
(Count, with expected count in brackets, and % within star sign)

Star Sign      Unbeliever              Believer                Total
Capricorn      102 (103.2)  48.1%      110 (108.8)  51.9%       212
Aquarius        46  (47.2)  47.4%       51  (49.8)  52.6%        97
Pisces         106 (116.8)  44.2%      134 (123.2)  55.8%       240
Aries           78  (98.3)  38.6%      124 (103.7)  61.4%       202
Taurus          98  (92.0)  51.9%       91  (97.0)  48.1%       189
Gemini         118 (100.2)  57.3%       88 (105.8)  42.7%       206
Cancer         160 (165.0)  47.2%      179 (174.0)  52.8%       339
Leo             37  (33.6)  53.6%       32  (35.4)  46.4%        69
Virgo          124 (116.3)  51.9%      115 (122.7)  48.1%       239
Libra           53  (54.0)  47.7%       58  (57.0)  52.3%       111
Scorpio         52  (52.6)  48.1%       56  (55.4)  51.9%       108
Sagittarius     97  (92.0)  51.3%       92  (97.0)  48.7%       189
Total         1071          48.7%     1130          51.3%      2201

Chi-Square Tests

                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square            19.634a   11      .051
Likelihood Ratio              19.737    11      .049
Linear-by-Linear Association   2.651     1      .103
N of Valid Cases                2201

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 33.58.
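The believer:unbeliever split per sign in the table above can also be turned into odds ratios against a base category. A sketch (names and the choice of Capricorn as base are mine):

```python
# Believer/unbeliever counts per star sign, read from the table above
counts = {
    'Capricorn': (110, 102), 'Aquarius': (51, 46), 'Pisces': (134, 106),
    'Aries': (124, 78), 'Taurus': (91, 98), 'Gemini': (88, 118),
    'Cancer': (179, 160), 'Leo': (32, 37), 'Virgo': (115, 124),
    'Libra': (58, 53), 'Scorpio': (56, 52), 'Sagittarius': (92, 97),
}

# Odds of believing for each sign, relative to Capricorn as base category
base = counts['Capricorn'][0] / counts['Capricorn'][1]
odds_ratios = {sign: (bel / unbel) / base
               for sign, (bel, unbel) in counts.items()}
print(round(odds_ratios['Aries'], 2))  # ≈ 1.47: Aries stands out most
```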

This chi-square is borderline significant (two-tailed; but then again we had no prediction, so we do need to look at the two-tailed significance). It doesn't make a lot of sense to compute odds ratios because there are so many star signs (although we could use one star sign as a base category and compute odds ratios for all other signs compared to this category). However, the obvious general interpretation of this effect is that the ratio of believers to unbelievers differs across star signs. For example, in most star signs there is a roughly 50:50 split of believers and unbelievers, but for Aries there is a 40:60 split, and it is probably this difference that contributes most to the effect. However, it's important to keep this effect in perspective. It may not be that interesting that we happened to sample a different ratio of believers and unbelievers in certain star signs (unless you believe that certain star signs should have more cynical views of horoscopes than others!). We actually set out to find out something about whether the horoscopes would come true, and it's worth remembering that this interaction ignores the crucial variable that measured whether or not the horoscope came true!

Reporting the Results

For this example we could report: The three-way loglinear analysis produced a final model that retained the star sign × believe and believe × true interactions. The likelihood ratio of this model was χ²(22) = 19.58, p = 0.61. The star sign × believe interaction was significant, χ²(11) = 19.74, p < 0.05. This interaction indicates that the ratio of believers and unbelievers was different across the 12 star signs. In particular, the ratio in Aries (a 38.6:61.4 ratio of unbelievers to believers) was quite different from the other groups, which consistently had a roughly 50:50 split. The believe × true interaction was also significant, χ²(1) = 11.61, p < .001. The odds ratio indicated that the odds of the horoscope coming true were 1.33 times higher in believers than non-believers. Given that the horoscopes were made-up twaddle

this might be evidence that believers behave in ways to make their horoscopes come true.

Task 4

On my statistics course students have weekly SPSS classes in a computer laboratory. These classes are run by postgraduate tutors, but I often pop in to help out. I've noticed when in these sessions that many students are studying Facebook rather more than they are studying their very interesting statistics assignments that I have set them. I wanted to see the impact that this behaviour had on their exam performance. I collected data from all 260 students on my course. First I checked their Attendance and classified them as having attended either more or less than 50% of their lab classes. Next, I classified them as being either someone who looked at Facebook during their lab class, or someone who never did. Lastly, after the Research Methods in Psychology (RMiP) exam, I classified them as having either passed or failed (Exam). The data are in Facebook.sav. Do a loglinear analysis on the data to see if there is an association between studying Facebook and failing your exam.

Running the Analysis

Data are entered for this example as frequency values for each combination of categories, so before you begin you must weight the cases by the variable Frequency. If you don't do this the entire output will be wrong! Select __, then in the resulting dialog box select _ and then select the variable in which the number of cases is specified (in this case Frequency) and drag it to the box labelled Frequency variable


(or click on _). This process tells the computer that it should weight each category combination by the number in the column labelled Frequency.

To get a crosstabulation table, select ___. We have three variables in our crosstabulation table: whether someone looked at Facebook during their lab classes (Facebook), whether they attended more than 50% of classes (Attendance) and whether they passed or failed their RMiP exam (Exam). Select Facebook and drag it into the box labelled Row(s) (or click on _). Next, select Exam and drag it to the box labelled Column(s) (or click on _). We have a third variable too, and we need to define this variable as a layer. Select Attendance and drag it to the box labelled Layer 1 of 1 (or click on _). Then click on _ and select the options required. _ The crosstabulation table produced by SPSS contains the number of cases that fall into each combination of categories. There are no expected counts less than 5, so our assumptions are met.


_ The Loglinear Analysis

Then run the main analysis. The way to run a loglinear analysis that is consistent with my section on the theory of the analysis is to select ___ to access the dialog box. Select any variable that you want to include in the analysis by selecting them with the mouse (remember that you can select several at the same time by holding down the Ctrl key) and then dragging them to the box labelled Factor(s) (or click on _). When there is a variable in this box the _ button becomes active. We have to tell SPSS the codes that we've used to define our categorical variables. Select a variable in the Factor(s) box and then click on _ to activate a dialog box that allows you to specify the value of the minimum and maximum code that you've used for that variable. When you've done this click on _ to return to the main dialog box.

Output from Loglinear Analysis

The first part of the output, labelled K-way and higher-order effects, shows likelihood-ratio and chi-square statistics when K = 1, 2 and 3 (as we go down the rows of the table). The first row (K = 1) tells us whether removing the one-way effects (i.e. the main effects of Attendance, Facebook and Exam) and any higher-order effects will significantly affect the fit of the model. There are lots of higher-order effects here (the two-way interactions and the three-way interaction), so this is basically testing whether, if we remove everything from the model, there will be a significant effect on its fit. This is highly significant because the probability value is 0.000, which is less than 0.05. The next row of the table (K = 2) tells us whether removing the two-way interactions (i.e. Attendance × Exam, Facebook × Exam and Attendance × Facebook) and any higher-order effects will affect the model. In this case there is a higher-order effect (the three-way interaction), so this is testing whether removing the two-way interactions and the three-way interaction would affect the fit of the model. This is significant (the probability is 0.000, which is less than 0.05), indicating that if we removed the two-way interactions and the three-way interaction then this would have a significant detrimental effect on the model. The final row (K = 3) tests whether removing the three-way effect and higher-order effects will significantly affect the fit of the model. Of course, the three-way interaction is the highest-order effect that we have, so this is simply testing whether removal of the three-way interaction (i.e. the Attendance × Facebook × Exam interaction) will significantly affect the fit of the model. If you look at the two columns labelled Prob then you can see that both the chi-square and likelihood-ratio tests agree that removing this interaction will not significantly affect the fit of the model (because the probability value is greater than 0.05).

The next part of the table expresses the same thing but without including the higher-order effects. It's labelled K-way effects and lists tests for when K = 1, 2 and 3. The first row (K = 1), therefore, tests whether removing the main effects (the one-way effects) has a significant detrimental effect on the model. The probability values are less than 0.05, indicating that if we removed the main effects of Attendance, Facebook and Exam from our model it would significantly affect the fit of the model (in other words, one or more of


these effects is a significant predictor of the data). The second row (K = 2) tests whether removing the two-way interactions has a significant detrimental effect on the model. The probability values are less than 0.05, indicating that if we removed the two-way interactions then this would significantly reduce how well the model fits the data. In other words, one or more of these two-way interactions is a significant predictor of the data. The final row (K = 3) tests whether removing the three-way interaction has a significant detrimental effect on the model. The probability values are greater than 0.05, indicating that if we removed the three-way interaction then this would not significantly reduce how well the model fits the data. In other words, this three-way interaction is not a significant predictor of the data. This row should be identical to the final row of the upper part of the table (the K-way and higher-order effects) because it is the highest-order effect and so in the previous table there were no higher-order effects to include in the test (look at the output and you'll see the results are identical).


The main effect of Attendance was significant, χ²(1) = 27.63, p < .001, indicating (based on the contingency table) that significantly more students attended over 50% of their classes (N = 172) than attended less than 50% (N = 88). The main effect of Facebook was significant, χ²(1) = 10.47, p < .01, indicating (based on the contingency table) that significantly fewer students looked at Facebook during their classes (N = 104) than did not look at Facebook (N = 156). The main effect of Exam was significant, χ²(1) = 22.54, p < .001, indicating (based on the contingency table) that significantly more students passed the RMiP exam (N = 168) than failed (N = 92). The Attendance × Exam interaction was significant, χ²(1) = 61.80, p < .01, indicating that whether you attended more or less than 50% of classes affected exam performance. To illustrate, here's the contingency table:

(These Ns come from summing the relevant cells of the contingency table: attendance over 50%: 39 + 30 + 98 + 5 = 172; attendance under 50%: 5 + 30 + 26 + 27 = 88; looked at Facebook: 39 + 30 + 5 + 30 = 104; did not: 98 + 5 + 26 + 27 = 156; passed: 39 + 98 + 5 + 26 = 168; failed: 30 + 5 + 30 + 27 = 92.)


_ This shows that those who attended more than half of their classes had a much better chance of passing their exam (nearly 80% passed) than those attending less than 50% of classes (only 35% passed). All of the standardized residuals are significant, indicating that all cells contribute to this overall association. The Facebook × Exam interaction was significant, χ²(1) = 49.77, p < .001, indicating that whether you looked at Facebook or not affected exam performance. To illustrate, here's the contingency table:

_ This shows that those who looked at Facebook had a much lower chance of passing their exam (58% failed) than those who didnt look at Facebook during their lab classes (around 80% passed).
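The cell frequencies behind these contingency tables can be checked by summing over the axes of the 2 × 2 × 2 table. The axis ordering below is my own reconstruction from the cell sums given in the note above:

```python
import numpy as np

# Cell counts as a 2x2x2 array; axes: attendance (over 50%, under 50%),
# Facebook (looked, didn't), exam (pass, fail)
cells = np.array([[[39, 30],    # attended over 50%, Facebook
                   [98, 5]],    # attended over 50%, no Facebook
                  [[5, 30],     # attended under 50%, Facebook
                   [26, 27]]])  # attended under 50%, no Facebook

print(cells.sum(axis=(1, 2)))  # attendance margins: [172  88]
print(cells.sum(axis=(0, 2)))  # Facebook margins:   [104 156]
print(cells.sum(axis=(0, 1)))  # exam margins:       [168  92]

# Pass rate by Facebook use, collapsed over attendance: roughly 42% vs 79%
facebook_by_exam = cells.sum(axis=0)
print(np.round(facebook_by_exam[:, 0] / facebook_by_exam.sum(axis=1), 2))
```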

The Facebook × Attendance × Exam interaction was not significant, χ²(1) = 1.57, p = .20. This result indicates that the effect of Facebook (described above) was (roughly) the same in those who attended more than 50% of classes and those who attended less than 50%. In other words, although those attending less than 50% of classes did worse than those attending more, within each attendance group those looking at Facebook did relatively worse than those not looking at Facebook.
Chapter 19

Task 1

Using the cosmetic surgery example, run the analysis but also including BDI, age and gender as fixed effect predictors. What differences does including these predictors make?

Select ___, and specify the contextual variable by selecting Clinic from the list of variables and dragging it to the box labelled Subjects (or click on _).


Click on _ to move to the main dialog box. First we must specify our outcome variable, which is quality of life (QoL) after surgery, so select Post_QoL and drag it to the space labelled Dependent variable (or click on _). Next we need to specify our predictors. Therefore, select Surgery, Base_QoL, Age, Gender and BDI (hold down Ctrl to select several variables simultaneously) and drag them to the space labelled Covariate(s) (or click on _).


We need to add the predictors as fixed effects to our model, so click on _, hold down Ctrl and select Base_QoL, Surgery, Age, Gender and BDI in the list labelled Factors and Covariates. Then make sure that _ is set to _ and click on _ to transfer these predictors to the Model. To specify the interaction term, first click on _ and change it to _. Next, select Surgery from the Factors and Covariates list and then, while holding down the Ctrl key, select Reason. With both variables selected, click on _ to transfer them to the Model as an interaction effect. Click on _ to return to the main dialog box.

We now need to ask for a random intercept, and random slopes for the effect of Surgery. Click on _ in the main dialog box. Select Clinic and drag it to the area labelled Combinations (or click on _). We want to specify that the intercept is random, and we do this by selecting _. Next, select Surgery from the list of Factors and covariates and add it to the model by clicking on _. The other change that we need to make is that we need to estimate the covariance between the random slope and random intercept. This estimation is achieved by clicking on _ to access the drop-down list and selecting _.

Click on _ and select _. Click on _ to return to the main dialog box. In the main dialog box click on _ and request Parameter estimates and Tests for covariance parameters. Click on _ to return to the main dialog box. To run the analysis, click on _. The output is as follows:


In terms of the overall fit of this new model, we can use the log-likelihood statistics:


If we look at the critical values for the chi-square statistic in the Appendix, the critical value is 7.81 (p < .05, df = 3); therefore, this change is significant: including these three predictors has improved the fit of the model. Age, F(1, 150.83) = 37.32, p < .001, and BDI, F(1, 260.83) = 16.74, p < .001, significantly predicted quality of life after surgery, but gender did not, F(1, 264.48) = 0.90, p = .34. The main difference that including these predictors has made is that the main effect of Reason has become non-significant, and the Reason × Surgery interaction has become more significant (its b has changed from 4.22, p = .013, to 5.02, p = .001). We could break down this interaction as we did in the chapter by splitting the file and running a simpler analysis (without the interaction and the main effect of Reason, but including Base_QoL, Surgery, BDI, Age and Gender). If you do these analyses you will get the parameter tables below, which show a similar pattern to the example in the book. For those operated on only to change their appearance, surgery significantly predicted quality of life after surgery, b = −3.16, t(5.25) = −2.63, p = .04. Unlike when age, gender and BDI were not included, this effect is now significant. The negative gradient shows that in these people quality of life was lower after surgery compared to the control group. However, for those who had surgery to solve a physical problem, surgery did not significantly predict quality of life, b = 0.67, t(10.59) = 0.58, p = .57. In essence, the inclusion of age, gender and BDI has made very little difference in this latter group. However, the slope was positive, indicating that people who had surgery scored higher on quality of life than those on the waiting list (although not significantly so!). The interaction effect, therefore, as in the chapter, reflects the difference in slopes for surgery as a predictor of quality of life in those who had surgery for physical problems (slight positive slope) and those who had surgery purely for vanity (a negative slope).
Surgery to Change Appearance:

Surgery for a Physical Problem:

Task 2

Using our growth model example in this chapter, analyse the data but include Gender as an additional covariate. Does this change your conclusions?

First, select ___ and in the initial dialog box set up the level 2 variable. In this example, life satisfaction at multiple time points is nested within people. Therefore, the level 2 variable is the person and this variable is represented by the variable labelled Person.

Select this variable and drag it to the box labelled Subjects (or click on _). Click on _ to access the main dialog box.

In the main dialog box we need to set up our predictors and outcome. The outcome was life satisfaction, so select Life_Satisfaction and drag it to the box labelled Dependent variable (or click on _). Our predictor, or growth variable, is Time so select this variable and drag it to the box labelled Covariate(s), or click on _. We also want to include
Gender, so select this variable and drag it to the box labelled Covariate(s), or click on _.


Click on _ to bring up the fixed effects dialog box. First we need to include Gender in the model, so select this variable and click on _ to add it into the model. To specify the linear polynomial, click on Time and then click on _ to add it into the model. To add the higher-order polynomials we need to select _. Select Time in the Factors and Covariates list and _ will become active; click on this button and Time will appear in the space labelled Build Term. For the quadratic, or second-order, polynomial we need to define Time², and we can specify this by clicking on _ to add a multiplication symbol to our term, then selecting Time again and clicking on _. The Build Term bar should now read Time*Time (or, put another way, Time²). Click on _ to put it into the model. Finally, let's add the cubic trend. For the cubic, or third-order, polynomial we need to define Time³ (or Time*Time*Time). We build this term up in the same way as for the quadratic polynomial: select Time, click on _, click on _, select Time again, click on _, click on _ again, select Time for a third time, click on _, click on _. This should add the third-order polynomial (Time*Time*Time) to the model. Click on _ to return to the main dialog box.
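The Build Term box is simply multiplying Time by itself, so the three growth predictors are powers of Time. A minimal sketch (Python; the time codes 0–3 are a hypothetical stand-in for the four measurement points):

```python
# Hypothetical codes for the four measurement points (baseline, 6, 12 and
# 18 months); use whatever coding your Time variable actually has.
time = [0, 1, 2, 3]

linear    = [t      for t in time]  # Time
quadratic = [t ** 2 for t in time]  # Time*Time (Time²)
cubic     = [t ** 3 for t in time]  # Time*Time*Time (Time³)

print(linear)     # [0, 1, 2, 3]
print(quadratic)  # [0, 1, 4, 9]
print(cubic)      # [0, 1, 8, 27]
```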


As in the chapter, we expect the relationship between time and life satisfaction to have both a random intercept and a random slope. We need to define these parameters now by clicking on _ in the main dialog box. We specify our contextual variable by selecting Person and dragging it to the area labelled Combinations (or click on _). To specify that the intercept is random, select _, and to specify random slopes for the effect of Time, click on this variable in the Factors and Covariates list and then click on _ to include it in the Model. Finally, we need to specify the covariance structure. As in the chapter, choose an autoregressive covariance structure, AR(1), and let's also assume that variances will be heterogeneous. Therefore, select _ from the drop-down list. Click on _ to return to the main dialog box. Click on _ and select _, and then click on _ and select Parameter estimates and Tests for covariance parameters. Click on _ to return to the main dialog box. To run the analysis, click on _.


The output is the same as the last output in the chapter except that it now includes the effect of Gender. To see whether Gender has improved the model we again compare the value of −2LL for this new model to the value for the previous model. We have added only one term to the model, so the new degrees of freedom will have risen by 1, from 8 to 9 (again, you can find the value of 8 in the row labelled Total in the column labelled Number of Parameters, in the table called Model Dimension). We can compute the change in −2LL as a result of Gender by subtracting the −2LL for this new model from the −2LL for the last model in the chapter:
χ²Change = 1798.86 − 1798.74 = 0.12

dfChange = 9 − 8 = 1
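This comparison is simple enough to script; a minimal sketch (Python) of the likelihood-ratio test, using the −2LL values and parameter counts reported above:

```python
def likelihood_ratio_test(neg2ll_old, neg2ll_new, n_params_old, n_params_new):
    """Change in -2LL between two nested models, with its degrees of freedom."""
    chi_change = neg2ll_old - neg2ll_new
    df_change = n_params_new - n_params_old
    return chi_change, df_change

chi_change, df_change = likelihood_ratio_test(1798.86, 1798.74, 8, 9)
print(round(chi_change, 2), df_change)  # 0.12 with df = 1
# 0.12 < 3.84 (the critical value for df = 1, p < .05), so adding
# Gender does not significantly improve the model.
```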

The critical values for the chi-square statistic with df = 1 in the Appendix are 3.84 (p < .05) and 6.63 (p < .01); therefore, this change is not significant, because 0.12 is less than the critical value of 3.84. The table of fixed effects and the parameter estimates tell us that the linear, F(1, 221.41) = 10.01, p < .01, and quadratic, F(1, 212.51) = 9.41, p < .01, trends both significantly described the pattern of the data over time; however, the cubic trend, F(1, 214.39) = 3.19, p > .05, did not. These results are basically the same as in the chapter. Gender itself is also not significant, F(1, 113.02) = 0.11, p > .05. The final part of the output tells us about the random parameters in the model. First, the variance of the random intercepts was Var(u0j) = 3.89, which suggests that we were correct to assume that life satisfaction at baseline varied significantly across people. The variance of people's slopes, Var(u1j) = 0.24, was also significant, suggesting that the change in life satisfaction over time varied significantly across people too. Finally, the covariance between the slopes and intercepts (−0.39) suggests that as intercepts increased, the slope decreased. These results confirm what we already know from the chapter: the trend in the data is best described by a second-order polynomial, or quadratic trend. This reflects the initial increase in life satisfaction 6 months after finding a new partner but a subsequent reduction in life satisfaction at 12 and 18 months after the start of the relationship. The parameter estimates tell us much the same thing. As such, our conclusions are unaffected by including gender.


Task 3

Getting kids to exercise (Hill, Abraham, & Wright, 2007). The purpose of this research was to examine whether providing children with a leaflet based on the theory of planned behaviour increased children's exercise. There were four different interventions (Intervention): a control group, a leaflet, a leaflet and quiz, and a leaflet and plan; 503 children from 22 different classrooms were sampled (Classroom). It was not practical to have children in the same classrooms in different conditions, therefore the 22 classrooms were randomly assigned to the four different conditions. Children were asked 'On average over the last three weeks, I have exercised energetically for at least 30 minutes ______ times per week' after the intervention (Exercise). Run a multilevel model analysis on these data (Hill et al. (2007).sav) to see whether the intervention affected the children's exercise levels (the hierarchy in the data is: children within classrooms within interventions).


Here is a graph of the data; the big dots are the means for the schools, and the box plots are standard box plots ignoring the nested structure.

[Figure: box plots of exercise + 0.5 (y-axis, 1.0 to 3.0) by condition (x-axis: Control, Leaflet, L+quiz, L+plan), with the means overlaid as large dots.]

The data file looks like:


The analysis is done with the MIXED procedure by selecting ___. At the first screen you enter your level 2 variable in the subject box (Classroom). Remember: the SPSS MIXED procedure assumes that you are doing repeated-measures analysis of individuals.


After clicking on _ you enter the outcome variable (Exercise) and the predictor (Intervention).

You then have six buttons to enter the details of the analyses. Here we consider only _ and _. The _ screen allows you to enter the fixed part of the model. This is the condition the participant is in. Select the variable that specifies conditions (Intervention) and click on _:


The _ screen is where you can really take advantage of the procedure's flexibility. The model looked at here is one of the simpler multilevel models. Highlight Classroom in the Subjects box and put it into the Combinations box by clicking on _. This tells the computer that this is the cluster variable. By not entering any variables into the Model box the computer assumes that you just want a random intercept. The default choice of _ should be used for this example.


Now click on _ and select Tests for covariance parameters.

Click on _ and then _.


The first part of the output gives details about the model being entered into the SPSS machinery. The Information Criteria box gives some of the popular methods for assessing the fit of models; AIC and BIC are two of the most popular. The Fixed Effects box gives the information most of you will be most interested in: it says the effect of the intervention is non-significant, F(3, 18.061) = 1.704, p = .202. A few words of warning: calculating a p-value requires assuming that the null hypothesis is true. In most of the statistical procedures covered in this book you would construct a probability distribution based on this null hypothesis, and often it is fairly simple, like the z- or t-distributions. For multilevel models the probability distribution under the null is often not known. Most packages that estimate p-values for multilevel models approximate this distribution in a complex way, which is why the denominator degrees of freedom is not a whole number. For more complex models there is concern about the accuracy of some of these approximations, and many methodologists urge caution in rejecting hypotheses even when the observed p-value is less than .05.

The random effects output shows how much of the variability in responses is associated with the class a person is in: 0.023777/(0.023777 + 0.290766) = 7.56%. This is fairly small. A rough guide to whether this is greater than chance is obtained by dividing the variance estimate by its standard error to get the Wald z and seeing whether it is greater than 1.96. It is slightly less (1.955), and the significance of the Wald statistic confirms this: it just fails to reach the traditional level for statistical significance. The conclusion from these data could be that the intervention failed to affect exercise. However, there is a lot of individual variability in the amount of exercise people get; a better approach would be to take into account the amount of self-reported exercise prior to the study as a covariate.
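The variance-partitioning arithmetic here is easy to verify; a sketch (Python) using the two variance estimates from the covariance parameters table (the standard error used for the Wald z is an assumption for illustration, back-calculated from the reported z of 1.955; read the real value from your own output):

```python
def intraclass_correlation(var_between, var_within):
    """Proportion of total variance attributable to the clustering variable."""
    return var_between / (var_between + var_within)

# Variance estimates from the covariance parameters table above.
icc = intraclass_correlation(0.023777, 0.290766)
print(round(icc * 100, 2))  # 7.56 (% of variability due to classroom)

# Rough significance check: divide the between-cluster variance by its
# standard error to get a Wald z, and compare with 1.96. The SE below
# (0.012162) is a hypothetical value consistent with the reported z.
wald_z = 0.023777 / 0.012162
print(wald_z > 1.96)  # just fails to reach significance
```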
Task 4

Repeat the above analysis but include the pre-intervention exercise scores (Pre_Exercise) as a covariate. What difference does this make to the results?


This can be done by repeating the procedure in Task 3 but including Pre_Exercise in the Covariate(s) box.

Then click on _, select the variables that specify condition (Intervention) and pre-intervention exercise (Pre_Exercise), and then click on _:

The other options can be kept the same as in the previous task. The new estimates for the fixed effects are:

Now, after taking initial exercise into account, the effect of the intervention is statistically significant, F(3, 18.539) = 6.636, p = .003.


The literature review in research


It has become an annual ritual for graduate researchers embarking on their projects to ask about the literature review. They usually want to know what a review of the literature looks like and how they should do one. Students and tutors find that there is no single text that can be used to guide them on how to conduct the literature review; hence the purpose of this book. It is a guide to reviewing literature for research. The book, however, is not about reviewing or critical evaluation of the kinds of articles found in the review sections of newspapers such as The Times Educational Supplement or The Guardian. It is about reviewing a research literature. It introduces and provides examples of a range of techniques that can be used to analyse ideas, find relationships between different ideas and understand the nature and use of argument in research. What you can expect, therefore, is explanation, discussion and examples on how to analyse other people's ideas, those ideas that constitute the body of knowledge on the topic of your research. Initially we can say that a review of the literature is important because without it you will not acquire an understanding of your topic, of what has already been done on it, how it has been researched, and what the key issues are. In your written project you will be expected to show that you understand previous research on your topic. This amounts to showing that you have understood the main theories in the subject area and how they have been applied and developed, as well as the main criticisms that have been made of work on the topic. The review is therefore a part of your academic development of becoming an expert in the field. However, the importance of the literature review is not matched by a common understanding of how a review of related literature can be done, how it can be used in the research, or why it needs to be done in the first place. Undertaking a review of a body of literature is often seen as something obvious and as a task easily done. In practice, although research students do produce what are called reviews of the literature, the quality of these varies considerably. Many reviews, in fact, are only thinly disguised annotated bibliographies. Quality means appropriate breadth and depth, rigour and consistency, clarity and brevity, and effective analysis and synthesis; in other words, the use of the ideas in the literature to justify the particular approach to the topic, the selection of methods, and demonstration that this research contributes something new.

Doing a literature review

Poor reviews of a topic literature cannot always be blamed on the student researcher. It is not necessarily their fault or a failing in their ability: poor literature reviews can often be the fault of those who provide the education and training in research. This book has been written primarily for student researchers, although it may also be of use to those who provide education and training in research. It is intended to be an introduction to those elements of the research process that need to be appreciated in order to understand the how and why of reviewing a topic-specific literature. As such, an attempt has been made to provide an introduction to a range of generic techniques that can be used to read analytically and to synthesize ideas in new and exciting ways that might help improve the quality of the research. This book is aimed at people working within the social sciences, which includes the disciplines listed below. This list is not exhaustive; archaeology, for instance, might have been included in this list:

built environment and town planning
business studies
communication and media studies
community studies
cultural studies
economic and social history
economics
educational studies
environmental studies
gender studies
human geography
literature
organizational studies
policy analysis
political studies
psychology
religious studies
social and political theory
social anthropology
social policy and administration
social research
sociology

The main aim of this book is therefore to provide researchers with a set of ground rules, assumptions and techniques that are applicable for understanding work produced in the whole range of disciplines that make up the social sciences. The assumptions outlined in the book form a basis for the understanding and cross-fertilization of ideas across disciplines. The various techniques aim to provide the tools for a systematic and rigorous analysis of subject literature. Suggestions are also made on writing up the analysis of ideas in ways that can give clarity, coherence and intelligibility to the work. This chapter will introduce you to the skills needed for research, the place of the literature review in research and the importance of the review to master's and doctoral study. In Chapter 2 we look at the purpose of the review in research and what is meant by the research imagination. Chapter 3 examines the types of research to be found in the literature, together with examples of reviews undertaken in a range of subject areas. It also shows examples of good practice that you should be able to adapt and utilize in your own work, especially in reading to review. Chapter 4 is about understanding arguments. To analyse a literature on a topic necessarily involves understanding the standpoint (moral and ethical) and perspective (political and ideological) an author has used. Chapters 5 and 6 are about the tools and techniques of analysis and synthesis. Essential techniques such as analysing an argument, thinking critically and mapping ideas are explained. A thread running through these chapters is guidance on how to manage information. This is because, without strict management of materials and ideas, any thesis will lack the technical standards required of the postgraduate student. The final chapter is about writing up your review of the literature. Guidance is given on how your review can be used to justify your topic as well as on what structures and formats might be used.

SKILLS FOR RESEARCH IN THE SOCIAL SCIENCES


The breadth and depth of the various subject disciplines that make up the social sciences, some of which have been listed above, are not easily classified. There are also increasing opportunities for students to study a range of modules which cut across different areas of knowledge. Combined with these is the pace of development of the electronic systems being used increasingly in all types of research.

Adapting to change
The expansion of education has been accompanied by a massive and growing expansion of the information available to research students. In printed and electronic form the pace of information generation continues to increase, with the result that libraries can acquire only a very small proportion of what is available. As a consequence, many academic libraries have become gateways to information rather than storehouses of knowledge. You will find that nearly all university libraries and public libraries are able to serve your needs as a researcher. The move of university libraries away from storehouses of knowledge towards information resource centres has been accompanied by an increase in the use of information technology (IT). Many libraries manage the expansion of information with the aid of computer systems able to communicate around the globe, a development which has opened up a range of new possibilities to researchers. It is now possible for you to access information that would previously have been difficult and expensive to find. A single day searching a CD-ROM database or the internet can throw up many more sources than might have been found from weeks of searching through printed abstracts and indexes. However, there are two problems you may encounter in this area. One is a lack of understanding of technology and how it can be used in research. The other is a lack of understanding about how knowledge is generated and organized through the use of tools, such as abstracts and indexes, in order to make it accessible.

[Figure 1.1 The generation and communication of research knowledge and information. RESEARCH is conducted by: associations, commerce, institutions, pressure groups, charities, individuals, government, unions. INFORMATION is generated from: critical evaluations, interpretative work, research; communicated via: anthologies, conference papers, journals, lectures, letters, newspapers, reports, seminars, textbooks, meetings, theses, newsletters; organized in: abstracts, bibliographies, catalogues, dictionaries, directories, encyclopaedias, indexes; accessed through: electronic media, hard copy.]

Figure 1.1 provides an overview showing the main sources of knowledge and the tools by which most of it is organized for retrieval. More recently there has been a move in higher education and research to learn from other disciplines, to be cross-disciplinary. Students on social studies and humanities courses are expected to undertake training in computing and to become competent in the use of statistical techniques, employing computers for data analysis and presentation. Added to this is the trend towards combined degrees. A consequence is that researchers need to be more flexible in their attitude to knowledge. To do this they need much broader skills and knowledge bases to take full advantage of higher education. The changing requirements placed on the student have begun to manifest themselves in a terminology of skills, competencies and professional capabilities. Alongside a traditional education, students are expected to acquire a set of personal transferable skills. The basic elements of communication, such as writing reports, making presentations and negotiating, might be included in these skills. The emphasis on skills is not something unique to a social sciences education: skills are becoming important to the careers of graduates and to quality research in general. Undergraduate and postgraduate research is an ideal opportunity for such personal transferable skills to be acquired and developed. Although searching and reviewing a literature do not cover the whole spectrum of skills, they do cover some key ones. These include: time management, organization of materials, computer use, information handling, on-line searching and writing.
The research apprenticeship

It is not an easy matter to demonstrate the kinds of skills and abilities expected of a competent researcher in the report of the research. The skills required are considerable and are increasingly subject to detailed evaluation. As the opportunities to undertake research have expanded, so too has the demand for better and improved education and training for researchers. In its response to these demands the Economic and Social Research Council (ESRC) in the UK produced a set of guidelines which include a number of basic proposals for research training which are intended to promote quality research. The following list indicates the two basic types of skills required of researchers.

Core skills and abilities: while their differences make subject disciplines distinctive, there exists a common core of skills and attitudes which all researchers should possess and should be able to apply in different situations with different topics and problems.

Ability to integrate theory and method: research for all disciplines involves an understanding of the interrelationship between theory, method and research design, practical skills and particular methods, the knowledge base of the subject and methodological foundations.

Both of these proposals call for a research training that exposes the apprentice to the range of general academic research skills and expertise expected of a professional researcher. The academic skills and expertise common to all subject fields within the social sciences can be grouped as shown in Table 1.1 (overleaf). In addition to the common academic skills, the ESRC guidelines also identify subject-specific skills, abilities and knowledge to be expected of postgraduate students. Examples of these for two subject areas, linguistics and sociology, can be seen in Table 1.2 (p. 7).

Table 1.1 Research areas for the application of skills and abilities

Literature search and evaluation. For example: library searching and use of abstracts and indexes; bibliographic construction; record keeping; use of IT for word processing, databases, on-line searching and electronic mail; and techniques for the evaluation of research, including refereeing, reviewing and attribution of ideas.

Research design and strategy. For example: formulation of researchable problems and translation into practicable research designs; identifying related work to rationalize the topic and identify a focus; organizing timetables; organizing data and materials; understanding and appreciating the implications of different methodological foundations; and how to deal with ethical and moral considerations which may arise.

Writing and presenting. For example: planning writing; skills for preparing and submitting papers for publication, conferences and journals; use of references, citation practices and knowledge of copyright; construction and defence of arguments; logical, clear and coherent expression; and understanding of the distinction between conclusions and recommendations.
It is important that research education and training produces researchers who are competent and confident in a range of skills and capabilities and who have an appropriate knowledge base. An element common to the core areas is a thorough understanding of information. This means that as a researcher you need to become familiar with: accessing and using the vast resources of academic, public and commercial libraries in the world, through, for example, JANET (Joint Academic Network), OPAC (On-line Public Access Catalogues) and the British Library; keeping accurate records and establishing reliable procedures to manage materials; applying techniques to analyse bodies of literature and synthesize key ideas; and writing explicit reviews which display depth and breadth and which are intellectually rigorous. All these are part of the essential transferable skills of the researcher. Most disciplines introduce their students to the theoretical and historical traditions that give shape and distinctiveness to the subject knowledge. But in so doing, methodological biases, disciplinary boundaries and misunderstandings about other subjects are perpetuated. This often creates barriers to cross-disciplinary studies and a lack of appreciation of alternative ways of researching and understanding the world. This book aims to show ways in which these kinds of barriers can be overcome, and we begin by considering what we mean by scholarship.

Scholarship
Most people are capable of doing a piece of research, but that capability has to be acquired: for instance, you cannot simply write a questionnaire as if you were writing a shopping list. A sound knowledge of the whole research

Table 1.2 ESRC guidelines on subject knowledge and skills: linguistics and sociology
Core training
LINGUISTICS Philosophy of linguistics

Descriptions of skills and abilities expected from the research student


Issues of theory construction, problem formulation, and explanation; basic themes e.g., realism, mentalism, nominalism, empiricism, behaviourism and logicism; ontological and epistemological issues; status of data and use of informant judgements; role of formalism; argumentation and status of examples; relationship between theory and data; search for universals; ideological implications of idealization; cultural partiality; Kuhnian paradigms.

SOCIOLOGY

Philosophy of the social sciences

Understanding of the major alternative philosophical positions for theory construction, appraisal and testing, for the explanatory goals of theories and for the use of models. Understanding of how various positions affect research design, research choices, and data-collection and analysis techniques. Understanding of the theoretical context of research; theoretical issues and debates for those engaged in empirical work; and evaluation of research.

Research design

Stages and processes in formulating researchable problems and translating them into practical research designs. Making informed judgements about ethical and moral issues. Understanding of the uses and implications of: experimental study; survey research; comparative studies; longitudinal research; ethnography; case studies; replication studies; evaluation research; prediction and action research.

Data collection and analysis

Awareness of the range of sources, e.g., archival and historical data; agency records; official statistics; pictorial materials; and textual data. Knowledge of data-collection techniques by participant and non-participant observation, ethnographic field work, group discussion, various types of interviews and questionnaires, and through unobtrusive measures. Methods of recording data such as note taking, audio and video; data coding and identifying relationships between concepts/variables; the principles of descriptive and inferential statistics and bi- and multivariate analysis; the systematic analysis of textual data and other qualitative materials; and use of computer packages for data management.

The literature review in research

Research methods in linguistics

Qualitative methods: use of informants; audio and video recording; phonetic and orthographic transcription; descriptive linguistics (diachronic and conversation analysis). Computational methods: use of linguistic corpora; grammar systems; speech workstation; phonological and morphological analysis; basic programming in a high-level language, e.g., Prolog. Formal methods: mathematical linguistics (set, string, tree, grammar, equivalence, hierarchy, lambda calculus); theory of inference and semantics of first-order logic; feature structures and unification. Quantitative methods: experimental design; validity and conduct of experiments; questionnaires; interviewing; sampling and survey design; statistics software; descriptive and inferential statistics.

Doing a literature review

process is required and you need to understand where data collection fits into the global picture of what you are doing. This means knowing how to state the aims and objectives of your research, define your major concepts and methodological assumptions, operationalize (put into practice) those concepts and assumptions by choosing an appropriate technique to collect data, know how you are going to collate results, and so on. Competent research therefore requires technical knowledge.

There is, however, a difference between producing a piece of competent research and a piece of research that demonstrates scholarship. Scholarship is often thought to be something academic high-brow types do. We are all familiar with the popular image of the scholar as that of an ageing, bespectacled man with unkempt hair, dressed shabbily in corduroy, with a thick old leather-bound book in hand. Many of you may be aware of places of scholarship epitomized in television programmes such as Morse and in novels such as Brideshead Revisited. The surreal surroundings of the Oxbridge colleges, with their high towers, the oak-clad library full of books and manuscripts, and the smell of dust and leather, are common images of scholarly places. Many universities do have traditional oak-clad libraries, but many others today do not. It is more common for universities to have modern, well-equipped learning resource centres brimming with technology than to have rows of books on shelves.

Scholarship is an activity: it is something a person can do. You do not have to be of a certain social class, gender or ethnic origin, or to have successfully jumped over formal educational hurdles. We can say that scholarly activity encompasses all of these and more. Scholarly activity is about knowing how to: do competent research; read, interpret and analyse arguments; synthesize ideas and make connections across disciplines; write and present ideas clearly and systematically; and use your imagination.
Underpinning all of these are a number of basic ground rules, which we look at in more detail in the next section. But what they amount to is an attitude of mind that is open to ideas and to different styles and types of research, and is free of prejudices about what counts as useful research and what type of person should be allowed to do research. A key element that makes for good scholarship is integration. Integration is about making connections between ideas, theories and experience. It is about applying a method or methodology from one area to another: about placing some episode into a larger theoretical framework, thereby providing a new way of looking at that phenomenon. This might mean drawing elements from different theories to form a new synthesis or to provide a new insight. It might also mean re-examining an existing body of knowledge in the light of a new development. The activity of scholarship is, therefore, about thinking systematically. It might mean forcing new typologies onto the structure of knowledge or onto a taken-for-granted perspective. Either way, the scholar endeavours to interpret and understand. The intent is to make others think about and possibly re-evaluate what they have hitherto taken to be unquestionable knowledge. Therefore,


systematic questioning, inquiring and a scrutinizing attitude are features of scholarly activity. At master's level, this might mean looking at applying a methodology in ways not tried before. At doctoral level, it might mean attempting to refigure or respecify the way in which some puzzle or problem has traditionally been defined. The anthropologist Clifford Geertz (1980: 165–6) suggests that refiguration is more than merely tampering with the details of how we go about understanding the world around us. He says refiguration is not about redrawing the cultural map or changing some of the disputed borders; it is about altering the very principles by which we map the social world. From the history of science, for example, Nicolas Copernicus (1473–1543) re-examined theories about the cosmos and the place of the earth within it. Traditional theory held the view that the earth was motionless and stood at the centre of the universe: the sun, other stars and planets were believed to revolve around the earth. Copernicus asked himself if there was another way of interpreting this belief. What if, he asked, the sun was motionless and the earth, planets and stars revolved around it? In 1541 he outlined his ideas and there began a refiguration of how the cosmos was mapped. We can see a classic example of refiguration in the work of Harold Garfinkel. Garfinkel respecified the phenomena of the social sciences, especially sociology (see Button, 1991). He undertook a thorough scrutiny of traditional sociological theory and found that social science ignored what real people do in real situations; the result was that he originated the technique of ethnomethodology. So radical was this respecification that traditional social science has marginalized the work of Garfinkel and others who undertake ethnomethodological studies of social life.

SKILLS AND THE LITERATURE REVIEW


The researcher, at whatever level of experience, is expected to undertake a review of the literature in their field. Undergraduates researching for a thesis or dissertation are expected to show familiarity with their topic. Usually this takes the form of a summary of the literature which demonstrates the skills to search on a subject, compile accurate and consistent bibliographies and summarize key ideas, showing a critical awareness. They are expected to weigh up the contribution that particular ideas, positions or approaches have made to their topic. In short, they are required to demonstrate, on the one hand, library and information skills, and on the other, the intellectual capability to justify decisions on the choice of relevant ideas and the ability to assess the value of those ideas in context. Undergraduates who move on to postgraduate research find that expectations change. The scope, breadth and depth of the literature search increases. The research student is expected to search more widely, across disciplines, and in greater detail than at undergraduate level. The amount


of material identified increases the amount of reading the researcher has to do. In addition, reading materials across several disciplines can be difficult because of the different styles in which various disciplines present ideas. Also, the vocabularies of different subjects, and what are taken to be the core researchable problems for a particular discipline, constitute further difficulties. For example, the student of management may be totally unfamiliar with the verbose and seemingly commonsense style of, say, sociology. Conversely, they may find the going less difficult if faced with advanced social statistics. The result may be the dismissal of the verbose style and admiration of the numerical formulae. The acceptance of one style over another is often due to disciplinary compartmentalization. Management students might be expected to be more familiar with statistics than with social theories. They might also have a more pragmatic attitude, influencing them to favour clarity and succinctness. As a consequence, potentially interesting and relevant ideas might be missed. Our discussion so far has been about the kinds of assumptions that might help overcome disciplinary compartmentalization and so encourage cross-disciplinary understanding. In practice, this addresses two main features of academic research: one is the central place argument has in academic work, and the other is the need to be open-minded when reading the work of other people. We look more closely now at each of these in turn.

Communicating your argument


Most authors attempt to make their writing clear, consistent and coherent, something very difficult to achieve in any work, whatever its length or topic. Nevertheless, clarity, consistency and coherence are essential, because without them a work can be unintelligible. As a consequence the work might be misunderstood, dismissed or used in ways not intended by the author. Most important, the main idea, no matter how interesting, might be lost. Conversely, what seems clear and coherent to the writer can be utterly incomprehensible to the reader. Unfamiliarity with the style, presentation or language use is nearly always a cause of frustration to the reader. We need to acknowledge that effort is required, and to accept that clarity, consistency and coherence are not mysterious qualities able to be practised only by the few. These can be achieved through explicit expression in writing and explicit commitment in reading. A problem for the academic author, however, is the time that readers allocate to their reading and the level of effort they are willing to invest in order to grasp the ideas in a text. At the same time, some authors seem to neglect the needs of their potential readers and manage to make relatively simple ideas confusing.

In terms of reviewing a body of literature made up of dozens of articles, conference papers and monographs, one problem is diversity. Texts which originate from several disciplines and which have been written in different styles engender the need for a flexible and open-minded attitude from the reviewer. Added to this, there is often a lack of explicitness: it is rare to find an account of a piece of research that systematically lays out what was done, why it was done and discusses the various implications of those choices. The reviewer needs to appreciate some of the reasons for this lack of explicitness. First, it takes considerable effort and time to express ideas in writing. Secondly, limitations placed on space or word counts often result in editing not deemed ideal by the author. Also, being explicit exposes the research (and researcher) to critical inspection. Presumably, many able researchers do not publish widely so as to avoid such criticism.
The need for open-mindedness

As we saw earlier, competence in reading research is not easily acquired. It is a part of the process of research training and education. It takes time and a willingness to face challenges, acquire new understandings and have sufficient openness of mind to appreciate that there are other views of the world. This begins by recognizing that the reviewer undertakes a review for a purpose and an author writes for a purpose. While an author may not always make their ideas clear, consistent and coherent, the reviewer is required to exercise patience when reading. The reviewer needs to assume (no matter how difficult the reading) that the author has something to contribute. It is therefore important to make the effort to tease out the main ideas from the text under consideration. It also means making the effort to understand why you are having difficulty in comprehending the text. This means not categorizing the text using prejudicial perceptions of the subject discipline, but instead placing the research in the context of the norms of its discipline and not judging it by the practices of the discipline with which you are most familiar. As a part of this attitude the researcher needs to exercise a willingness to understand philosophical (or methodological) traditions. The choice of a particular topic, together with the decision to research it using one specific strategy rather than another and to present it in a certain style, are design decisions often based on prior commitments to a view of research. An individual piece of research can therefore be placed, in general terms, in an intellectual tradition such as positivism or phenomenology. But the reviewer needs to take care not to criticize that research purely in general terms, and especially not from different standpoints. The different intellectual traditions need to be appreciated for what they are, and not for what they are assumed to lack from another standpoint. This can be illustrated with a brief example.
Many social science students will have come across ethnomethodology but, apart from a few notable exceptions, ethnomethodology is quickly passed over in most programmes of study. We have found, from experience, that this is often due to the extreme difficulty of understanding what ethnomethodology is about and how to do an ethnomethodological study. An example from the work of the founder of ethnomethodology, Harold Garfinkel, illustrates this point. This is the title of a recent article by him: 'Respecification: evidence for locally produced, naturally accountable phenomena of order, logic, reason, meaning, method, etc. in and as of the essential haecceity of immortal ordinary society (I) an announcement of studies' (Garfinkel, 1991: 10). Those unfamiliar with ethnomethodology might now appreciate the difficulties in merely understanding what Garfinkel is trying to say. But there are two very relevant points here. The first is that tenacity is required to understand an approach such as ethnomethodology. Simply because Garfinkel's work is not instantly recognizable as sociology is not sufficient reason to dismiss it. Secondly, Garfinkel's ideas might be important: if they are dismissed because the reader is not willing to invest time and effort, then an important opportunity for learning might be missed. The only way to become competent enough to comment on complex ideas, such as those proposed by Garfinkel, is to read the works of the theorist and follow through what is said. The assumptions discussed in this section are the basis for later chapters. Collectively, what they amount to is an operationalization of scholarship and good manners in research. They also signpost the need for reviewers of research to be informed about, and to be able to demonstrate awareness of, the different styles and traditions in research.

THE ROLE OF THE LITERATURE REVIEW


The product of most research is some form of written account, for example, an article, report, dissertation or conference paper. The dissemination of such findings is important because the purpose of research is to contribute in some way to our understanding of the world. This cannot be done if research findings are not shared. The public availability of research findings means that accounts of research are reconstructed 'stories': those serendipitous, often chaotic, fragmented and contingent aspects of most research (the very things that make research challenging!) do not find their way into the formal account. We therefore need to get an initial understanding of what the role of the literature review is and where it fits into the thesis or dissertation. The structure of the formal report for most research is standardized, and many of the sections found in a report are also found in a proposal for research (see Table 1.3). The full arrangement for the research proposal is shown in Appendix 1. Within this arrangement the author of the account usually employs a range of stylistic conventions to demonstrate the authority and legitimacy of their research and that the project has been undertaken in a way that is rigorous and competent.



Table 1.3 Some sections commonly found in both a research proposal and report

Introduction. Aim: to show the aims, objectives, scope, rationale and design features of the research. The rationale is usually supported by references to other works which have already identified the broad nature of the problem.

Literature review. Aim: to demonstrate skills in library searching; to show command of the subject area and understanding of the problem; to justify the research topic, design and methodology.

Methodology. Aim: to show the appropriateness of the techniques used to gather data and the methodological approaches employed. Relevant references from the literature are often used to show an understanding of data-collection techniques and methodological implications, and to justify their use over alternative techniques.

From Table 1.3 you can see that the review of related literature is an essential part of the research process and the research report: it is more than just a stage to be undertaken or a hurdle to be overcome. Figure 1.2 shows some of the questions that you will be able to answer from undertaking a literature review on your topic. The literature review is integral to the success of academic research. A major benefit of the review is that it ensures the researchability of your topic before 'proper' research commences. All too often students new to research equate the breadth of their research with its value. Initial enthusiasm, combined with this common misconception, often results in broad, generalized and ambitious proposals. It is the progressive narrowing of the topic, through the literature review, that makes most research a practical consideration. Narrowing down a topic can be difficult and can take several weeks or even months, but it does mean that the research is more likely to be completed. It also contributes to the development of your intellectual capacity and practical skills, because it engenders a research attitude and will encourage you to think rigorously about your topic and what research you can do on it in the time you have available. Time and effort carefully expended at this early stage can save a great deal of effort and vague searching later.

Definition: Literature review


The selection of available documents (both published and unpublished) on the topic, which contain information, ideas, data and evidence written from a particular standpoint to fulfil certain aims or express certain views on the nature of the topic and how it is to be investigated, and the effective evaluation of these documents in relation to the research being proposed.



Figure 1.2 Some of the questions the review of the literature can answer

Literature search and review on your topic:
What are the key sources?
What are the major issues and debates about the topic?
What are the political standpoints?
What are the main questions and problems that have been addressed to date?
How is knowledge on the topic structured and organized?
What are the key theories, concepts and ideas?
What are the epistemological and ontological grounds for the discipline?
What are the origins and definitions of the topic?
How have approaches to these questions increased our understanding and knowledge?

REVIEWING SKILLS AND THE POSTGRADUATE THESIS

A major product of academic programmes in postgraduate education is the thesis. This section will look at the place of the literature review in relation to the thesis. It will attempt to outline some of the dimensions and elements that provide evidence for assessing the worthiness of a thesis. Whereas undergraduate projects are often assessed according to pro forma marking schedules, a postgraduate thesis is assessed for its worthiness, and the literature review plays a major role in that assessment. A problem, however, is saying just what constitutes an undergraduate dissertation or project and how this differs from, say, a master's thesis, although this is not the place to look closely at this question. Table 1.4 provides a summary of the function and format of the literature review at these different levels. Note that the main concern is not only to satisfy assessors but to produce a competent review of a body of literature. The two descriptions that follow are not intended to be read as separate criteria for a master's and for a doctorate. Rather, they are intended to be read as guides to what might be expected from postgraduate research. We begin with the master's, which also gives the necessary prerequisite skills for a doctorate.
The master's

What we will focus on here is the skills element necessary for the master's thesis. If we take research for a master's thesis as being a significant piece



Table 1.4 Degrees and the nature of the literature review

BA, BSc, BEd (project): Essentially descriptive, topic focused; mostly indicative of main, current sources on the topic. Analysis is of the topic in terms of justification.

MA, MSc, MPhil (dissertation or thesis): Analytical and summative, covering methodological issues, research techniques and topics. Possibly two literature-based chapters, one on methodological issues, which demonstrates knowledge of the advantages and disadvantages, and another on theoretical issues relevant to the topic/problem.

PhD, DPhil, DLitt (thesis): Analytical synthesis, covering all known literature on the problem, including that in other languages. High level of conceptual linking within and across theories. Summative and formative evaluation of previous work on the problem. Depth and breadth of discussion on relevant philosophical traditions and ways in which they relate to the problem.

of investigative work, then the following opportunities (or educational aims) are embodied in that investigation.

1 An opportunity is provided for the student to design and carry out a substantial piece of investigative work in a subject-specific discipline. The review of related and relevant literature will be very important to the research, whether in the field or from a desk.

2 An opportunity is provided to take a topic and, through a search and analysis of the literature, focus it to a researchable topic. This puts to the test the student's ability to search for and manage relevant texts and materials and to interpret analytically ideas and data.

3 An opportunity is provided for the student to recognize the structure of various arguments and to provide cogent, reasoned and objective evaluative analysis. This puts to the test the ability to integrate and evaluate ideas.

As the product of your time and research, the master's thesis (which at master's level may also be called the dissertation) is a learning activity. The intent of the activity is that you acquire a range of skills at an appropriate level that are related to doing capable and competent research. The thesis is the evidence that you have acquired the necessary skills and can therefore be accredited as a competent researcher. The kinds of skills needed are those associated with research design, data collection, information management, analysis of data, synthesis of data with existing knowledge and evaluation of existing ideas, along with a critical evaluation of your own work. We will look at these important points in more detail in a moment. Remember that your thesis is the only opportunity you will have to demonstrate your ability to apply these skills to a particular topic: this


demonstration is the thesis. So, the thesis should be coherent and logical, and not a series of separate and inadequately related elements. There should be clear links between the aims of your research and the literature review, the choice of research design and means used to collect data, your discussion of the issues, and your conclusions and recommendations. To summarize, we can say that the research should:

1 focus on a specific problem, issue or debate;
2 relate to that problem, issue or debate in terms that show a balance between the theoretical, methodological and practical aspects of the topic;
3 include a clearly stated research methodology based on the existing literature;
4 provide an analytical and critically evaluative stance to the existing literature on the topic.

A master's thesis is therefore a demonstration in research thinking and doing. It is intended to show that the student has been capable of reasoning over which methodological approach to employ. It is also a demonstration of how to operationalize key concepts of methodology through the use of a range of data-collection techniques. There is, then, a range of skills that often form the basis for the criteria on which a master's thesis is assessed. Table 1.5 provides an overview of the criteria normally used for assessing the worthiness of a master's thesis, and it also shows how an excellent piece of work can be distinguished from a poor one. It may be useful, at this stage, to say a little more about some of the general skills and capabilities. Here we have picked out four that are very important and which require special attention by the research student.
Prior understanding

You will be expected to demonstrate a sufficient level of prior understanding of the topic and methodology. The focus for these is usually in the literature review and the chapter on methodology. The latter is, of course, often heavily dependent on the use of the literature dealing with methodology. Therefore, if your main methodology was survey based, you would be expected to show familiarity with the literature on surveys. This might involve critical appraisal of key works that advocate a positivistic approach to research, identifying core authors and relevant studies as exemplars to justify your choice of approach. This involves the construction of an argument. The literature will help you to provide evidence and substance for justifying your choice. At the same time you will become familiar with the literature on the methodology and be able to show this in your thesis.
Perseverance and diligence

You will not normally find all the information you require in a few weeks. You will therefore need to be persistent in

Table 1.5 Criteria for assessing a master's dissertation

Aims, objectives and justification
Excellent and distinctive work: Clear aims able to be operationalized. Explanation of the topic with succinct justification using the literature. Shows full awareness of the need to focus on what is able to be done.
Competent work: Clear aims and objectives. Acceptable justification with identification of the topic.
Significantly deficient work: Aims and objectives unclear due to no logical connections between them. Insufficient attempt to justify the topic. Actual topic not clear due to lack of focus.

Methodology and data collection
Excellent and distinctive work: Choice of methodology explained in comparative terms showing considerable evidence of reading and understanding. Overall research design abundantly clear and logical for the student to apply. Strengths and weaknesses in previously used methodologies/data-collection techniques are recognized and dealt with.
Competent work: Methodology described but not in comparative terms, so no explanation given for choices; nevertheless, an appropriate methodology employed. Research techniques clear and suitable for the topic. May have replicated weaknesses or bias inherent in previous work on the topic.
Significantly deficient work: No explanation of the methodology, its choice or appropriateness for the research. No indication of reading on methodology or data-collection techniques, so no demonstration of ability to collect data in a systematic way. No overall research design.

Literature review and evaluation
Excellent and distinctive work: Thorough review of the relevant literature, systematically analysed and all main variables and arguments identified. Critical evaluation firmly linked to justification and methodology.
Competent work: Review of the main literature with main variables and arguments identified. Some links made to methodology and justification.
Significantly deficient work: No review of the literature; annotations of some items but no attempt at a critical evaluation, therefore no arguments or key variables identified relevant to the topic. No bibliography, or too large a bibliography to have been used.

Style and presentation, including the use of graphic materials
Excellent and distinctive work: Clear and cohesive structure. Very well presented with accurate citations and bibliography. Impressive use of visual and graphic devices, and effective arrangement of materials. Accurate and proper use of English, employing scholarly conventions.
Competent work: Clear structure and arrangement of materials with accurate citations, appropriate use of visual and graphic devices.
Significantly deficient work: Structured presentation but very thin on substantive content. Citations mostly correct but not consistent. Little evidence of thought about the use of visual or graphic devices. Sloppy use of language.

Overall coherence and academic rigour
Excellent and distinctive work: Systematic and considered approach; critically reflexive; clarity and logic in the structuring of argument; proper use of language; assumptions stated; charity of interpretation; identification of gaps and possibility for further research. Of a publishable standard.
Competent work: Considered approach; clarity in the structure of presentation; satisfactory use of language; assumptions mostly stated, though some implicit; conclusions and ideas for further research identified.
Significantly deficient work: Not a considered approach, therefore no planning evident. Poor use of technical terms and overuse of cliché. No argumentative structure evident. Some attempt at interpretation, but not based on the data.


your work. This is especially the case with the search of the literature. Initial search strategies may not reveal what you might have wanted; you therefore need to be flexible and search more widely or use more complex combinations of words and phrases. Persistence also means being thorough in your search, making detailed records of how you managed the administration of the activity. This is because a comprehensive search for the literature on a topic is very much a matter of managing the administration of search sheets, records, databases, references located, items obtained and those ordered from the library, and so on. The use of all relevant sources and resources is therefore required to be shown in your thesis. This can be written up in the methodology chapter or the review of the literature.

Justification

A major requirement is that you provide sufficient argument to justify the topic for your research, which means showing that what you propose to research is worthy of research. This involves the use of existing literature to focus on a particular context. The context might be, for example, methodological, in that you propose to employ a methodology on a topic in an area in which it has not previously been used. This might involve constructing an argument to show how a methodology relates to the topic and thereby suggest what its potential might be. Alternatively, you might provide a summative or integrative review. This would involve summarizing past research and making recommendations on how your research will be an addition to the existing stock of evidence. In this case you would be proposing to apply a tried approach to your topic. Whatever you use as the focus for your justification, one thing must always be seen: evidence from the literature. You are therefore expected to avoid using personal opinions and views and never submit a statement without sufficient backing.
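The record-keeping behind a thorough search can be kept very simple. As a minimal, purely illustrative sketch (not something prescribed by this book, and all field names here are our own invention), a search log might record, for each pass, the source searched, the terms used and the items actually obtained, so that strategies already tried can be reviewed and refined later:

```python
# Illustrative sketch: a minimal literature-search log.
# The record structure (date, source, terms, hits, obtained) is an
# assumption for demonstration, not a format from the book.
from dataclasses import dataclass, field

@dataclass
class SearchRecord:
    date: str                 # when the search was run
    source: str               # e.g. a catalogue, index or database
    terms: str                # words/phrases or combination used
    hits: int                 # number of references located
    obtained: list = field(default_factory=list)  # items actually obtained

log = []
log.append(SearchRecord("2024-01-10", "library catalogue",
                        "literature review AND methodology", 42,
                        obtained=["Hart 1998"]))

# The log makes it easy to see which source/term combinations have
# already been tried, and so where to widen or refine the search.
tried = {(r.source, r.terms) for r in log}
```

However it is kept, whether on paper search sheets or in a small database, the point is the same: the administration of the search should be reconstructible when the time comes to write it up.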

Scholarly conventions

You are required to use the literature in a way that is proper. At the most basic level this means citing references in a standard format recognized by the academic community. You will find guidance on this in Appendix 2. It also means using the literature in a way that is considered and considerate. You might not be able to cite all the references that you locate in your search. You will therefore need to exercise judgement as to which references are the most important, that is, the most relevant to your purpose. An attitude of critical appraisal will be necessary to avoid simplistic summative description of the contents of articles and books. This involves being charitable to the ideas of others while at the same time evaluating the usefulness of those ideas to your own work.

The master's is a limited piece of research. Taking approximately 10,000 to 15,000 words, the thesis or dissertation is a relatively modest piece of writing, equivalent to, say, three or four extended essays. Its key elements are: the research; design of the research; application of data-collection techniques; management of the project and data; and interpretation of the findings in the context of previous work. To do these things in a way that is scholarly demands effective management of the research. A summary of the standards required is given in Appendix 3.
The doctorate

There appear to be seven main requirements, generally agreed across the academic profession, covering the content, process and product of a doctoral thesis. These are:

1 specialization in scholarship;
2 making a new contribution to an area of knowledge;
3 demonstrating a high level of scholarship;
4 demonstrating originality;
5 the ability to write a coherent volume of intellectually demanding work of a significant length;
6 the ability to develop the capacity and personal character to intellectually manage the research, including the writing of the thesis;
7 showing in-depth understanding of the topic area and work related to the research.

We might also add an eighth criterion, one more specific to the doctoral viva:

8 defending orally what was produced in terms of the reason for doing the research and choices over the way it was done.

These statements do not capture the scope and depth of all doctoral research. They do, however, provide a set of requirements which show the crucial importance of the literature review in the research process and in the content of the thesis itself. The first three show the input that can be made by a thorough search and reading of related literature. It is these, together with demonstrating originality, that will now be discussed.
The literature review in research

Specialization

Although some universities allow candidates to enrol for higher degrees without a first degree, the model used here assumes an academic career in which scholarship is developmental and not conveyed through a title. That career normally consists of a first degree followed by postgraduate work, both of which can be full-time or part-time study. Through an academic career a student gradually acquires a cumulative range of skills and abilities, and focuses their learning on subject-specific knowledge. The availability of choice of degree and options within degrees means that subject specialization of some form is inevitable. In terms of skills and ability most undergraduates are expected to acquire and develop a wide range of personal transferable skills. Figure A4.1 in Appendix 4 gives an indication of the information management task involved in reviewing a literature; Appendix 4 also includes guidance on how to manage the technical elements of the review.

The raw materials for undergraduate work, commonly in the form of articles from journals, periodicals, anthologies and monographs, are the ideas of other people, usually the 'founding theorists' and 'current notables' of the discipline. In order to understand the specifics of the subject, it is essential that the undergraduate comes to terms with the ideas of the founding theorists and current notables. Only when they have done this will they have sufficient subject knowledge to be able to talk coherently about and begin to analyse critically the ideas of the subject. This means demonstrating comprehension of the topic and the alternative methodologies that can be used for its investigation. While it might be possible to reach a level of advanced standing without an appropriate intellectual apprenticeship, the academic career is likely to be a more reliable method of acquiring the in-depth knowledge demanded of a doctoral student. There are sound academic reasons for the academic career as preparation for higher degrees research. The ability and capacity to manage cognitively massive amounts of information, play with abstract ideas and theories and have insights is usually gained through intensive academic work and not short-term, drop-in programmes, or the production of occasional publications.

Making a new contribution

The section on originality which follows relates to what is said here: the requirement for postgraduate research to advance understanding through making a new contribution is directly dependent upon knowledge of the subject. That knowledge can only be obtained through the work and effort of reading and seeking out ways in which general ideas have been developed through theory and application. This process requires from the researcher the kinds of skills already mentioned in relation to using libraries. But it also requires a spirit of adventure (a willingness to explore new areas), an open attitude that avoids prejudging an idea and the tenacity to invest the time and effort even when the going gets tough. What we are talking about here is resisting the temptation to make prior assumptions about any idea or theory until one is knowledgeable about that idea. This involves the spirit of research: looking for leads to other works cited by the author which have influenced their thinking.

Garfinkel's ethnomethodology, for example, like many other new and interesting developments in all subject fields, did not emerge from nothing. It was a development from an existing set of theories and ideas. Garfinkel systematically worked through a range of existing theories in order to see where some ideas would lead if applied. Through his reading and thinking he was able to explore, in the true spirit of adventure, the foundations and boundaries of social science. What enabled Garfinkel to make a new contribution, even though the amount of work he has produced is relatively small, was his ability to see possibilities in existing ideas. Making new insights is not merely about being able to synthesize difficult and large amounts of material; it also involves knowing how to be creative and, perhaps, original. It cannot be overemphasized, however, that to make a new contribution to knowledge you do not have to be a genius. The size of the contribution is not what matters; it is the quality of the work that produces the insight. As you will see shortly, originality can be defined and is often systematic rather than ad hoc.

Demonstrating a high level of scholarship

As we have noted earlier, the thesis is the only tangible evidence of the work and effort that has gone into the research. For this reason it needs to provide enough evidence, of the right type and in an appropriate form, to demonstrate that the desired level of scholarship has been achieved. A key part of the thesis which illustrates scholarship is the review of the literature. It is in this section that the balance and level of intellectual skills and abilities can be fully displayed for scrutiny and assessment. The review chapter might comprise only 30 to 40 pages, as in a doctoral thesis, or 15 to 20 in a master's thesis, although the actual length often depends on the nature of the research. Theory-based work tends to require a longer review than empirical work. Either way this is a very short space to cover all that is required and expected. Typically, the review chapter is an edited-down version of the massive amount of notes taken from extensive reading.

The material of all reviews consists of what has been searched, located, obtained and read, but is much more than separate items or a bibliography. The reader of the thesis is being asked to see this literature as representing the sum total of current knowledge on the topic. It must also demonstrate the ability to think critically in terms of evaluating ideas, methodologies and techniques to collect data, and to reflect on implications and possibilities for certain ideas. Scholarship therefore demands a wide range of skills and intellectual capabilities. If we take the methodological aspect of the thesis we can see that underpinning all research is the ability to demonstrate complete familiarity with the respective strengths and weaknesses of a range of research methodologies and techniques for collecting data. It is therefore important to read widely around the literature on the major intellectual traditions such as positivism and phenomenology. This is because it is these traditions that support and have shaped the ways in which we tend to view the nature of the world and how it is possible to go about developing knowledge and understanding of our world. Knowledge of historical ideas and theories, and of philosophy and social theory, is essential. In a similar way to skills, knowledge of, say, Marx or the postmodernists might be seen as essential personal transferable knowledge. As a researcher you must also demonstrate the ability to assess methodologies used in the discipline or in the study of the topic in order to show clear and critical understanding of the limitations of the approach.


This will show your ability to employ a range of theories and ideas common to the discipline and to subject them to critical evaluation in order to advance understanding. It involves demonstrating the capacity to argue rationally and present that argument in a coherent structure. So, you need to know how to analyse the arguments of others; the reader of your thesis (an external examiner) will be looking to see how you have analysed such theories and how you have developed independent conclusions from your reading. In particular, your reader will be interested to see how you develop a case (argument) for the research you intend to undertake.

Demonstrating originality

The notion of originality is very closely related to the function of the search and analysis of the literature. We have already indicated that through a rigorous analysis of a research literature one can give focus to a topic. It is through this focusing process that an original treatment of an established topic can be developed. Leaving aside until later chapters how this has been and can be done, we need to turn our attention here to the concept of originality. In Figure 1.3 we show some of the associations that can be made from the different definitions of originality. Use these to grasp the meaning of the term. This is important because in academic research the aim is not to replicate what has already been done, but to add in some way, no matter how small, something that helps further our understanding of the world in which we live.

All research is in its own way unique. Even research that replicates work done by another person is unique. But it is not original. Being original might be taken to mean doing something no one has done before, or even thought about doing before. Sometimes this kind of approach to thinking about originality equates originality with special qualities assumed to be possessed by only a few individuals. The thing to remember is that originality is not a mysterious quality: it is something all researchers are capable of if they know how to think about, manage and play with ideas.

There is an imaginary element to research. This is the ability to create and play with images in your mind or on paper, reawakening the child in the adult. This amounts to thinking using visual pictures, without any inhibitions or preconceived ideas, and involves giving free rein to the imagination. Theorists such as Einstein attributed their ideas to being able to play with mental images and to make up imaginary experiments. This technique is used to make connections among things that you would not normally see as connectable.
Einstein, for example, described how he came to think about the relativity of time and space in the way he did by saying it all began with an imaginary journey. Einstein was able to follow his fantasy through to produce his famous equation E = mc². The point to note is that Einstein's journey was a small episode, something most of us are capable of experiencing. Einstein's achievement was in following through his ideas to their theoretical conclusions. He stopped his work short when he realized that his ideas could have a dark side: the development of a nuclear weapon (it is reassuring to know that very few people will find themselves in a similar situation to that of Einstein). It is sufficient to say that such episodes are an essential part of the research imagination. You will often find yourself having such episodes as a part of the thinking process. You will often find yourself understanding things that just a few days or weeks previously seemed difficult or incomprehensible because, as you apply more energy to your topic, you will increase your capacity for understanding. Therefore, notions and beliefs about having to be some kind of genius in order to be original can be placed to one side. Once this is done we might be able to see and learn how to be original in research.

Figure 1.3 Map of associations in definitions of originality. (The map links 'originality' to: produced using your own faculties; the result of thought without copy or imitation; authentic; not been done before; new in style, character, substance or form.)

Phillips and Pugh (1994), in their study of doctoral research, identified nine definitions of what it means to be original. These are:

1 doing empirically based work that has not been done before;
2 using already known ideas, practices or approaches but with a new interpretation;
3 bringing new evidence to bear on an old issue or problem;
4 creating a new synthesis that has not been done before;
5 applying something done in another country to one's own country;
6 applying a technique usually associated with one area to another;
7 being cross-disciplinary by using different methodologies;
8 looking at areas that people in the discipline have not looked at before;
9 adding to knowledge in a way that has not been done before.

The list presented by Phillips and Pugh is close to what might be expected from doctoral students, since it is oriented towards methodology and scholarship. It assumes the student already has an understanding of subject knowledge.

CONCLUSION
There is no such thing as the perfect review. All reviews, irrespective of the topic, are written from a particular perspective or standpoint of the reviewer. This perspective often originates from the school of thought, vocation or ideological standpoint in which the reviewer is located. As a consequence, the particularity of the reviewer implies a particular reader. Reviewers usually write with a particular kind of reader in mind: a reader that they might want to influence. It is factors such as these that make all reviews partial in some way or other. But this is not a reason or excuse for a poor review, although such factors can make a review interesting, challenging or provocative. Partiality in terms of value judgements, opinions, moralizing and ideology can often be found to have invaded or formed the starting point of a review. When reading a review written by someone else, or undertaking a review yourself, you should be aware of your own value judgements and try to avoid a lack of scholarly respect for the ideas of others.

Producing a good review need not be too difficult. It can be far more rewarding than knocking something up quickly and without too much intellectual effort. A large degree of satisfaction can be had from working at the review over a period of time. For a master's or doctoral candidate this might be up to a year or more. A large measure of that satisfaction comes from the awareness that you have developed skills and acquired intellectual abilities you did not have before you began your research.

Shephard-01.qxd 10/4/2004 6:55 PM Page 1

1
What Makes Some Presentations Good?
Key concepts in this chapter:

- Types of presentations.
- What do you think contributes to good presentation?
- What others think contributes to good presentation.
- Five categories to work with: content, structure, self-presentation, interaction and presentation aids.
- Subject and place differences in expectations.
- Some people appear to break all of the rules.
- What contributes to bad presentations?

This chapter encourages us to think about what makes presentations good, and then follows this with an analysis of what many others have suggested. The chapter will also consider what tends to be viewed as bad presentation and what most often goes wrong in presentations. Three case studies and additional content will illustrate how the subject, venue and circumstances influence acceptable practice. A central aim of this chapter is to demystify the essential elements of what makes some presentations good, particularly to encourage those new to presenting. Most people agree on what makes presentations good and the characteristics of good presentations are not particularly surprising. Good content, understandable structure, interactions between presenter and audience, reasonable self-presentation and helpful use of presentation aids are all elements of good practice. There are subject differences in expectation, but new presenters should be able to research what is acceptable in their own discipline. It is rare for presenters to do everything right and there is a lot of scope for individual self-expression.


PRESENTING AT CONFERENCES, SEMINARS AND MEETINGS

As we explore the characteristics of good presentations, we should also have in our minds the range of possible presentations. New presenters are often asked to undertake research project presentations as their first experience, either individually or in small groups. At a later stage in their career they may give a departmental seminar or a short conference presentation. Alternatively they may prepare for a poster presentation. Most researchers find themselves contributing to a research group presentation at some stage. More experienced professionals will be thinking about contributing to a panel presentation or to a symposium. Researchers at the peak of their career may be enticed to offer a keynote presentation to a large international conference. Although the range is significant, all of these presentation types have much in common.

Brainstorming the issues: what makes presentations good?


Let's start with your views on what makes a good presentation. I do not think that it would be possible, or desirable, to impose a fixed external model of a good presentation on to everyone. For one thing, it would not work, as it ignores our own individual strengths and weaknesses that we really do need to address. For another, it would result, if successful, in very dull conferences and meetings! So, let's start with your views.

Think about the last really good presentation that you went to: a lecture, a conference presentation, a sales pitch or, as a last resort, a television presentation. Write down six things that you thought were good about the presentation.
Use a blank piece of paper for this; you will need to keep the list for future reference. If possible, please do this task before you read on or look at the figures in this chapter.

What others say


Common responses
When I facilitate staff development workshops on 'Presenting your Research at Conferences', I ask participants to do this task before attending the workshop.


WHAT MAKES SOME PRESENTATIONS GOOD?

I am confident that generally they do. I also use the task as an initial activity for pairs of participants in the early stages of the workshop. Participants at this stage are still apprehensive about the workshop. They do not know other participants and it is important that they rapidly feel at ease with them and with me. They need an activity that they feel comfortable with, that they can contribute to and that will make them feel that they have something in common with other participants. If the task produced too many differences it would not work in this way. If each participant identified particular aspects of a presentation that were good for them but not for others, then this initial group activity would be more divisive than community-building.

So here is the point: generally, people from academic backgrounds, from all subject areas that I have experience of, have common views on what makes presentations good. This applies to experienced academics and young postgraduate students alike. Of course there are individual differences and subject differences, and I will describe these later. Naturally much of the detail emerges, with differences of opinion, later; but generally people agree on a whole range of key issues. As I watch and listen to pairs or small groups of participants describe their experiences of good presentations I see and hear the surprise and relief that their views are commonly held views and that they do have things in common with other participants. It is at this stage that I see people relax into the workshop and start to really get involved.

Within the workshop I usually record the views of individuals and small groups on a flip chart. I ask each pair to identify one aspect of a presentation that they think is good. Sometimes individuals within the pair modify the phrase used by the spokesperson, but generally members of the pair reach a consensus on the statement.
We then briefly discuss the statement in the wider group, and it is unusual for the statement to be radically different from those on the lists of all other pairs. Then I ask another group to identify another aspect, and so on. A typical flip chart, after this activity, looks something like Figure 1.1 (this is not a reproduction from any particular workshop but a synthesis from many).

FIGURE 1.1 A typical flip chart record of workshop participants' views on what aspects of a presentation they think are good:

- Uses good examples
- Its level was right for the audience
- Appropriate use of data
- To the point; not much waffle
- He handled the questions well
- Good timing
- The central ideas were summarized at the end
- I knew where the presentation was going
- She looked at the audience
- She knew her subject well
- Did not read from a script
- It seemed honest
- He asked the audience some questions and got answers
- Fluent speaker
- Good use of English
- She had charisma
- She looked relaxed
- The slides were clear and useful
- She was enthusiastic
- He did not just read his PowerPoint bullet points
- The presentation had a logical structure
- I could take useful notes
- She had a professional appearance
- He engaged with the audience from the start
- She spoke to the audience

There follows a period of comparison, regrouping and consolidation. Some statements turn out to be quite similar to others and can be combined. Most importantly, we try to group statements so as to reduce the number of variables that we will work on in the remaining workshop. Subdivision in this way is useful if it identifies clear elements of our own practice that we can work on to improve.

It is clear from Figure 1.1 that some statements are about the presentation itself (e.g. 'The presentation had a logical structure'), while others are more about the presenter (e.g. 'She looked at the audience'). This is one useful subdivision. Perhaps you can look at your own list and decide how easy it would be to apply this subdivision. There are other fairly natural divisions. In relation to the presentation, it is useful to separate its structure from its content. Indeed, in Chapter 4, this is an important design feature that we will examine in depth. One other division is possibly less intuitive, but I think that it provides a sound basis for further analysis and improvement. In relation to the presenter, rather than the presentation, I think that it is useful to divide aspects of how the presenter interacts with the audience from how the presenter presents her- or himself. This analysis gives us four major subdivisions: Structure, Content, Interaction and Self-Presentation. In my experience of many workshops, dividing the statements of what makes presentations good into these four categories proves to be relatively easy and occurs without controversy.

There is one other category that is important to us, and groups differ in how they want to work with it. Many presentations, but certainly not all, make substantial use of audio-visual or presentation aids. Workshop participants have suggested almost universally that the way the presenter works with audio-visual

aids is a substantial factor in deciding whether a presentation is good or bad; but the precise details of what is good or bad practice in their use vary considerably. It is arguable that presentation aids and their use actually form part of the content of a presentation, influence and describe its structure, provide a mechanism for interacting with the audience and provide a platform for the presenter's self-presentation. On this basis no separate category for the use of audio-visual aids is needed. I sometimes make this argument but invariably lose it. The workshop participants value the adoption of a separate category for presentation aids, so we shall maintain it here and discuss the issue further, both below and in Chapter 3.

Figure 1.2 provides my attempt to categorize the statements provided in Figure 1.1. Can you undertake the same categorization of your statements? Do you have a list that includes completely different statements? Do you have views about what makes a good presentation that are similar to the views of others, or are your views different?

Odd responses?
There will always be a variety of views. Academic staff at universities and colleges are perhaps a particularly diverse group, drawn together only by a common desire to research and teach, and often with very little else in common. There is no reason why everyone in this group should hold the same views on presentation style and every reason why there should be some individuals with different views. What surprises me most, however, is that it is very rare for individuals to have markedly different views at this stage. Participants may feel that they personally cannot achieve the standards expressed in the group activity, and this may influence their expression of their views. I have also encountered individuals who feel unable to say what they think makes a good presentation, but who are perfectly able to express what makes a bad presentation. There is presumably some individual-difference psychology here that I am not experienced in interpreting (but I do have my own ideas about this!).

Perhaps your statements about what makes a presentation good are particularly different from those provided in Figure 1.1. Is that a problem? Figure 1.1 represents the best combination of preferences that I can generate, based on the views of numerous workshop participants over several years. They are also fairly self-evident. Generally speaking, no one would expect a presentation lacking in structure to be particularly good, or a presenter who uses no examples of relevance to the audience to be particularly engaging.

FIGURE 1.2 The same record as in Figure 1.1, but here categorized as views on content, structure, self-presentation, interaction and presentation aids:

Content
- Uses good examples
- Its level was right for the audience
- Appropriate use of data
- To the point; not much waffle

Structure
- The presentation had a logical structure
- Good timing
- I could take useful notes
- The central ideas were summarized at the end
- I knew where the presentation was going

Interaction
- He handled the questions well
- She looked at the audience
- He asked the audience some questions and got answers
- He engaged with the audience from the start
- She spoke to the audience

Self-presentation
- She had a professional appearance
- It seemed honest
- Fluent speaker
- Good use of English
- She knew her subject well
- Did not read from a script
- She looked relaxed
- She was enthusiastic
- She had charisma

Presentation aids
- The slides were clear and useful
- He did not just read his PowerPoint bullet points

Most of this is fairly down-to-earth. But then consider the experiences of most academics. They attend lectures, conferences and seminars, mostly in their own subject areas

and mostly given by people that they know. Does this fit with your own experience? Add to this the fact that, for most of us, many presentations that we attend are not that good. Very few would ever exemplify all of the positive attributes listed in Figure 1.1. If we are lucky, some of them would display some of these positive attributes. Figure 1.1 is therefore a wish-list, synthesized from the wishes of many, rather than an expectation. Your wish-list might be different from Figure 1.1, but that does not make it less desirable for you.

There is another factor. Most of the key statements in Figure 1.1 are rather broad. 'She knew her subject well' is not particularly precise. It represents an impression given by the presenter to the audience. It could have been achieved in a variety of ways, some of which may have been illusory rather than real. The statement provides a broad aim but not enough detail to enable us to determine how this particular feat was done. Hidden in this breadth could have been any number of precise statements. Perhaps your statements do not match those in Figure 1.1 because you chose to address the issue at a different level. For now we must address the five key considerations described in Figure 1.2.

Some conclusions: five key considerations


For much of the rest of this book we will work with five categories of statements about what makes presentations good. These are now described.

Content
This is the core of what is said in a presentation and in many respects the easiest thing for a presenter to change or adapt. Audiences tend to appreciate content that matches the presentation's title and is delivered at a pace and level to suit the audience.

Structure
Audiences acknowledge that structures can take many forms, but generally they appreciate some indication of the major subdivisions of the presentation and other details such as how long it is likely to last and whether or not questions and answers form part of the presenter's plans.

Self-presentation
Audiences like honest presenters, or at least presenters who appear to be honest. They also tend to like enthusiastic presenters but to dislike over-enthusiastic presenters. Most importantly they appreciate presenters who appear to know what they are talking about. Much of this depends on how audiences interpret what they see or hear.

Interaction
Most audiences want to feel as if the presenter has noticed that they are there. A presenter who talks to a whiteboard and fails to look at people in the audience is generally not appreciated.

Audio-visuals or presentation aids


For presentations that make use of audio-visual aids, this is often the big one. Well-used aids can contribute positively to all four categories above. Badly used aids generally just get labelled as badly used.

Subject differences, place differences and humour


Consider the following statement:

Everyone knows that academics tend to display tribal characteristics. The tribes relate most strongly to the subject or discipline. Of course we should avoid stereotyping this diverse profession but aspects of clothing, speech and body language do often indicate whether the professor in front of us specializes in modern history, business studies or computer science.
Actually, in my experience, this is nonsense. Certainly, if I had to guess someone's academic subject from their clothing, speech or body language, I think I would fail dismally. People in academia are just too diverse for this. However, I would have a better chance if I saw their presentation style. Despite what is recorded above about the views of academics (from all subjects) on what makes presentations

Shephard-01.qxd

10/4/2004

6:55 PM

Page 9

WHAT MAKES SOME PRESENTATIONS GOOD?

good, when it actually comes to presenting there are subject differences. It is almost as if people have their own views on what makes the best presentations but that their discipline imposes constraints on how they actually present. One of the clearest expressions of this is how acceptable it is for a presenter to read from a script. Almost universally, academic staff tell me that the appearance of spontaneous speaking makes for a better presentation than reading from a script. But many academic staff from the humanities then go on to tell me that this is fine for others, but for them, in their subject setting, at their particular conferences, seminars and meetings, they will be expected to prepare a script, to have it in front of them and to stick to it. Some go so far as to say that they are expected to remember it verbatim, so that they can speak to the audience without appearing to read the script. These academic staff tell me that the words in their presentation have to be carefully crafted and that there is no room for improvisation. Given that a script is necessary, they either have to learn it word for word or they must read from the script. Most of these presenters probably do something in between. The consequences of this imperative, along with other subject-related design factors, will be addressed in Chapter 4. How people perceive the good and bad in presentations is also greatly dependent on factors that relate to place and related circumstances. Presentations can easily be viewed as too formal or too informal, depending on where they are given and the circumstances. Keynote presentations are expected to provide something different from the run of the mill presentations that follow. Presentations for a departmental seminar are generally longer and less formal than those for a major international conference. But how can you judge just what will be appropriate and what will not? 
The key here is really to anticipate what the audience expects, or will cope with, and this requires a degree of audience research.

Humour is perhaps the toughest of all attributes to identify as good or bad. Participants on my workshops give mixed messages here. Some like humour in a presentation and some do not. Figure 1.1 does not include a reference to humour; although many individuals include it on their lists of good aspects of presentations, many do not, and some have voiced opposition to its inclusion. I have not found a particular correlation with subject, gender or age here. Readers of this book will know that there are other sources of guidance for presenting at conferences, seminars and meetings (some of them are listed in the Bibliography). Many of these recommend the use of humour in presentations. Lenn Millbower, as one of many proponents, has written an article for Presenters University, a website devoted to presentation skills. Lenn's article, 'Laugh and learn', suggests that:
Laughter is an important component in any presentation. Even when (the) presenter ignores humor, the attendees find it, sometimes at the presenter's expense. The need for laughter is so strong that participants seek out opportunities to laugh throughout every seminar. They do so with good reason. It is natural and appropriate to use humor in learning situations. It is, for a number of reasons, also demonstrative of solid instructional design. (Millbower, 2003)

The article is persuasive and I do agree with much of it, but I still have reservations. Perhaps the issue is primarily about audience expectations. Generally, I like to make people laugh in informal presentations, for example at departmental seminars, because at heart I do agree with much of what Lenn Millbower says. (Naturally, I want them to laugh with me, not at me.) But I tend not to attempt to make people laugh in formal presentations, for example, at major conferences. Partly this is because I am not brave enough; partly it is because I do not know the audience well enough to be sure about what will be seen as humorous and what will not; but mostly it is because participants at the important conferences that I go to do not expect me to be funny. They might expect other, better known, presenters to be funny, but not me. Perhaps I need to practise being humorous more. Perhaps we all do, so that humour becomes more universally acceptable at conferences. But in the meantime, humour remains a highly personal aspect of good or bad presentations. Many of these issues will re-emerge in later chapters.

Three case studies


These three case studies consider near extremes in presentation style, and they are included to encourage readers to consider alternative views on what makes presentations good. Not all lecturers display characteristics of the absent-minded professor, but some do, and there does appear to be room in academic settings for amiable, avuncular but (apparently) poorly organized presenters. Cast your mind back to your last conference. Did you spot one? Maybe you even have one in your department? What are their characteristics? Perhaps a shoelace is frequently undone. Perhaps their hair is untidy, their tie has a stain on it or is even tucked into their trousers? Perhaps they approach the podium with an armful of unruly papers, stumbling on the way. Do they have a wild look in their eyes? Do they carry an overflowing handbag? Do they have a spelling problem? But do they always have some interesting things to say, possibly said with humour? Do they give the sort of presentation that you remember? Are they, perhaps, actually, good presenters in an odd sort of way? Not that you would want to mimic them of course, but let's not be too dismissive of variety (Case study 1.1).

At the other extreme we should consider the outright professional. Is there still a place for the presenter who presents in the same way as they (allegedly) did in the Royal Institution in 1900? In my own subject areas I think that presentations like this are the exception rather than the rule, but this is not necessarily so in other subject areas. I know that colleagues in history and English departments often do still admire, perhaps even expect, this level of professionalism. In some respects we aim for the appearance of this professionalism in a range of other settings. Broadcasters, for example, often give extraordinarily good presentations (with the illusion of spontaneity and the precision of the prepared text), but they also benefit from the autocue, direction, rehearsals and retakes (Case study 1.2). The notion of careful use of presentation aids is then considered in our third case, as we experience a struggling undergraduate student who produced a presentation that impressed her peers (Case study 1.3).

CASE STUDY 1.1 The disorganized lecturer


Simon carried an armful of papers and overhead transparencies to the podium, thanking the Chairman on the way. He spent some time sorting out his aids and testing the overhead projector (OHP) before he looked at the audience and introduced his presentation. He had no notes in front of him and there was no indication that his presentation was remembered word for word; sentences seemed to be lacking in some aspects of grammar and there were quite a few 'er's and 'um's as Simon thought about how to express particular concepts. The introduction seemed to lack organization, and some important aspects were given as if an afterthought, but I was in no doubt about where the presentation was going and what I was expected to get out of it.

He moved to the OHP quite quickly after his introduction and placed a transparency on it. He turned to check that the slide was in focus and that it could be seen, but did not appear to notice that it was not straight with respect to the screen. While Simon spoke about aspects of the figure he pointed sometimes to the figure on the OHP with his finger and sometimes to the projected image, again with his finger (either he had not noticed that a laser pointer had been provided or he had decided not to use it). He looked everywhere, and at everyone, but also took some time out to look at his figures as if he were trying to interpret them himself, there and then. As he did so he spoke his thoughts out loud, debating the possible interpretations himself. He had quite a few figures to show us; certainly some were just flashed before our eyes while others were quite possibly lost in the pile. They were not really necessary, he explained.

Simon ran out of time and the Chairman had to stand to indicate that it was time to move on. This seemed to prompt Simon, not to leave, but to summarize his presentation. This he did with clarity in perhaps 30 seconds. He left us with a list of questions that were to form the basis of his research, and perhaps that of others, until the next conference.

This presentation probably does not conform to many of the statements of what makes presentations good, listed in Figure 1.1. In relation to our five key considerations, a critical friend might make the following observations:

Content: of significant interest to the audience and at about the right level.
Structure: appeared to lack structure, but all of the intended outcomes described early on were achieved.
Self-presentation: unorthodox but clearly enthusiastic and committed.
Interaction: useful examples, good eye contact, engaging; we were left with some interesting questions to think about.
Presentation aids: unorthodox, tending to sloppy, but some visual aids were completely integrated into the presentation.


Was it a bad presentation? I think that, actually, it was a good presentation. I had seen Simon present before, so I knew his style. I knew what Simon was going to talk about from the first few minutes. I felt engaged by the presentation and I enjoyed being part of his apparent exploration of the issues. I remember aspects of this presentation far more than any other in that conference. I would not try to mimic it because I know that I could not; nor would I advise others to use Simon as their role model. If you think that this is you, then I do advise you to seek feedback from trusted colleagues. When the comedian Jimmy Tarbuck was interviewed on the radio programme Desert Island Discs, he said that veteran comedian Eric Morecambe had once given Jimmy some feedback on his highly individual style as a comic. Eric had apparently commented that Jimmy had something special, and he recommended that Jimmy should never attempt to analyse it. I might offer the same advice to Simon, if I were asked, but I suspect that Simon already has confidence in his ability to present, his way.

CASE STUDY 1.2 The organized professional


James was the outgoing President of a learned society. His duty was to present the Presidential Address at the Society's annual conference in a neighbouring country. Preparing this presentation was his preoccupation for months before the conference. James had also been the Chairman of the academic department in which I was a postdoctoral research fellow. I saw the work that he put into his presentation. I saw the lights on late in the department's lecture theatre. I heard small samples of the presentation being practised and revised as I walked past his door. I was also giving a presentation at the conference, but I must admit that I did nowhere near as much preparation.

Near the end of the conference, everyone gathered in the largest lecture room available. The audience was hushed. James was introduced by the conference convener and he walked to the lectern. James was dressed impeccably. He looked at ease and in command. He had a typed script in his hand but laid this firmly on the lectern. He knew exactly where the controls for the lights and audio-visual aids were, and he used them faultlessly. Clearly he had practised in this room as well as in our lecture theatre at home. James spoke fluently to everyone in the room. I noticed that he looked at me for a time before he turned his attention to others in the room. His presentation was on his own research topic but was designed to be of interest to a wide range of listeners. There was something in it for just about everyone, including a number of well-chosen examples to enable lay members of the audience to stay involved. He used well-chosen slides to illustrate points and avoided the use of technical terms; where these were necessary he defined them and illustrated their use with the slides. James used a pointer to identify important areas of his slides.

James had provided an introduction to his presentation so that we could follow its structure as it took place. The conference programme had also provided clear times for its start and finish. It started on time. It also finished on time; exactly. Clearly much of James's preparation had involved practice to ensure that the presentation finished on time. I think that it was memorized, word for word. The audience applauded. They knew that they had experienced a professional academic's professional presentation. It might not have been the presentation that everyone would have given (there was, for example, no humour in it, and I doubt that many in the audience would have been capable of such faultless timing), but I am confident that everyone in the room admired its professionalism. In my own subject areas I think that presentations like this are rare exceptions, but this is not so in other subject areas.

In relation to our five key considerations, another critical friend might make the following observations:


Content: perfect for the occasion and delivered at a level appropriate to the full range of delegates.
Structure: impeccable; easily understood by all in the audience.
Self-presentation: totally professional.
Interaction: good examples, good eye contact, possibly a little remote but perfect for the occasion.
Presentation aids: high-quality visual aids used to illustrate points very well.

CASE STUDY 1.3 The undergraduate project presentation


Clare was very worried about her project presentation, and I was not surprised. She had not actually done as much work on her project as I had hoped. (She led a very active social life alongside her third-year undergraduate studies.) Nor was Clare a high achiever in her academic assessments. She was, however, not shy. She could hold her own in any conversation or group activity. The presentation was not graded, but it was compulsory.

I think that Clare had anticipated her likely problems well. She thought that she would stand up in front of the group and forget what to say, in exactly the same way that she forgets what to write in an examination. She admitted that she was not particularly interested in her project topic but didn't think that was the real problem. Even if she were asked to talk about her favourite pop group she would still find it difficult to maintain a structured presentation. The important points just didn't come to her mind at the right time. Clare knew that she could read from a script, but also that she would be disappointed in herself if she resorted to this again. She wanted desperately to be congratulated by her peers for presenting well.

I suggested that she tried to use PowerPoint to add structure, and key information, to her presentation. This was in the mid-1990s. PowerPoint was not widely used for undergraduate presentations at that time, and we did struggle to get everything set up. (The facility was then offered to other students, several of whom did adopt it.) Clare initially agreed to use PowerPoint to prepare overhead transparencies but, after working with the software for a day or so, felt prepared to use PowerPoint itself to deliver the slides. She was glad that she did. The presentation worked wonderfully for her. Clare's natural confidence and ability to chat around any topic was exactly complemented by PowerPoint's delivery of Clare's crisp bullet points and structure. Clare's peers were absolutely amazed, but I wasn't. With the right tools that girl could go far.

An important point here is that of anticipation. Clare and I knew what her problems were likely to be and found a tool that enabled her to gain maximum credit for her strengths while having her weaknesses supported. In relation to our five key considerations, Clare's critical friend might make the following observations:

Content: of interest to the audience and at about the right level. We had heard about Clare's project in part, but it was good to see it all come together here.
Structure: clear structure, clearly presented using PowerPoint.
Self-presentation: Clare looked so confident and really spoke well about her project. We had no idea that she had taken it so seriously. She seems to know lots and did make it clear when there were areas that she did not cover in her project.
Interaction: Clare spoke to everyone in the audience as if we were her best friends. It was more like a chat about the topic than a formal presentation, but that got us all involved.
Presentation aids: Clare's slides were to the point. Perhaps some of the text was not necessary, but it clearly helped Clare to keep on track.

What makes some presentations bad?


This question paraphrases one that will be considered in depth in Chapter 8: what tends to go wrong? It is also possible to reverse most of the statements in Figure 1.1 to list aspects of presentations that most people consider to be bad. Table 1.1 attempts to combine these two concepts by relating what most often goes wrong to the characteristics of presentations most often considered to be undesirable. With the exception of arriving late at the conference or meeting, the table illustrates most of the attributes of presentations that academic staff and postgraduate students most often identify as poor.

TABLE 1.1 Aspects of presentations that often fail, related to commonly reported views on what makes presentations good and bad. Each entry gives what tends to go wrong, followed by the reported bad and good characteristics.

Lack of content
Bad: level was wrong for the audience; audience cannot take notes; waffle; not knowing your subject well.
Good: its level was right for the audience; I could take useful notes; to the point, not much waffle; she knew her subject well.

Boring content
Bad: reading from a script; lacking enthusiasm; lacking charisma.
Good: did not read from a script; she was enthusiastic; she had charisma.

Timing
Bad: poor timing.
Good: good timing.

You lose the audience
Bad: lack of a logical structure; not providing a summary at the end; not identifying where the presentation is going at the start.
Good: the presentation had a logical structure; the central ideas were summarized at the end; I knew where the presentation was going.

Nerves
Bad: not looking relaxed; not using good English or speaking fluently.
Good: she looked relaxed; good use of English; fluent speaker.

Is it believable?
Bad: inappropriate use of data; appearing to be dishonest; having an unprofessional appearance.
Good: appropriate use of data; it seemed honest; she had a professional appearance.

Inability to interact with the audience
Bad: not using good examples; not looking at the audience; not engaging with the audience from the start; not speaking to the audience.
Good: uses good examples; she looked at the audience; he engaged with the audience from the start; she spoke to the audience.

Questions and answers
Bad: handling questions badly; asking the audience some questions but not getting answers.
Good: he handled the questions well; he asked the audience some questions and got answers.

Technology
Bad: just reading your PowerPoint bullet points; slides neither clear nor useful.
Good: he did not just read his PowerPoint bullet points; the slides were clear and useful.

Summary
I hope that the sections in this chapter illustrate that there are few strict rules in presentation; not everyone will agree with such generalizations all of the time. But Table 1.1 does give a reasonable guide to what most of us need to do, most of the time, to deliver good presentations. As we shall see, some things are easier to achieve than others!

Reference
Millbower, L. (2003) 'Laugh and learn'. Presenters University, http://www.presentersuniversity.com/courses content laugh.php (accessed 7 January 2003).

Encouragement for new presenters


Those new to presenting should take heart from the contents of this chapter. Most people agree on what makes presentations good. The characteristics of good presentations are not surprising: good content, understandable structure, interactions between presenter and audience, reasonable self-presentation and helpful use of presentation aids. There are subject differences in expectation, but you should be able to research what is acceptable in your subject. It is rare for a presenter to do everything right and your own experience demonstrates that some presentations are very poor. You can do better than that. Much better. This book aims to help you overcome your weaknesses and build on your strengths.


3070-ch08.qxd

3/19/03

3:01 PM

Page 169

(Black plate)

Developing Memory Techniques


KEY CONCEPTS

LONG-TERM MEMORY, SHORT-TERM MEMORY, RECALL, RECOGNITION, RECONSTRUCTION, RE-LEARNING SAVINGS, REHEARSAL, LEVELS OF PROCESSING, MEMORY MODALITIES, PRIMACY AND RECENCY EFFECTS, SUBLIMINAL PROCESSES, ORGANISATION, MNEMONICS, STATE-DEPENDENT LEARNING, EXTERNAL AND INTERNAL CONTEXT, WORKING MEMORY, MEMORY INTERFERENCE, LEARNING AND RETRIEVAL

Your memory may be better than you think

Imagine that you are asked to make a phone call on a night out and you are given a telephone number of six digits to remember. All you have to do is walk down the corridor to the nearest phone, and so you decide not to write the number down. Instead you keep rehearsing the number as you walk, and by the time you get to the phone you can easily remember it. You have used your short-term or working memory and it has served you very well. However, when you are asked to ring the same number a week later you appear to have forgotten it. It does not seem to have been transferred to more permanent storage, at least in a manner that can be easily recalled. You may be able to remember the name of the person you had called, and you will remember having made the phone call, so the memory is not entirely dead. Or, if someone had distracted you as you walked at the time of the first call, or you had to wait in a queue for the phone, you may have lost the number.

STUDYING @ UNIVERSITY

Short-term and long-term memory

It appears that some of the information evaporates at the short-term stage and other aspects are transferred to more abiding storage (long-term memory). Although this is an oversimplification of memory structure and function (Baddeley, 1999), it illustrates the point that some memories are readily retrieved and some appear to go AWOL. Many people underestimate the power of their own memory, perhaps partly because they chiefly access their short-term memory (now more commonly referred to as working memory), or have not used good memory techniques or have not sufficiently focused on the large volume of information that they do remember. The human brain has an enormous capacity for remembering, and some understanding of storage and retrieval procedures will help improve memory use.

DIRECT RECALL WITHOUT CUES

There are some memories that we do not have to try to retrieve because they just spring into our conscious mind without solicitation. It is possible that these memories were important to us, or we were especially interested in their content, or that they were just catchy and humorous, such as the following limerick:

There once was a man from Trinity,
who thought he'd cracked the square root for infinity;
but there were so many digits,
it gave him the fidgets,
so he dropped it and studied divinity.

Many years ago a speaker used this to illustrate the point that perhaps some people study theology because they would fail at everything else! Because there is rhyme, humour, a moral and a context, this limerick is remembered effortlessly. You should be encouraged to know that some of the things you study at college/university will stay with you, and you will be able to recall them for use whenever you need them.

Capitalising on memory

Some years ago a mature first-year student went to his tutor for advice about whether he should sit his exams. He had attended many of the lectures and had completed all assignments, but had been robbed of revision time because of family pressures and work commitments. The tutor's advice was that he should sit the exams because he would remember much from the lectures, seminars and assignments, and all that was required in first year was a pass (40 per cent). However, the student did not have enough confidence in his memory to believe that with limited revision he could recall sufficient material to attain a pass. The result was great inconvenience to himself when he really had, in effect, little to lose and much to gain by trying.

MEMORY BY RECOGNITION

Think back to the illustration about the phone number that you were given to remember. One week after the event you had forgotten the number completely. However, suppose someone handed you the number on a piece of paper and asked you if you could remember it. Perhaps you would instantly identify it as the number you had rung a week previously, even though you could not recall it spontaneously. It could be said that it rang a bell in your memory! You have probably had a similar experience where you recognise someone's face although you cannot remember where and when you saw him or her before. Also, you may be quite sure that you have heard a certain tune at least once before, but you do not know its name, composer or performer. Indeed, you may even recognise a particular smell from only one previous encounter, especially if it is distinctive. In short, your recognition memory is probably much better than you think, and it is particularly useful in multiple-choice tests, where you are asked to identify the one right answer in the midst of wrong answers.

MEMORY BY RECONSTRUCTION

Ebbinghaus (1885) and his assistant tried to learn lists of nonsense syllables by rote in order to ascertain whether they could recall them spontaneously, without any cues to assist memory. Some lists they were able to recall freely, but others apparently could not be retrieved. However, whenever they saw again the lists they could not freely recall, they were able to recognise whether the order had been changed, and were able to reconstruct the lists into the correct order. In addition, when they went to learn again the lists they had appeared to forget, they found them much easier to learn the second time round. This implied that they possessed re-learning savings. The four forms of memory identified from the experiments of Ebbinghaus are therefore recall, recognition, reconstruction and re-learning savings.


A variation of reconstruction might be very useful for you in preparing for exams. You may find it helpful to take the outline of a lecture with headings and subheadings and reconstruct it into a different form. You could then try to reverse this procedure by rearranging the material back into its original form. Indeed some of the material you are given might need to be reconstructed properly into a good structured form for the first time! The general point is that making some changes to the material and re-arranging it into different forms may help learning, retention and recall.

MEMORY BY REHEARSAL

One of the best ways to remember a good joke is to tell it to someone else as soon as possible after you have heard it. People often remark that they wish they had a better memory for jokes; getting into the habit of passing them on immediately is one good strategy to start with. In this way the memory is transferred to longer-term storage, and you will hopefully be reinforced by the laughter of those you tell the joke to!

You can also write out a few of the pointers from each lecture as pegs on which to hang the subject matter from the lecture. Each of these becomes like a key for a little box of information that you can open and unpack. From time to time you can take a cursory glance at your summary outlines to keep the overall vision of the module before you.

A similar approach will also help you imbibe and digest your academic material. For example, during a coffee or lunch break, students can attempt to recall the major points they have learned from an important lecture. It would be most useful to set aside regular times to do this, as it would give a lot of mutual stimulation and would not be too much of an imposition on your time.

PRIMACY AND RECENCY EFFECTS

TV game show

There was a popular TV game show in which contestants were given limited time to watch valuable objects pass before them on a conveyor. Subsequently, with the objects out of sight, they were asked to recall as many objects as they could within about 20 or 30 seconds. The contestants got to keep all the objects they could remember, so there was a big incentive to learn and recall. According to the theory of primacy and recency effects, contestants would be more likely to remember the last objects (recency) and the first objects (primacy), and more likely to forget the objects in the middle. It would be interesting to test whether that is what actually happened in the game show, but these effects have been demonstrated in experiments.

Primacy and recency effects are most likely to occur where there are a series of things to be learned within a short time frame with limited time for recall. Their effects are likely to kick in if you leave your revision until the night before your test. However, if you pace yourself out well, you can learn the material in the middle (as well as at both ends) by going back over it and giving due attention to it. Psychologists have explained primacy and recency effects by displacement and distraction coupled with attention and rehearsal. These effects lead to the apparent loss of some material but this can be avoided if time pressure is removed. Planning your revision and allowing sufficient time for it will facilitate good memory processing and counteract primacy and recency effects.

THE ROLE OF SUBLIMINAL PROCESSES

Subliminal activity refers to the process whereby the mind takes in information without giving conscious attention to it. Psychologists use an instrument called a tachistoscope to demonstrate the reality of subliminal processes in memory function. Advertisers have also attempted to capitalise on this facility within the human psyche in order to sell their products. Pleasant music in a shopping environment may help to lift the mood of shoppers, even though they may not be consciously listening to the music. An associated idea is referred to as the cocktail party phenomenon. An example of this is where you are in a crowded room at a party, engrossed in conversation with one or two friends. Although you are not paying any attention to the many conversations around you, if someone happens to mention your name it is possible that you will pick up on this and turn toward them. Even as you read this book you are monitoring the sights, sounds, smells, temperature and so on around you, and although you are not diverting attention toward them, you are likely to pick up on changes around you. This understanding is very encouraging for the process of learning, for much more goes on in the academic environment related to reading, listening and interacting with others than we are consciously aware of, and we should not underestimate our capacity to learn.

3070-ch08.qxd

3/19/03

3:01 PM

Page 173

(PANTONE Green CV plate)

THE ROLE OF SUBLIMINAL PROCESSES

STUDYING @ UNIVERSITY

Organisational aspects of learning

Illustration: The library

If a library is well organised and the books are kept in place, then the task of finding the book you require is much easier. For example, if each subject has a designated area in the library, you know that you should be able to find your book in that vicinity, and if authors' names are arranged alphabetically, that will further simplify the task. Moreover, if the book has a code number that you can look up on a computer, then it should not be difficult to trace its whereabouts. In short, the more efficient the organisation and coding in the library, the easier and quicker it is to pinpoint the book you want. Libraries that are well organised and kept tidy are the best public servants. Similarly, there is much you can do to organise your memory and keep it as tidy as possible. If you store information in an organised manner, you will have the cues at your disposal to recall what you want when you need it.

Exercise: Organised memory

Get a friend to read over Form A below and another friend to read over Form B. Give both the same time limit to memorise the list. Then give the friend who memorised Form A a blank sheet of paper to write down as many words as they can remember, but give the friend who did Form B a sheet of paper with the four headings below, and see who can recall the most words. The experiment will only work if the time limits (for example, 30 to 45 seconds) are strict and the two people are of around the same age and educational background. It should illustrate the point that good organisational strategies assist memory recall. Some participants may, however, organise the material themselves without any prompting, and if they do then this should be taken into account in interpreting the results. If you have the time and opportunity, you may want to run the experiment on two groups of students from the same class.

DEVELOPING MEMORY TECHNIQUES

Form A: Potato, Pen, Car, Bus, Shoes, Shirt, Paper, Book, Cabbage, Train, Dress, Eraser, Mayonnaise, Lorry, Hat, Stapler, Pizza, Bicycle, Denims, Lasagne

Form B (the same words arranged under headings):
Food: Potato, Cabbage, Mayonnaise, Lasagne, Pizza
Stationery: Pen, Paper, Book, Eraser, Stapler
Transport: Car, Bus, Train, Lorry, Bicycle
Clothes: Shoes, Shirt, Dress, Hat, Denims

ARRANGING MATERIAL IN STRUCTURED POINTS

Look carefully at what you have to learn and then think of how you can make it manageable and workable. Take the typical news programme on TV or radio as an example. At the beginning of the programme you are given the news headlines, which may comprise five or six items of news in capsule form. As the programme unfolds, the basic headlines are elaborated on and the necessary details are filled in. What the newsreader does at the beginning is summarise and map out the shape and direction of the programme, so you have a very clear impression of what is about to follow. Moreover, at the end there will again be a summary of the main bullet points, so that viewers or listeners are left with a clear impression of all the news events. The producers of the programme also have to make a number of important decisions: which items are most newsworthy and should be included? How long should each item be given? In what order should the items appear? If you are taking a written test or preparing a written assignment for college or university you will also have to make decisions about these kinds of questions. In terms of using your memory well, you need to think about the main points (your news headlines) and the order in which it is best to remember them.

SOME USEFUL MNEMONICS

The word "mnemonics" refers to aids that are used to assist memory recall. It is a most interesting word in that it is derived from the Greek word for a tomb: the place that is visited to recall memories of a loved one or friend. Many people find comfort in this practice because it brings back powerful and pleasant memories. In the same way it is wise to use the full range of materials that will produce cues for recalling the subject matter you aim to learn. One of these techniques is the use of alliteration, where a series of words is used that all begin with the same letter; for example, "alliteration's artful aid". In this case the retrieval cue is the common letter at the beginning of each word. Previously we made reference to the four strategies used by Ebbinghaus and described these by the use of alliteration: recall, recognition, reconstruction and re-learning savings. Another useful mnemonic is where the first letter is taken from each of a series of words and one word is made from these letters. For example, in the personality theory known as "the Big Five", a word is derived from the five key terms of the theory: Extraversion, Conscientiousness, Openness, Agreeableness and Neuroticism. The first letter of each can be taken and rearranged into the acronym OCEAN. An example that is often used in brainstorming sessions is SWOT; the four key words here are Strengths, Weaknesses, Opportunities and Threats. You can also devise your own acronyms or use nonsense words as mnemonics. These methods are especially useful for remembering a series of key words that are linked in some way. Other strategies include rhyme (finding words or ideas that fit into a memorable rhyme) or chime (words with similar endings, such as "clarity", "brevity" and "certainty", or "perception", "sensation" and "reaction").

VISUALISATION STRATEGIES

An old adage says that a picture is worth a thousand words. What you might be able to do with material you find difficult to learn is to turn it into a picture, or a series of small pictures, no matter how bizarre these may seem. It is reported that James Joyce could remember the names and types of shops up and down a range of streets in Dublin from his earliest days. Although he had a phenomenal memory, it was no doubt facilitated by visualisation techniques. In learning another language, vocabulary can be built by the use of visualisation techniques. One frequently cited example is that the Spanish word for tent, "carpa", can be remembered by visualising a fish (a carp) in a tent. The Hebrew word for the earth is "ha-arats", and this can be remembered by "carrots" (which sounds alike): the earth is the place where you get the carrots! This example combines both audio and visual images.

Exercise: Memory by visualisation

Try to arrange the words below (from a previous exercise) into a simple story in which you go shopping for food just before lunch time, and then for clothes and stationery in the afternoon: Potato, Pen, Car, Bus, Shoes, Shirt, Paper, Book, Cabbage, Train, Dress, Eraser, Mayonnaise, Lorry, Hat, Stapler, Pizza, Bicycle, Denims, Lasagne. Does this help you to remember the words?

TRIGGERS AND CHAIN REACTIONS

Students' song and dance routines

Many students demonstrate their powers of memory at a student formal when they get up to dance to the popular songs. They seem to know almost all the songs, all the words and the appropriate movements to accompany each song. These are frequently the same students who complain about poor memory in relation to their academic work! When students listen to their favourite pop albums they even remember the precise sequence in which the songs appear. Even if they cannot document this on paper by direct recall, as they come to the end of each track it acts as a memory cue for the next one.

That is precisely what can happen in an exam or test setting when the autonomic nervous system has been triggered and the adrenaline is pumping. Once you start writing your essay plans you are likely to find that the ideas begin to flow. One idea triggers another in a chain reaction until you finish. Sometimes you appear to dry up for a while, but the flow will come back again. However, when you are at full throttle like this, watch that you don't go off at a tangent. Keep a regular check on your planning, pacing and timing.

The little people in the brain

A psychology teacher explained memory storage and retrieval by the analogy of little people in the brain. This idea had no doubt been derived from a regular feature, called The Numbskulls, in a popular children's comic. The idea the teacher used was that the little people ran and searched all over the brain for the stored memories that we requested. If we used a good system of storage, we made the task easier for them; if not, their task was harder and took longer (like the library illustration). The encouraging part of the illustration, however, is that once we commission these little people, they do not stop working until they find the required object! This relates back to the point on subliminal processes. You will no doubt have had the experience of trying intensely to remember some item of information and finding that it would not come; later, when you were not thinking about it, it suddenly came forcibly into your mind. It is just as if the imaginary little people beavered away at it until they found it, even though you had forgotten that you had sent the request.

Students should not therefore be discouraged in their reading even if they cannot initially regurgitate what they have read. There is every likelihood that during an exam or assignment, some important fact will flash into the mind as if from nowhere.

Levels of processing in memory

UNDERSTANDING FACILITATES GOOD MEMORY

Research suggests that memories may be more readily retained and retrieved if they are processed at various levels and not merely by rote learning. An important aspect of this is the element of understanding, especially if the understanding is followed by related practice.

For example, you can learn the ingredients required for a recipe and can then understand how to add each ingredient to the mixture in a particular sequence in order for the consistency to be exactly right. However, learning is really complete only when you successfully mix the ingredients together in practice. If you make a mistake in the sequencing at the first attempt, your understanding of why the ingredients should be mixed in a particular sequence is likely to be strengthened.

Similarly, part of learning the history of a war is not only to know how many nations were involved, but also when and why each entered the fray. Learning the proper sequence of events is easier if you understand why each nation entered the conflict.

ENJOYMENT HELPS ACCESSING MEMORIES

If you learn to enjoy what you are doing you are more likely to remember it. Previous reference was made to the students who remember their favourite songs and the range of dance movements that accompany them. Ardent followers of sport often know large volumes of facts and figures about their favourite teams. Because they enjoy the games and are always keen to know how their team is faring, there is no resistance to mastering all the relevant details. Moreover, students can eventually come to enjoy subjects that did not initially appeal to them. The key is to be patient, to give yourself time and to work steadily.

MOTIVATION BRINGS MEMORIES INTO CLEAR FOCUS

Closely allied to enjoyment is motivation. An illustration of the role of motivation in memory is the TV game previously referred to, in which contestants can win every valuable item they can remember seeing. In this case the motivation to learn is likely to be high, and therefore the effort and application brought to learning are also likely to be high. It will help the learning process if you remind yourself of the range of prizes that stand at the end of your course: the satisfaction of completing the task, the expertise that will have been acquired, the congratulations you will receive from family and friends and their sense of pride in your achievement, the passport to the career of your choice, and so on. Use whatever you can to get yourself motivated, and try to use both long-term and short-term reinforcers for learning.

MEMORY IN A VARIETY OF MODALITIES

Human memory is functional in all five senses: sight, hearing, smell, touch and taste. Many things are remembered through several of these modalities, for example a Madras curry by its colour, smell and taste. As previously asserted, memories can be strengthened by rehearsal, because re-learning savings are operative.

Memories can be strengthened by the use of various senses and by learning in a variety of contexts. You can learn the same material by reading and listening and then by reproducing it in writing and interacting with others. You can learn effectively in a library, a lecture room, your private study room, on a bus, on a park bench and so on.

Making use of as many modalities as are available to you is likely to be an advantage in learning. It is not good to become conditioned into thinking that you can only learn at set times and in particular contexts. In the busyness of life, and with many deadlines looming, it is wise to adapt learning to various contexts.

PROCESSING BY REPRODUCING MATERIAL

It is always a profitable exercise to reproduce learned material in your own words. When it comes to writing examination essays and assignments, what your educators want to see is the fruits of your own work. Of course they want to see evidence carefully and faithfully presented, but they also want to see your interpretative comments. Moreover, it is vital to acquire, through practice, the ability to condense and summarise the main points from cited research, because the time and space are not available to give an exhaustive account. You will be expected to cite the main findings from key studies, so it is important to learn to reproduce the material in skeleton form.

Exercise: Reviewing your career experience

Think about your own choice of career and how the sequence of events in your life has led you to where you are at present. This should include qualifications you have acquired and opportunities that have opened or closed for you, as well as important life events that may have changed your intended career direction. Make a brief résumé of all the important things that may have moulded the direction you have taken to date. Ensure that each is in proper sequence. Why do you think these things are likely to be clear in your memory?

REMEMBERING THINGS IMPORTANT TO YOU

People frequently try to remember things that are important to them, such as the birthday of someone near and dear. Important dates are noted in a diary so that they will not easily be forgotten, and an important event may be recorded at the top of a diary page the week before it is scheduled to take place. Students at university are seen near examination time taking notes of the dates and venues for their exams; some then record these details in several places so that this important information is not mislaid. Job applicants are very keen to know the pay scales for a vacant post and where they are likely to be placed on the scale if their application is successful. Most people have good, clear memories about these matters because they have personal implications for them. It is always encouraging to see students who have that extra enthusiasm for learning. After a lecture they come to the front to follow up on some reference, fill in some detail they have missed or get further clarification of some point. They may be more likely to remember what they really need to know because they are focused and learning is important to them.

DRAWING OUT THE FACTS

It is clear that well-structured systems help improve learning, retention and retrieval. These give you immediate access to the cues that you need to retrieve what you are looking for. For example, if you cannot remember someone's name, it might help to work through each letter of the alphabet. When the correct letter is arrived at, the name may spring suddenly to mind, or you may at least remember that the person's name begins with that letter. This may indicate that memories are stored, or more easily retrieved, under categories. There is therefore some value in grouping information together in sensible clusters in order to facilitate recall. Furthermore, it may also be helpful to connect clusters of information in a kind of network, such as a flow chart or a hierarchical structure. For example, you could write the continents of the world as the top category on your page, then a country from each continent, followed by the capital city of each country and then another city from each country. If you draw out a structure for the material you intend to learn, this will not only assist the memory process but will also equip you with good strategies for examinations and assignments. For an example of a structure drawn out in diagram form, see the plan for an essay on depression at the end of Chapter 5.

GIVING LIFE TO ABSTRACT IDEAS

In recent years there have been a variety of food scares that threaten to upset the balance between health and disease. Many people are passionately concerned about this because the well-being of humans and animals is involved. If the procedures involved in the production of food did not have such serious implications, then not many people would be interested in the results of the various pieces of research that have been commissioned. However, because of the implications for human health and animal welfare, the studies are given high-profile media coverage and are the subject of endless controversial debates, with passions often running very high. The lesson to be learnt from this is that people are more likely to attend to and remember facts associated with matters that have important implications for real issues. Your learning will therefore be more effective if you can relate the subject to live issues. With an essay or assignment you should always try to make an interesting story and, if possible, demonstrate the implications and applications of your study.

Exercise: Corporal punishment debate

What are the arguments for and against the corporal punishment of children? What are the likely implications of laws that are passed either for or against it? This is an example of how people may be able to remember the arguments because they have strong views one way or the other.

Contextual factors in memory

EXTERNAL CONTEXT

If you take a trip back to your old school you are likely to find that a chain of memories is triggered as you visit the various places where significant events occurred: the gymnasium, the assembly hall, various classrooms, the restaurant, the common room and so on. Memories may be triggered that you would not ordinarily recall spontaneously. A similar experience may occur as you rummage through the roof space in your house and discover some old toy that you cherished as a child, and a whole host of childhood memories floods into your consciousness. The same may happen as you look over an old photograph or return to a place where you enjoyed a holiday many years previously. The memories had been well and truly stored, but you suddenly found a key to unlock their treasures. This phenomenon is referred to as context-dependent memory and is associated with context-dependent learning. That does not necessarily imply that the memories will only come back if you return to the context in which they occurred. It does suggest, however, that there is some value in learning your material in a variety of contexts so that it has a variety of cues to aid recall. Moreover, you may find it useful to project your mind back into the context where you learned the material, such as a lecture room.

INTERNAL CONTEXT

Context-dependent learning relates not only to the external setting but also to the internal state; that is, state-dependent learning. For example, it is suggested that you may not remember events that occurred when you were drunk, but that they will come back when you get drunk again! Similarly, when you are in a depressed or happy mood, the memories associated with that mood may be more likely to return when you are in the same mood again. Probably the best advice is not to rely too heavily on state-dependent learning, especially if you cannot recreate the same state in an exam room. A calm and steady mood is more likely to be useful to you for revision and examinations. However, some students do prefer to study with the buzz of other people around them or with some music in the background. Whatever works best for you as an individual is fine, but there would be value in allowing yourself some practice in an exam-room-type situation so that you do not feel like an alien when you enter it for the real test. You might want to consider setting yourself some questions from previous exam papers and attempting them under test conditions.

LEARNING IN VARIOUS CONTEXTS

Some educators have argued that students should become conditioned to studying at one desk in one room and should ensure that they do nothing else but study at that desk. That advice was doubtless designed to help students adjust to regular and disciplined patterns of study. If that approach is working well for an individual student then there is no need to give it up, although it could also be extended to other contexts. Many students are, however, compelled to share houses and study places and are forced to compete with many distractions. Necessity dictates that they learn in a variety of contexts, and this is no bad thing, provided there is rhythm and regularity in their practice.

LEARNING AT DIFFERENT TIMES

Bem's self-perception theory suggests that individuals tend to adopt certain beliefs about themselves and then feel obliged to act these out in practice: they need congruence between their beliefs and their practice (see Aronson, Wilson & Akert, 1994). To do anything contrary to the image they have of themselves would cause them to experience a state of dissonance. In relation to study habits, students describe themselves variously as a "morning person", an "afternoon person", an "evening person" or a "late-night person". The downside of being confined in this way is that if an exam falls outside the time when the student feels they are at their best, they start with a psychological disadvantage. For these students there is likely to be some advantage in changing their self-perception and then acting out the dynamics of their new, extended self-image. A student can still, of course, have a preference for a particular time of day without restricting their potential by imposing inhibiting limits on themselves.

Memory problems in learning and retrieval

INTERFERENCE IN LEARNING AND RETRIEVAL

Sometimes learning and retrieval may be inhibited because the material is being interfered with by other facts previously learned. Either the old material interferes with the new or vice versa; one may dislodge, displace or confound the other. The result may be that the information cannot be recalled at all, or that the wrong information is recalled. This is likely to happen when two words have similar meanings with subtle differences, or when two words sound similar but have different meanings.

In a popular TV quiz show, contestants are asked to choose the right answer from a series of four given answers. Two of these are usually the more likely answers, and the host of the show says that all the right answers are easy if you know them! However, the introduction of an answer similar to the right one tests the human weakness of interference and confusion in recall.

If a student has difficulty with recall of two similar words or ideas, they should write both down and try to devise a useful mnemonic to distinguish between them.

Example: Mnemonics for subtle differences

"Continual" means at regular intervals, but "continuous" means without interruption. The difference can be remembered by thinking of a dripping tap (continual) and a flowing tap (continuous). Furthermore, the "ous" at the end of "continuous" can be remembered as a sound like flowing water.

SATURATION: THE DANGER OF RELENTLESS BOMBARDMENT

Earlier it was asserted that we may take in information even when we are not consciously aware of it (by subliminal processes). Although it is useful and encouraging to know about these processes, it would not be wise to bank on them as a primary source of learning. Good memory processing is facilitated by attention, interest, motivation and the use of good mnemonics. That does not imply that memory should be relentlessly and continually bombarded with heavy and complex material. Revision strategy should therefore be planned, rather than making a desperate attempt to swamp the memory on the eve of a test or examination. Such a maladaptive practice is likely to lead to interference, confusion and inefficiency. In the course of the academic semester, students can highlight the points they will later build on for revision. As the time for revision approaches, the material that was previously selected can be worked through systematically, with revision sessions spaced out to allow for adequate breaks so that the memory will function optimally.

THE NEED FOR GRADUAL MEMORY DEVELOPMENT

Work-outs and marathons

A marathon extends over a distance of 26 miles, and no runner, no matter how enthusiastic, would attempt this distance in their first practice run. Runners set themselves much shorter targets initially and then increase these in gradual increments. If their ultimate goal is to complete a 26-mile marathon, they may aim at a mini-marathon of 4 to 6 miles in the shorter term. Through regular practice and exercise the muscles gain strength, breathing improves, and determination and discipline are strengthened. In the same way, memory will improve with practice if the student is patient and determined enough. It should also be borne in mind that because re-learning savings are retained from previous memory deposits, consolidation is easier than first-time learning. If students feel that they are suffering from "memory cramp" they should allow themselves a break to replenish their "memory muscles". While students are learning to develop the use of memory they should give themselves realistic work-outs and warm-ups.

FAILURE TO EMPLOY SUITABLE STRATEGIES

Some people are characteristically over-confident in their own memory. Most of us have at times been certain in recalling the facts, only to discover later that we had dressed them up considerably. Sadly, innocent people have been wrongly convicted of crimes because someone claimed to remember distinctly seeing their face at a crime scene and the accused had no alibi to vindicate their claim of innocence. The damage has often been done before the full truth can be ascertained. Therefore, although our memories are good and can be developed, it is important to remember that they are far from infallible and are prone to distort facts. Excess confidence in our memory will lead us astray, but the memory techniques and strategies advocated here will serve as safeguards. Students can work at improving their use of memory through selection, attention, enjoyment, processing, visualisation, structure, consolidation and whatever else may prove effective. Realistic confidence in our memory is desirable, but reckless presumption should be avoided.

SUMMARY

Memory appears to have long-term and short-term aspects, and many people have better memories than they suppose. Four functional elements in memory are recall, recognition, reconstruction and re-learning savings. Primacy and recency effects can be countered by consolidation, and recall is facilitated by organised learning techniques. Useful mnemonics include alliteration, acronyms, rhymes, chimes, visualisation, pegs and memory cues. Memory function is likely to be strengthened by understanding and enjoying the learning material, and by processing at various levels and in diverse modalities. Memory processing is enhanced where there is good motivation to learn, where the facts are important to the individual and where the learner has the opportunity to reproduce the material as soon as possible after initial learning. It may be advantageous to learn and revise material at different times and in different contexts. Learning can be thwarted by continual bombardment and saturation, and memory recall can be confounded by interference. Memory efficiency can be improved by the use of sensible strategies that include breaks from learning.

Finding and formulating your topic

CHAPTER CONCEPTS

Misconceptions about topics
What is a topic?
Topics as puzzles
Puzzles as riddles to be unriddled
Basic advice on research topics
The earlier the better
Go from the general to the particular
Avoid politicized topics
Be careful with personal issues
Find the line of least resistance between A and B
Airing your topic
Sources for generating ideas
Analysing the possibilities of a topic
The initial search and review of the literature
Topic questions
Methodology and data questions
Validity and reliability questions
Where you should now be
Use reference aids
Risking a poor choice of topic
Features of good topics
Using your supervisor
Focusing in on a potential research topic
Developing research questions
Defining concepts
Stating the aims and purpose of your research
Writing objectives for your research
Using a hypothesis in your research
Research propositions
Summary of this chapter
Further reading

Your first rite of passage into the world of research is finding a topic for your dissertation. You can make the process difficult by ignoring the advice of your supervisors and this book, or you can work through the tactics we suggest here and enjoy the challenge. The main problem some of our students have in identifying potential topics is that they hold misconceptions about what a masters research topic is. In this chapter we will look at some criteria to use when thinking about a topic, at sources for generating ideas for a topic, and at ways to formulate your ideas into a topic capable of resulting in a masters dissertation. The stages and processes of this are shown in Figure 3.1. The main purpose of this chapter is to address the following kinds of questions:


56 / ESSENTIAL PREPARATION FOR YOUR DISSERTATION

FIGURE 3.1   FINDING A PROPER TOPIC

The figure shows the stages of finding a topic as a flowchart:

Need for a topic
→ Identify an area of interest for a masters research topic
→ Talk to your tutors about your ideas or for suggestions
→ Reflect and explore possibilities; identify sources of information (departmental research specialisms? current professional issues? historical debates? work-based problem?)
→ Collect anything that looks interesting or useful
→ Do a reconnaissance of the available literature
→ List and define key concepts and terms (what research has been done on the topic? what research questions have been asked? what methods have been used to conduct research on the topic? what kinds of research puzzle exist in the literature?)
→ Suggest possible puzzles and questions for your research
→ Evaluate topics against criteria (is the necessary and sufficient data available? can you get access to the data? have you the skills to analyse the data? have you the time?)
→ Talk to your tutor about your topic analysis
→ You should now have a research topic for your masters that can be focused and formulated more clearly as aims and objectives, questions or a hypothesis


1 What kinds of topic are suitable for masters level research?
2 What are research aims and objectives? How should they be written?
3 What is a hypothesis and a proposition? Does all research need them?
4 How can a topic be justified?

We end the chapter by looking at how to define your research topic in terms of formulating good research questions, hypotheses, and aims and objectives, but for now we begin with some misconceptions about masters research itself.

Misconceptions about topics


There is a misconception that masters research should be something that makes a difference to the world, something that has an impact on our views or understanding and therefore, in some way, makes a contribution to the stock of scientifically acquired knowledge. There is nothing wrong with wanting to do research that has an impact for the good of humankind, that advances, in whatever way and to whatever degree, the stock of knowledge and our ways of understanding the world around us. But at masters level these goals should not be paramount or be the criteria for topic selection. The generally held belief that masters level research is about discovery, change and knowledge generation needs to be placed to one side.

This belief about the nature of masters research is, however, quite understandable. It follows the general view that research is about discovery and bettering the conditions of humankind. This view often carries associations of the scientist in the lab surrounded by expensive-looking equipment, working long hours on a problem, facing setbacks, fighting bureaucracy, but eventually being triumphant against the odds. Historically there have been such people, and their endeavours have been the subject of cinema and television. It may be that such representations are in part responsible for this view of research. The only aspect of this that you are likely to encounter is the problem with bureaucracy; the rest is largely myth.

For now it is important to understand that masters level research is not primarily about discovery or making an original contribution to knowledge, though it may do this. If it does, this should normally be a secondary consideration to the primary function of your dissertation, which is to demonstrate your skills and abilities to do research at masters level. Your topic is, in the main, a vehicle for you to display your skills and abilities as a researcher and to demonstrate that you have the qualities and attitudes required to be a potential member of the broader research community and to be considered capable of being a research assistant or going on to do doctoral research. The same is true for the work-based dissertation, but in addition you also have to demonstrate the ability to be both a practitioner and a researcher, and to be able to manage the issues this involves. The topic you choose should therefore have the features necessary for you to exhibit the skills, capabilities, attitudes and qualities which are subject to assessment.

What is a topic?
Topics suitable for masters level research come in a variety of shapes and formats. Finding a topic is essentially about formulating a set of questions or hypotheses that require research of some kind so that answers can be provided or statements put to the test. The range and types of question that can be asked, and the kinds of hypotheses that can be stated, mean that there is an infinite number of topics. Added to this is the point that not all research topics require the collection of primary data. Some can be based on the existing literature, and in such cases the literature becomes the data. What counts as data or evidence also varies, but is often closely related to the way the topic has been formulated and the preferences made for how it is to be researched. The common denominator for all research topics is that they are puzzles in need of investigation.

TOPICS AS PUZZLES

A puzzle is something requiring, if possible, a solution. I say 'if possible' because not all puzzles can be solved, and many of those which appear to have been solved can be subject to modification or different solutions by other research. By 'puzzle' I mean something generally or specifically not known and therefore requiring sensible questions to be asked that are capable of solving the puzzle or a part of it. There are different kinds of puzzle, and the main ones can be seen by using words in research questions such as 'when', 'why', 'how', 'who' and 'what'. For example, the following are some simple puzzles capable of being refined to provide a focused set of questions:

How are crime statistics related to crime?
Why and how did Durkheim define suicide in the way he did?
When and why did the romance novel become popular?
What are the variables in television news selection?
What are the key variables reproducing the cycle of deprivation?

Chapter-03.qxd

11/12/2004

2:51 PM

Page 59

F I N D I N G A N D F O R M U L AT I N G Y O U R T O P I C /

59

We will shortly look at how to focus these general kinds of questions, but first we will look at the kinds of puzzle each exhibits. In Table 3.1, following Jennifer Mason's (1996) categorization of puzzles, we have identified five main types of intellectual puzzle which form the basis of research.
TABLE 3.1   DIFFERENT KINDS OF PUZZLE

Descriptive and illuminative research puzzles

Developmental puzzle: This is the 'how much of Y exists?' or 'why did X develop?' puzzle. Example: how and why did Durkheim define social order in the way he did, and what consequences has this had on the development of sociology?

Mechanical puzzle: This is the 'how does X work?' puzzle. Example: how are crime statistics compiled? How is crime defined, and how do this definition and the process of compiling the figures create the statistics for criminality?

Correlational and explanatory research puzzles

Correlational puzzle: This is the 'what, if any, relationship is there between variable X and variable Y?' puzzle. You are attempting to identify whether there is a relationship, an association or no relationship between variables.

Causal puzzle: This is the 'why does X cause (or strongly influence) Y?' puzzle. Example: among the many events which occur daily, what variables influence the selection of stories for inclusion in the news? What is seen (definitional work) by news selectors to count as newsworthy, and why?

Ethnomethodological research puzzles

Essence puzzle: This is the 'why is X assumed?' puzzle. Example: why is generality assumed to be the goal of science? What would the advantages be if this goal were put to one side, to describe the essential features (essence) of phenomena rather than trying to explain them?

Source: Adapted from Mason, 1996.


FIGURE 3.2   TOPIC PUZZLES AND INFLUENCES

The figure shows a general topic area giving rise to different kinds of puzzle (developmental, mechanical, correlational, causal and essence). Influences on the puzzles include historical movements, contextual milieu, comparative studies, conceptual language and political stance.

We will look more closely at some real examples that illustrate the structure of these puzzles throughout this book. For now it is relevant to see how different kinds of puzzle give rise to many sub-puzzles due to the influence of history, context, comparison, concepts and political stance. In Figure 3.2 we can see the place of the different puzzles within the general scheme of a topic and I have attempted to indicate the different directions research may take as a result of these influences. Figure 3.2 has been constructed from what Silverman (2000) suggests are the kinds of influences that can be used to sensitize you to various research issues. We have changed these a little and added conceptual language and comparative studies to his list, which are summarized in Table 3.2.

PUZZLES AS RIDDLES TO BE UNRIDDLED

One way of understanding puzzles is, according to Pertti Alasuutari (1995), to see them as riddles to be formulated and then to be 'unriddled', that is, solved in some way. If we take this idea and synthesize it with Silverman's influences, then we have the basis for identifying a range of perspectives we may use when framing our topic. Hence, while I agree with Silverman, I would also recommend that these influences might be used to help you generate an understanding of a topic and then begin the process of focusing in on a puzzle that has real researchability. Your focus may be on any one or a combination of the following:

TABLE 3.2   SENSITIVITIES INFLUENCING RESEARCH

Historical movements: The main elements of this are: (a) research is often closely related to intellectual movements and counter-movements of the time in which it was done, such as structural functionalism (1940s/50s), conflict structuralism (1960s/70s), symbolic interactionism (1960s) and post-structuralism (1980s); and (b) the historical origins of research puzzles and developments in knowledge can usually be traced, and their methodological assumptions identified as being, for example, foundationalist, positivist or anti-positivist.

Contextual milieu: The contextual elements which sometimes influence research are the social, economic, political, technological and legal variables deemed important at the time of the research. Within these are the policy movements receiving the most attention. These are often expressed in general phrases that imply a contrast with, and/or development from, a previous state, such as 'post-industrial society', 'information society' and 'knowledge economy'.

Comparative studies: Poverty, mental illness, immigration and sexuality are categorizations used in many research studies, but have different uses based on different definitions, criteria and methodological approaches. From meta-theoretical studies to in-depth case studies, these kinds of categories result in different kinds of research depending on the purpose and preferences of the researcher. Hence mental illness is not a phenomenon that can be uniformly defined, but depends on context and historically rooted definitions.

Conceptual language: Different perspectives in the social sciences, such as symbolic interactionism and post-modernism, have languages made up of concepts intended to describe and sensitize us to different kinds of social dynamics. Examples from the interactionist Erving Goffman include 'stigmatization', 'presentation of the self', 'managing the self' and 'degradation ceremony'. This language is the discourse of the perspective and provides, like the discourse of other perspectives, a framework for understanding phenomena.

Political stance: A substantial amount of research is politically motivated and aimed at revealing patterns of relationships such as inequality, discrimination and exploitation. These are often seen as part of broader conceptual theories which attempt to explain inequality as part of different forms of social organization, such as capitalism or patriarchy.

Source: Adapted from Silverman, 2000.


the historical development and origins of a puzzle, revisiting the basic assumptions which were used and scrutinizing the data collected and interpretations made at the time, to identify alternative starting points and mis-interpretations of seminal works;

the contextual milieu when the research on the puzzle was done, including the cultural assumptions of the time, place and social group, the use of various categorizations such as 'political system', and the contrast categories used, such as 'preindustrial' and 'industrial' or 'primitive' and 'advanced', to identify the influences of cultural assumptions on classifications, research design and interpretations;

comparative studies and findings from the same and different disciplines done at different times using different approaches, to compare and contrast in order to identify gaps and possibilities for the further development of a particular study;

the concepts used to describe and categorize phenomena such as alienation, power and control, analysing how these have been defined and operationalized in different studies, and how they have framed and restricted paradigmatic understanding of a topic, to identify other definitions and situations where they can be employed to understand social situations and dynamics; and

the political and ethical biases in research, to identify preconceived assumptions and their consequences, and the ways in which topics have been selected for their usefulness in demonstrating the validity of assumptions, so as to evaluate such demonstrations critically and suggest alternative approaches which are less value-laden and biased.

Sensitivities can be useful for placing a problem you are considering into context. They can help you to start the process of scrutinizing the literature by asking questions such as: 'How are these concepts related?', 'What definitions have been proposed for this phenomenon?' and 'What standpoint has this research been done from?'

Basic advice on research topics


There are a number of points we can make at this stage to help you select an appropriate topic for your masters dissertation. The prerequisite to this advice is that whatever topic is finally chosen, it should be capable of resulting in a complete dissertation in the time you have available. This may seem obvious, but many good ideas for a topic cannot be done in the normal period of time expected for a masters dissertation. Most topics will take a lot longer to research than most people initially estimate. A good rule of thumb is to estimate how long the research will take you, triple that estimate and then consult with your tutor who, you will find, will add more time on to it. Research is very time-consuming. So how do you find a topic capable of being done in six to nine months for a full-time student and 12 to 16 months for the part-time student? The six kinds of advice we offer you are:
1 The earlier you start the better.
2 Go from the general to the particular.
3 Avoid politicized topics.
4 Be careful with personal issues.
5 Find the line of least resistance between A and B.
6 Airing your topic.

THE EARLIER YOU START THE BETTER

When is the best time to start thinking about and looking for a suitable topic? The answer is: the earlier the better. The sooner you start to think about and investigate possible topics for research, the sooner you will decide on one and can begin to develop your research proposal. The earlier you select a topic, the more time you will have to do the research. This means you may be able to undertake research on a topic that requires slightly longer than another, or one you can investigate in a little more depth or with more sophisticated data collection techniques. As soon as you start your masters course, begin to think about and discuss with your tutors ideas for topics.

GO FROM THE GENERAL TO THE PARTICULAR

Think of the task of identifying a topic as a process of refinement. This will mean going from the general area in which you would like to do research to the particular aspect that you can do research in. The general area can be something like moral panics, construction of statistics, analysis of advertisements, history of science, experience of immigration or whatever. These phrases are broad categories that can often be used to tell others, when they ask, what your research is about. They are the higher-level abstract designators that can be used to look for specific research questions which may have the potential for a masters dissertation. If there is an area you have an interest in, then consult with your tutor and ask if they have any suggestions for specific research in that area. At the same time, do some visualization of what a topic might look like in terms of what else can be done in the area and what can form the basis for your research. This may mean reading some secondary sources on the area to get an overview of what has been done, what kinds of assumptions have been made and where the general interest in the area came from. Figure 3.3 shows what we mean by this using the example of moral panic. In this example we can see how the phrase 'moral panic' can be used to investigate its origins and previous uses, and how it may be used to generate an idea for a topic. In particular it shows how we can begin to think about the necessary issue of the availability of sufficient data.

AVOID POLITICIZED TOPICS

What is your motivation for doing the research? Your primary motivation should be to acquire and develop your research skills and capabilities, alongside the necessary attitude of reflection that will help you to demonstrate your research qualities. You should avoid topics that you want to use to demonstrate a political argument or forward a moral cause. As with personal issues, politicized causes as research topics are inherently problematic. They carry motivations that have little to do with the research and more to do with providing evidence for the cause. If you intend to do such research, be clear on your reasons and on how you will ensure the validity and reliability of your research. For example, we recently had a student who wanted to investigate the arguments of so-called 'holocaust denialists' on the Internet. These are individuals and groups who deny that the extermination of 6 million Jews (and others, including gypsies, Catholics, homosexuals and trade unionists) in Nazi concentration camps ever took place. This is a topic fraught with emotion, politics and prejudices, and because of this it is a difficult one to research clearly in an objective manner.


FIGURE 3.3   TOPIC ANALYSIS FROM THE GENERAL TO THE PARTICULAR

The figure traces the example of 'moral panic' from the general concept or theory, to snapshot definitions and terminology, to actual research studies, to thoughts on what other news items may be analysed using the theory:

Moral panic (sociological theory): a situation reported in the general media which, regardless of its validity, causes outrage and public concern that something ought to be done about it; a reaction to an event that is out of proportion to the actual threat offered, even though the threat could have existed for some time.

Studies of moral panics: The Drugtakers (1971); Folk Devils and Moral Panics (1973); The Manufacture of News and Deviance (1973); Policing the Crisis: Mugging, the State and Law and Order (1973). More recent studies with a change in perspective: Rethinking Moral Panic for Multi-mediated Social Worlds (1995); Moral Panics (1998).

Possible extensions (media reports and the amplification of deviance): food panics (mad cow disease, foot and mouth); environmental panics (carbon dioxide, greenhouse gases). Data? Newspaper articles and editorials.


These problems were compounded by the student having a Jewish heritage and therefore the research committee of the university preferred her not to research the topic. However, because of the importance of the topic she was allowed to go ahead but under strict guidance from her supervisor, who insisted she took the denialist arguments seriously and took as her research puzzle the construction of their argument and compared these with the counter-arguments against the denialists to find out how something that had been historically documented and taken for granted could be challenged. Her dissertation became an investigation into argumentation and evidence, carefully analysing and describing the argumentative structures and use of evidence from opposing sides. She therefore looked carefully at the validity and reliability of historical evidence as a research puzzle and did not use any emotive language in her conclusions or allow her own (understandable) feelings to be expressed. This is the kind of topic that may be important to investigate but is very difficult to do in practice. You may wish to reflect on how you would have done such a project and what reasons you could give for going ahead despite the problems.

BE CAREFUL WITH PERSONAL ISSUES

If there is an issue you feel strongly about and have an axe to grind, do not choose this as your research topic. For example, euthanasia and abortion are topics embedded in substantial moral debates, and from a research perspective they will involve serious consideration of ethical issues. With such topics it would be difficult to start your research without a set of preconceived beliefs and attitudes toward the main issues. This does not mean you should avoid research that involves moral issues. Topics such as poverty, abuse, adoption and the like are issue based, but this has not prevented many good research studies being done to clarify the issues, definitions or processes, or to identify possible causes and consequences. 'Standpoint research', as it is called, is often the basis for social policy research and therefore has an important place in the social sciences.

FIND THE LINE OF LEAST RESISTANCE BETWEEN A AND B

If we take it that the objective of masters research is to produce a dissertation of sufficient quality to be deemed masters level, and that the means to this is doing research, then the less complex the research to be done, the more likely it is that the dissertation will be completed on time and to the required standard. This means selecting a topic that can not only answer yes to the following questions, but provides the simplest of routes to yes:

Is the data available?
Can you get access to the data?
Have you the skills to analyse the data?

Data for providing answers to your puzzle must be available. This means the data, whatever form it takes, must be out there and be of the necessary kind. The sources of your data must be identifiable at the outset. You should be able to say what data is needed for your puzzle and why it is needed. Second, you need to be able to get access to the data. If you need responses from senior managers in the health services, what would make you believe they would be willing to fill in a questionnaire for you, or even spend time allowing you to interview them? Third, do you know how you will manage and then analyse your data? Knowing that the kinds of data you need can be obtained is not enough; you also need to know how you will use this data to answer your research questions and/or test your hypothesis. If your statistical knowledge and skills are basic, quantitative data requiring analytical statistical techniques may not be a practical consideration. If your time restricts opportunities to go into the field, you may consider doing a topic that suits your situation. This means looking for a puzzle in a topic area that can be investigated using desk-based research. This could be the analysis of a debate in the social sciences, the testing of criteria used to evaluate an Internet-based information source, or the study of extracts of conversation or a media production. You will still need to search and review the literature, design instruments for the collection of your data and identify suitable techniques to analyse it. This is not a simpler strategy for doing your research, but one among the many alternatives open to you.

AIRING YOUR TOPIC

Each discipline in the social sciences seems to have its own preferences on what constitutes an appropriate way to do research, and this often influences the kinds of topic expected from masters students. Be prepared for some degree of disciplinary and departmental opposition from some of your tutors. If you are studying in a department known for its quantitative research, then expect your tutors to express this when evaluating your research suggestions. But do not be afraid to put forward a topic and methodology that differs from the norm. Many interesting pieces of research, at all levels, were initially seen as deviations from the expected. If you follow this track of pushing at the boundaries, then be prepared to justify your topic confidently using sound argument and evidence.

Sources for generating ideas


There are many sources you can use to begin generating ideas for your research. This process may even begin before you go to university to do your masters degree, and in Chapter 1 we made some observations about this that are also relevant here. Do not expect a sudden creative vision that leads to your research topic. Bright ideas for a topic are usually the outcome of research and reflection. Typical sources for initial ideas include the following:

Taught modules you are doing on your course. Have you covered a topic that interested you, which you would like to look at in more detail?

Has a tutor mentioned a research study that you found interesting, even puzzling, that you feel needs questioning?

If you are doing your masters as part of a professional qualification, look in the profession's journals to see what the current issues and concerns are, and whether these have a research possibility.

At work, if your research is to be work-based, what are the main issues, development needs and management problems that require some research?

Have you listened to a visiting speaker to your department who talked about a project that may have other possibilities?

Are you interested in particular phenomena that you cannot find much about in the library?

Have you observed a pattern of behaviour you found interesting or perplexing and would like to find out more about?

What projects are staff working on; do these interest you?


Analysing the possibilities of a topic


Once you have an idea for a topic and have discussed this with your tutor(s), you need to go to the library and investigate its possibilities. This means identifying sources of information, obtaining some literature and subjecting it to an initial scrutiny. The main stage in this part of the process is the initial search and review of the literature to identify research possibilities.

THE INITIAL SEARCH AND REVIEW OF THE LITERATURE

Use the library, not the Internet, to plan an indicative search of the available literature. This is the literature that is in your library or available in electronic format: literature that can be obtained within a couple of days. In Chapter 6 we show you how to do a comprehensive search and review of the literature. If, however, you already have an advanced understanding of the topic area, then use the following books to ensure that you have the necessary skills for bringing your knowledge of the literature up to date:

Doing a Literature Search: A Comprehensive Guide for the Social Sciences (Hart, 2001).

Doing a Literature Review: Releasing the Social Science Imagination (Hart, 1998).

At this stage of your research you should be doing a reconnaissance of the library. Do not aim to define your main concepts or formulate clear research questions, but enjoy the freedom you have to explore the possibilities for your topic. This means looking to provide overviews that help you to have a basic understanding of two sets of questions. The first set is about the topic area itself and the second about the methods that have been used to do research into the topic area.

Topic questions

Once you begin to obtain some of the literature (books, articles and reports), subject it to a brief speed read. You are not looking to make copious notes on the details of individual books or articles, but to get an overview of the context of your topic. Look in your search for literature that provides initial answers to the following kinds of questions:


What are the key texts and authors on the general topic area?
What concepts and theories have been used on the topic?
What is the history of the topic?
What kinds of arguments are there about the topic?

Even a rudimentary understanding of the origins and context of the topic will enable you to start thinking about the possibilities for your own research. It will provide you with research themes and issues which have been developed and debated by researchers in the topic area. With this knowledge as your frame of reference you can begin the work of looking for a topic that has a research focus. This may mean developing a piece of research that has already been done or analysing contributions to a debate about the topic.

Methodology and data questions

By looking at the research elements of the studies you obtain, you are aiming to understand how the studies were done and, if possible, what kinds of methodological approaches (that is, quantitative or qualitative) and assumptions were used. The kinds of questions you need to be asking are:

Has anyone else done research on this topic? If so, how?
What research questions did they ask?
Did they use an hypothesis?
What methodology and data collection tools did they use?
What did they find?

Do not worry if you find that someone else has already done research on a topic you have in mind. It is often possible to deconstruct existing research, critique it, and develop it in directions different from the original. The social sciences have


many examples of this process. For example, sociological research into the phenomenon of suicide has its origins in the seminal study by Emile Durkheim, but many studies have been done since, each exhibiting a different approach. Some of these can be seen in Figure 3.4, which indicates the range of approaches that have been used, from Durkheim's original positivistic approach to interpretivist, ethnomethodological and conversation-analytic studies.

Validity and reliability questions

Although most research in the social sciences is valid and honest, some of it has dubious foundations. As with all forms of research, including that done in the physical sciences, there is a degree of unsubstantiated generalization, political and ethical bias and, on rare occasions, fraud. Two classic cases of the latter are the work of Cyril Burt on intelligence (Beloff, 1980) and the work of Bruno Bettelheim (1976) on the psychology of fairytales. Both exemplify how a person can reach the top of their profession and receive numerous accolades yet, in the case of Burt, base his work on non-existent research, while Bettelheim gained fame by plagiarizing the work of another. Do not assume that because a piece of research has been published, or has attracted attention, it is valid and provides a solid foundation for your own research. This does not mean adopting extreme scepticism, but do exercise moderate scepticism when you come across something that is taken for granted as knowledge or fact. Remember to ask your tutors about the research you find in the literature, especially about what they know of its origins and how it fits into the broader context.

WHERE YOU SHOULD NOW BE

Your reconnaissance of the literature in the library should have given you sufficient information, such as the names of authors, words and phrases used to describe the topic, and an understanding of the structure of knowledge on the topic. Table 3.3 summarizes the essential information and knowledge you will obtain from your initial search and review of the literature. While Table 3.3 shows the range and kind of information you will gather during an initial search of the literature, Figure 3.5 shows a simplified map of the main questions, concepts, methodological assumptions and methodological approaches for the

FIGURE 3.4 SOCIOLOGY AND SUICIDE: SAME GENERAL TOPIC, DIFFERENT WAYS OF RESEARCHING IT

Seminal study. Durkheim, E. (1897) Suicide. A statistical study presenting a social explanation: suicide varies 'with the degree of integration of the social group of which the individual forms a part'. The sociological explanation is based on the social bonds that bind the individual to society: social integration (integrative bonds) and moral regulation (regulative bonds). The major aphorism of sociology: the objective reality of social facts.

Douglas, J. (1967) The Social Meaning of Suicide. Argues for the need to consider how official statistics on suicide are collected and how officials interpret a death to decide its cause; what ends up in the official statistics is the end result of a lengthy process of interpretation and decision-making. There is therefore a need to examine the criteria used to diagnose death and the search procedures of coroners.

Sacks, H., first lecture, 1964 (published 1992). Discovery of the rules of conversational sequences, and modelling of those sequences, from research into calls to a suicide prevention centre. A classic study in conversation analysis that identifies how professionals in an institutional setting structure a call's opening to elicit a paired action (the caller's name) without asking for it explicitly, so as not to make the caller hang up.

Maxwell-Atkinson, J. (1978) Societal Reactions to Suicide: The Role of Coroners' Definitions. A critique of Douglas and interactionist approaches to suicide, arguing for an ethnomethodological study focused on the data problem and the problem of the coroner in 'reading the relics' (suicide notes, mode of death, location and circumstance of death, life history and mental condition) to find indicators of suicidal intent and arrive at an account which is rational for all practical purposes.

Taylor, S. (1982) Persons Under Trains. Shows how coroners construct a suicidal biography and negotiate the verdict in a number of apparently identical deaths. The emphasis is on the coroner: 'without some other evidence of his [the deceased] being disturbed in his mind or acting funny, or having threatened to take his own life, which we haven't got, suicide would, I think, be an unsafe verdict to return.' Concludes that there is 'no ... correspondence between official suicide statistics and ...'

There is a massive literature on suicide: approximately 7,000 studies have been published. But no matter the number of studies on a topic, no topic is ever closed; all have more possibilities for research.


TABLE 3.3 OUTCOMES FROM AN INITIAL SEARCH AND REVIEW OF THE LITERATURE

Question: What are the key texts and authors on the general topic area?
Outcomes: An initial bibliography of texts on the topic. A list of the key authors on the topic.

Question: What concepts and theories have been used on the topic?
Outcomes: A list of the main concepts used in works on the topic. A list of the main theories used by individual and groups of authors to account for the topic.

Question: What is the history of the topic?
Outcome: The origins and seminal works that gave rise to and initially defined the topic.

Question: What kinds of arguments are there about the topic?
Outcome: The historical development of the topic in terms of arguments and debates over theories, concepts and data, including the different perspectives, standpoints and approaches which have been taken to frame and understand the topic.

Question: Has anyone else done research on this topic? If so, how? What research questions did they ask?
Outcomes: An understanding of how others have designed their research to investigate an aspect of the topic. A list of research questions which have been asked, and an understanding of what has been considered important within the topic for research.

Question: Did they use a hypothesis? If so, what type?
Outcome: Identification of hypotheses which have been constructed and tested, and how they were tested using what kinds of evidence.

Question: What methodology and data collection tools did they use?
Outcomes: An initial understanding of the methodological assumptions which were used and preferences for particular methodological approaches (quantitative or qualitative). An understanding of the main data collection tools commonly used.

Question: What did they find?
Outcome: Lists of key findings from the main research studies. These can be used to make comparisons and to identify gaps and the need for further development of particular studies.

FIGURE 3.5 RESEARCH QUESTIONS, CONCEPTS AND DATA: THE EXAMPLE OF SOCIAL SCIENCE AND ADVERTISING

Research questions: How does advertising work? What are the roles of advertising as an institution? What are the effects of advertising?

Concepts: psychographics and subliminal messages; referent systems of hidden codes; deception, lack of detail and false information; symbolism and iconographic associations.

Claimed effects: supports and reinforces the capitalist mode of production; persuades that want is more important than need; socializes into capitalist consumerism; creates false wants over real needs; focuses on ownership of the latest products; emphasizes consumption patterns and promotes certain social identities; marginalizes sections of the population; bases social relations on consumption; emphasizes the individual; reinforces ethnic and gender stereotypes; trivializes language; debases culture; creates false worries, anxieties and problems.

Methodological approach: structural analysis of advertisements and commercials to show the structure of codes, hidden messages and methods of persuasion, using semiology and iconographic analysis.

Methodological assumptions: advertising is a part of capitalism, and as capitalism is undesirable so too is advertising; theorization of advertising within the broader theories of capitalist social relations, surplus, and the contrast between utilitarian and non-utilitarian products; advertisements contain hidden messages that can only be seen through analysis; use of statistics on production, consumption and other economic data to show the pervasiveness of advertising.


social science treatment of advertising. Figure 3.5 is intended to demonstrate how diagrammatic representations of a topic can be used to summarize key concerns and the research questions that have been asked, and to begin the task of identifying the methodological assumptions used for research into a topic.

USE REFERENCE AIDS

It can be useful to have at hand a selection of reference tools. These may be social science dictionaries and encyclopaedias, which will help you to find short summaries and definitions of key concepts and theories quickly. Useful reference sources, which you will normally find in most academic libraries and which are often quicker to use than electronic sources on the Internet, include the following:

The Blackwell Dictionary of Twentieth Century Social Thought (Outhwaite and Bottomore, 1993).

The Social Science Encyclopaedia (Kuper and Kuper, 1999).

Your lists of words, phrases and definitions will be of use when you need to search and review the literature in more depth. This will be when you have decided on your specific research problem and have written your research proposal.

Risking a poor choice of topic

We outlined some of the misconceptions about master's topics at the beginning of this chapter; here we want to draw your attention to some of the basic mistakes students have made in the past when selecting a research topic. The following are some of the ways in which you can introduce a high level of risk into your research, with the probability that it will fail.
Risky behaviours

Implications

Choose a topic in a hurry

With little or no analysis of the practicalities of researching a topic, you may face far too many unanticipated problems to deal with in the time


you have available. Topic analysis will identify most of the issues and problems you are likely to face in your research; this is why you need to analyse your topic thoroughly before you start.

Select the method before the topic

Methods of data collection and analysis should be appropriate to the topic, not the other way round. You may be good at statistics or at talking to people, but these should not be the first criteria for selecting a suitable research puzzle. The puzzle should be clearly formulated before you select data collection methods; otherwise you will introduce a bias into your research, equivalent to selecting a topic because it fits in with your political view of the world.

Procrastinate for months over different topic ideas

Doing little by dallying over possible topics wastes valuable time you would otherwise be using to get on with your research. If you cannot make a decision, take direction from your supervisor and stick to the decision they recommend.

Generalize about your topic

Vague and generalized ideas will lead to vague and problematic research. The broader the research idea, the more work it will involve to manage. The narrower your topic puzzle, the more likely it is that you will be able to identify precisely what you will need to do to finish your research in the time available.

Ignore the basic criteria

If you do not know what kinds of data are needed, then you will not know if you can get access to them or how to analyse them. Ensuring that you know the answers to these questions is equivalent to knowing where you are going before you set out on your research journey.


Do not talk to your supervisors

Silence from you may mean either that you have done nothing and are embarrassed to tell your supervisor, or that you need no guidance. In the first case your supervisor will not be embarrassed: they are there to guide you and help you through your research blocks. In the second case, how do you know you are doing your research in the ways expected if you are not seeking and receiving regular feedback?

Features of good topics

So far we have identified a number of features which, when combined, can result in a good topic for research; it may be helpful to summarize these:

Criterion

Implications

Data availability

The data you need to provide answers or solutions to your research problem must be available to you in sufficient quantity and quality. This means there must not be too much or too little data and it must be available using reliable collection techniques. Good topics have actual (secondary) or potential (primary) data available.

Access to the data

The data you need may be available, but not to you, or not in the form you need. It may be commercially or personally sensitive, or even expensive if it has to be purchased. Good topics have data available which you can access with few problems.

Time available

No amount of enthusiasm can create the time you need for a research project. A good topic is one that has been clearly delimited and can be done in the limited time you have available.


Availability of resources

Computing and software may be needed along with published materials, such as reports. If these are not readily available, then unnecessary risks will be encountered. Good topics tend to require few resources and those which are needed should be readily available.

Capabilities and skills

You may be impressed with a statistical technique or computer program, but if you do not have the necessary skills and understanding at the start of your project, then the time and energies needed to learn these may take too much away from the research itself for it to be successfully completed. Good topics are those that build and develop on capabilities, skills and knowledge that you already have.

Symmetry of potential outcomes

It can be uplifting to establish a link between variables and have a positive result from your research that shows a link and why one exists. It is equally valid, however, to show that a link does not exist. Good topics have the capability of resulting in either positive or negative results.

Using your supervisor

A key to the success of many dissertations is the supervisor. Use your supervisor as much as possible throughout your dissertation research. They have the experience of supervising many previous students and therefore have knowledge you do not. They will be able to help you formulate your ideas on a topic, direct you to reading and may even suggest a topic they know can be done. By exploring a topic with your tutor you will be more likely to develop a positive and constructive relationship. Remember that, along with another internal and an external examiner, your supervisor will assess your dissertation, so it is important to develop a good working relationship as early as possible. As a basis for your initial discussions, take with you an outline, on a sheet of A4, of your idea for your research. This does not have to be typed or neat. A handwritten


and roughly sketched-out idea will normally be better than something you have spent time and effort making look good. At this stage neatness is a luxury that is not needed and a waste of time. It is your ideas that count, so focus your efforts on them and be prepared to share them with as many of your tutors as possible. What you can expect from your tutors is feedback to steer your idea in a direction that leads to a researchable topic. Do not worry too much if you receive guidance that seems to conflict: different tutors will naturally have their own ideas about what kind of research your topic suggests, often based on their own research interests and methodological biases. If possible, ask for a list of the research interests of your tutors and a list of the dissertations they have previously supervised. These will give you an idea of their research orientations and biases and the kinds of topics they tend to supervise. Try not to be hesitant in sharing your idea for a topic with a tutor; remember that they are on your side and are there to guide you. Once you have broken the ice with a topic, set up a schedule of tutorials to explore your idea, giving yourself enough time between each to follow up on the guidance you have been given. Even if you find that you did not have enough time to do everything, or even anything, between tutorials, still keep to the schedule, using the tutorial for a general discussion about your topic. Whatever you do, do not fail to turn up for a tutorial because you have not done much: this may annoy your tutor who, quite understandably, having invested effort on your behalf, does not want their time wasted. Failure to attend a session can also be embarrassing when you next see your tutor, and can lead to avoidance tactics by both parties and, in extreme cases, a breakdown of the supervisor/student relationship.

Focusing in on a potential research topic

If choosing possible topics is the first step in your research, then developing one into a set of research questions and propositions, possibly a hypothesis, with a clear statement of purpose and objectives, is the next step. By developing these you will be defining what your research will be about, why it is needed and what kind of research it will be. In this section we look at the process of developing research questions and at the relationships between research questions and different types and purposes of research. Details of methodological approaches and traditions we leave until Chapter 7, but even at this stage your research questions will give strong indications of these and help you in designing an overall strategy for your investigation. Before that, however, some corrections to pre-existing assumptions may be useful. Across the different disciplines of the social sciences there


are some differences of opinion on what constitutes a properly formulated research problem. In one guide to doing a dissertation the following statement is made:

A second criterion is that the question should suggest a relationship to be examined. This is a particularly important characteristic, because the purpose of doing research is to advance science. Because science is the study of relationships between variables, no relationship, no science. No science, no thesis or dissertation. It is that simple. (Cone and Foster, 1999: 35)

This is a rather stark view of what constitutes a research problem and a dissertation. You may remember some of the comments made in Chapter 1 about the need for clarity of understanding in the social sciences; this statement shows no appreciation of that attitude. My position, after years of experience supervising dissertations, is that this is only one view among many of how to express a research problem and of what can count as valid, meaningful and useful research for a dissertation. I therefore take issue with these kinds of views because they exclude the possibility of alternatives and, by doing so, are dogmatic. While this is not the place to engage in a discussion of what may or may not constitute science, my view is that a more inclusive approach should be taken and, for the sake of progress in understanding our world, human and physical, I will use the word 'research' when looking at the formulation of research problems for a dissertation.

DEVELOPING RESEARCH QUESTIONS

What you may have done so far is identify some broad topic area and undertake an analysis of it through a preliminary search and review of the literature. You should have eliminated from your list of possible topics those which were too risky or which failed adequately to meet the criteria of access to data and time to do the research.
With the topic or topics you are still considering, it is time to choose one and run with it by developing an aspect of it into a puzzle for your research. One of the first steps is to see what questions can be stated which are puzzles needing research in order to be addressed. Note that I say 'addressed' and not 'answered'. This is because your research may find that there is no answer, in any definitive sense, to a question, but it can still advance understanding and clarification, which are in themselves worthwhile outcomes. Research questions are questions you intend to employ systematic research to investigate; they are what is to be investigated. They should embody the purpose and type


of research necessary to unravel the puzzle they set for investigation. Remembering the different types of puzzle (the developmental, mechanical, correlational, causal and essence puzzles), your questions should have a focus on one of them. Typically a puzzle will have a series of questions such as: How well does this programme work? How can we measure 'well'? What can we compare it to? How do users and providers assess 'well'? The focus here is on evaluating the performance of a programme, say in education, health care or the community. This means that some form of evaluative research design will be required, involving descriptive statistical data, possibly from a survey or questionnaire, along with qualitative data, possibly from interviews. General questions are fine as a starting point, but will usually need refining to make them more precise, clear and focused. Clough and Nutbrown (2002) suggest using what they call the 'Goldilocks test' and the 'Russian doll principle'. The Goldilocks test assesses research questions in terms of how big they are. Big questions are usually too big to be answered, either because they lack precision and need to be broken down into smaller, more manageable questions, or because they are too vague and need precision to make their concepts measurable. For example, 'What is consciousness?' is, as a piece of primary research, too big a question for a master's dissertation. But rephrased as 'How has consciousness been defined, and how have those definitions been operationalized in research?', it may be possible, once clarified by undertaking a critical review of the literature. Questions need to be the right size to allow a research design to investigate the problems they pose. Hence the Russian doll principle: larger questions need smaller ones, and these need to fit together into a logical set.
Your questions will now need to be developed by looking at what specific objectives will be needed to actualize each question and how each concept is to be defined in order to identify its major variables. Figure 3.6 shows the main elements you will need to work on to construct a clear and coherent definition of your research topic. Once you have your research questions, which of these elements you work on next is in practice not important: you will find that alterations to one mean you revisit another, in an iterative process of going around tweaking one and then another.

DEFINING CONCEPTS

Concepts are words such as 'effectiveness', 'efficiency', 'performance', 'poverty', 'truth', 'impact' and 'community'. Because of the nature of language and the way in which meanings are a product of a word's use, concepts cannot be assumed to have a universal definition. When used in a research question, the way in which they are to be understood needs to be defined. The literature on the topic is usually a good source of


FIGURE 3.6 ELEMENTS IN DEFINING YOUR RESEARCH TOPIC

The elements to work on are: research questions; concepts; definitions; variables; objectives; a statement of purpose (aims); and, possibly, a hypothesis.

candidate definitions for this. 'Poverty', for example, can be defined in absolute terms (a person is in poverty because they have nothing) or relative terms (a person is in poverty because they lack what others take for granted), and even within these two general categories there are more specific definitions and arguments over whether such definitions are useful in measuring the concept. You can use the literature to examine and interrogate definitions used previously, categorizing them by kind: for example, definitions by example, by genus and differentia, by stipulation and by operational analysis. Whatever approach you use, remember that by defining your concepts you are entering into a research design which assumes it is possible to have a correspondence between words and things through the mediation of definition. We look at correspondence theory in much more detail in Chapter 7. Briefly, what this means is that you can measure a concept by defining variables assumed to correspond to the phenomenon. Poverty, for example, may be defined in terms of a range of indicators which state what a person does not possess (material things, social attributes, cultural capital and so on). The variables could then be defined in terms of such things as income level and value of assets, which would in turn be used to set a poverty line for the definition of poverty.

STATING THE AIMS AND PURPOSE OF YOUR RESEARCH

Another way of seeing the links between the different elements in defining your research project is shown in Figure 3.7. It provides an overview of where the problem


statement (aims) fits into the process of research design (we look at research design in more detail in Chapter 10). The process is, as we have indicated, not as clear-cut as diagrams such as Figure 3.7 suggest. It is largely iterative: you will find yourself moving back and forth between writing aims and objectives, recasting your problem statement, and reading further into the literature on the methodological tradition and approach you have elected to base your research upon.

FIGURE 3.7 THE PLACE OF RESEARCH AIMS AND STATEMENTS IN RESEARCH DESIGN

Topic ideas and topic analysis lead to problem identification and analysis: listing the main concepts, theories and possible variables, based on the literature and hunches. These produce initial research questions and puzzles, which are refined into research questions (concepts, definitions, variables) and then into research hypotheses or research propositions.

The problem is then formulated within the context of methodological traditions, identifying an appropriate methodological approach in order to write aims which state the purpose and type of research required for the study of the problem. This leads to the formulation of research aims (covering the purpose of the research, the type of research, the methodological tradition and the methodological approach) and the formulation of research objectives.

Finally, data requirements and tools are specified to suit the statement of the problem and the methodological tradition and approach: the kinds of data and data sources, and the data collection tools and instruments.


Many researchers experience some anxiety and frustration when writing the aims and objectives for a research proposal. This may be because there is no consensus on what aims are, what objectives are and how they relate to each other. There is also the problem that different institutions have different ideas about what an aim is and what an objective is, sometimes using different words for the same thing, such as 'goal' in place of 'aim'. In this section we offer a guide to the nature of research aims and objectives that will help you to formulate good aims and clear objectives.

We can begin by looking at the purpose of aims. A research aim is one or more statements used to express the general intent (purpose) and the orientation (methodological nature) you have decided on for your research project. Your aim should also include a gloss of the topic, for example 'motivation', and a broad typification of your units of analysis, for example 'master's students'. By intent we mean the purpose (function) you are proposing for your research: for example, to evaluate, find a solution, identify something or bring about a change to a situation. One way of thinking about this is to say: 'This research intends to [examine], [explore], [inquire into], [investigate] or [study] in order to [identify], [diagnose], [answer], [find out] or [understand] ... some topic.' By orientation we mean the position you have elected to take regarding the nature of your research: for example, to base it on a quantitative or qualitative description, analysis or experiment. You may state the orientation of your research before your intent. For example, your aims may begin like the sample shown in Figure 3.8. Although the aim shown in Figure 3.8 has the outcome 'to change the curriculum', this is not its primary intention.
In a master's dissertation, the main intention is to demonstrate the ability to do research rather than to effect change to a situation. Developing the curriculum to improve motivation is therefore secondary to recognizing that this is a proposal for a piece of formative evaluation focused on the questions 'What do the students talk about as motivating them?' and 'How can we use this information to develop a curriculum that motivates?' Often such questions take the place of, or complement, a hypothesis. If we look at the words and phrases which make up aims, we can see that the intent and orientation of an aim are not mutually exclusive. In the example already introduced, and shown again in Figure 3.9, we can see that there are a number of phrases


FIGURE 3.8 STRUCTURE OF A STATEMENT OF RESEARCH AIMS

'This research aims to undertake qualitative examination [primary research intent and orientation] of coffee-bar talk amongst master's students [data] to identify factors which motivate and de-motivate [second research intent: to evaluate; second orientation: to understand] to develop a curriculum that motivates [research outcome: change].'

FIGURE 3.9 DELIBERATELY VAGUE PHRASES ALLOW FOR FURTHER EXPLANATION AT A LATER STAGE

'... to undertake qualitative examination [what kind of qualitative research?] of coffee-bar talk [covertly collected and therefore naturalistic?] amongst master's students [what sample size and demographic factors? on what course, studying what subjects?] ...'

which are intentionally vague because they can be explained in other parts of the proposal. While the implicit references in an aim can be developed in other sections of the proposal, Table 3.4 shows the implications of using a specific word to indicate the orientation of the research. In the example the word 'examine' has been used, which implies that the research will look at coffee-bar conversations in detail to analyse them and to identify from them talk about the elements of the course which motivate and de-motivate.


TABLE 3.4 VOCABULARY OF TYPES OF RESEARCH

Investigate: to inquire into thoroughly; examine systematically; a process of finding out; search for evidence
Enquire: seek information by questioning; seeking answers; make an investigation
Examine: look at or actively observe; inspect carefully for detail; scrutinize
Explore: seek the unknown; diagnose a problem
Explain: discover cause and effect; identify independent and dependent variables
Study: carefully consider; critically think about; seek to understand; contemplate or reflect about

Your research aims can sometimes be used to state the purpose of your research: to provide the purpose statement. Here is an example: 'The purpose of this study is to identify which demographic factors (age, sex, ethnicity) correlate with which social lifestyle factors (social networks, number of sexual partners, employment, education, residence) to determine risk factors in young adult injection drug users (IDUs) currently or recently in rehabilitation.' This is the kind of statement that can also be your aim and form the main part of your problem statement. To be a full problem statement, you would need to state the known prevalence of the problem, why it is a problem, and for whom. This would be a very brief synopsis of research and data from the literature.


TABLE 3.5 USING RESEARCH TERMS TO DESIGN COHERENT RESEARCH

Using these: Examine, Explore, Inquire, Investigate, Study
identify and gather: Data, Information, Facts, Answers, Principles
which is subjected to: Scrutiny, Criticism, Contemplation, Comparison, Evaluation
and may result in: Understanding, Relationships, Explanation, Knowledge
which can be used to: Draw conclusions, Suggest solutions, Recommend actions, Make changes, Clarify debates

The words in Table 3.5 can be used in a number of ways depending on what you intend to do. You can use them singly or combine them. For example, you could state that your aim was to do an exploratory study, or a critical study, or an investigative inquiry. You can also preface the orientation with a methodological one, such as 'quantitative examination'. There are many more words and phrases that can be used to formulate aims which embody the methodology of your research, including evaluate, identify, experiment, analyse, describe and so on. As you can see from the matrix shown in Table 3.5, the vocabulary of research has different levels and dimensions. The point to remember is that whatever terms you use, they should be logically related and used to formulate a coherent aim and set of objectives.

We will now look at what we mean by aims being coherent. We can see in Table 3.5 an example of how the different elements can be combined in different ways to achieve a desired outcome. There are two main points to note here. The first is that not all research has to result in a solution; understanding and the clarification of issues are as valid as any other outcomes for a research project. The second is that starting with a study of, say, an academic debate, then contemplating the nature and origins of that debate, then scrutinizing any data and arguments used in the debate, then subjecting data and argument to critical evaluation and reflection, may result in a new and possibly clearer understanding of the debate. The research will, however, have been


coherent in that its elements were deliberately chosen from amongst alternatives and logically combined in such a way as to be fit for the purpose of the research. It is this vocabulary that can be used to formulate the aims of a research project. In your aims you are stating the choices you have made and which you are proposing will form the basis of your approach to researching your topic. The aims you write in the early stages of your research will, of course, be subject to change as you refine the purpose of your research and the methodological tradition and approach you want to use. Finally, there are other pieces of information you can also include in your research aims, such as scope, dates (for example, between 1815 and 1883), the title of a publication (for example, Great Expectations), the name of a person (for example, Charles Dickens), reference to a theory or position (for example, atavism), an analytical framework, such as case study or comparative study, a hypothesis and your main research question.

WRITING OBJECTIVES FOR YOUR RESEARCH

The objectives of a research project (a proposal for your research) are the tasks required to actualize adequately the main elements of the research questions. There is sometimes, as we have said, some variation over what people call aims and objectives. Objectives tend to be defined as the tasks you will need to do, in rough order, to complete your research. Most research projects will need a search and review of the literature, construction and testing of data collection instruments, analysis of the data and a research report. Taking these as the major parts usually required, one way of casting your objectives is to look at your research questions and identify what tasks need to be done, in terms of the dissertation structure, in order to answer them. This way may result in the following set of objectives:
1 To review the literature of public library use by students of basic adult education courses in order to identify which variables have been previously identified in terms of low-use patterns.
2 To interview a sample of students about their use and knowledge of what their local public library can provide related to their course, and their patterns of and reasons for use of the library service.
3 To survey a sample of adult education providers to find out what they know about what public libraries can provide for their students and what they know about their students' use of the library service.


4 To identify gaps in knowledge of what the public library can provide for students on basic adult education courses.
5 To make realistic recommendations on how libraries can make their resources known to providers and students of basic education courses and how they can mitigate some of the barriers to the use of those services for this group of people.

These objectives are from a study of public library use by students on basic adult education courses. They are numbered consecutively and give only the briefest of information on what will be done and what information will be the result. The main focus of the research question they are based on is: why don't students on basic adult education courses make more use of the resources in public libraries to help themselves? Although the number of objectives does not always have to correspond to the number of research questions, for both, between five and seven are usually regarded as sufficient to express what you want to know and how you will go about finding out. The second main approach to objectives is to express them as outcomes, as products of different parts of your research. The following are from a study of information flow in the construction industry, an industry subject to many different statutory regulations and standards which are constantly changing. The example shows the main aims, problem statement and objectives.

The aims of this study are to identify the ways in which quantity surveyors in the UK construction industry obtain and use information and to evaluate the role of special libraries in supplying relevant information. The major problems facing quantity surveyors are the amount of information necessary in the form of regulations, standards and specifications, changes to the information and application to different kinds of construction. To investigate information flow it will be necessary to:
1 Detail the flow of information in terms of its supply and availability to its use by a sample of quantity surveyors.
2 Examine previous research on information flow in the construction industry, identifying its function and cases of failure.
3 Describe and evaluate the role of special construction libraries in the information chain.


4 Survey quantity surveyors on their knowledge and use of special libraries.
5 Compare the knowledge quantity surveyors have and their use of special libraries with other sources of information.
6 Suggest ways in which special construction libraries can be more effective in supplying information to quantity surveyors.
(Shoolberd, 2003: unpublished teaching notes)

After each objective it is legitimate practice to provide some explanation of what you are intending to achieve. This will help you to understand what will be involved in each objective and how the objectives relate to each other and to your aims.

USING A HYPOTHESIS IN YOUR RESEARCH

Sometimes your research questions or the expectations of your supervisor may mean that you need to develop a hypothesis for your research. A hypothesis is an informed guess or hunch that a relationship may exist between two variables, with one being the cause of the other. A hypothesis (H1) is therefore a statement that asserts that a relationship exists between two or more variables, that x is caused by y, or that particular consequences (C) will follow if the hypothesis is valid: that if H1 then C1, C2, C3 and so on. For example, I know a little about motorcars and how they work and hence, sometimes, why they do not work. If I turn the key to start mine and nothing happens I can, on the basis of my existing knowledge, hypothesize a number of possible causes, of which the most likely is a flat battery. Stated as a hypothesis to be tested, this could be: 'cars with a flat battery will not start'. As a consequence, if it is a flat battery then I also know that there will be a number of direct consequences, such as that the radio will not work (and will have lost its memory of my favourite pre-set stations), the clock, being electric, will have stopped, and the windows will not work. I could, if asked, provide more detail on why a flat battery causes the situation by taking the explanation to another level, say motorcar electrics. But I could not go much beyond this because I do not know enough about physics to talk about how a battery works at the level of atoms and electrons. There are, then, different levels of detail at which hypotheses can be used to give different possible explanations. These differences are what Alan Garfinkel (1981) calls 'explanatory relativity'. The point is that a hypothesis should be appropriate to the level of detail required, and we should remember that it is not, as in our example of talking about electrons, explaining the phenomenon itself but something about the consequences of the phenomenon. We are asking why the motorcar will not start, not how a car battery works. The use of a hypothesis in research is more complex than in this example, but it illustrates the main principles of hypotheses, such as:

they are tentative propositions based on existing knowledge (even a theory) and its use to explain a situation;

they are limited to the situation at hand, but the knowledge they are based on is general;

the validity of the hypothesis in the situation at hand is not known, but the hypothesis contains the details of what variables are to be investigated to test its validity; and

if found to be the cause from which the consequences have logically followed, this is the evidence for confirming the hypothesis.

Hypotheses therefore give direction to the investigation in terms of where to look, what to look at and what to test, and as such have a deductive structure. This means that they can be expressed in terms of 'if ..., then ...'. Figure 3.10 shows the deductive structure along with the role of inductive inference.

FIGURE 3.10 THE DEDUCTIVE STRUCTURE OF HYPOTHESES
[Figure: Hypothesis H1 leads to consequences C1, C2, C3, which are subjected to tests and experiments that produce facts and evidence. Inferences about the facts allow inductive reasoning to make links to confirm something we have seen before, and the hypothesis is confirmed. If the consequences cannot be shown to follow from the proposed cause, hypothesis H1 is falsified and an amended or alternative hypothesis is needed. Inferences may also suggest further hypotheses for phenomena we are unable to measure.]

In our motorcar example the hypothesis we used is called a research hypothesis because the problem it addresses is capable of being empirically investigated. Given the consequences, that the electrical devices in the motorcar do not work, then our hypothesis is, on the basis of prior experiences, the most statistically probable.

Analytical statistics, especially the Pearson product-moment correlation, plays a large part in the calculation of the data for hypothesis testing. For help with statistics, see Further Reading to this chapter.

Hypotheses work well with physical events (or the lack of them) because they can be based on existing knowledge of the basic laws of physics. Sometimes, however, in physics, but more often with human actions, events are the outcome of chance. The claim that 50 per cent of millionaires own a Rolls Royce may or may not be statistically correct. It is measurable and, if found to be the case, only tells us there is a 50/50 chance of millionaire y owning Rolls Royce motorcar x. Similarly, if we say that there will be no difference between the reading habits of an equal sample of left-handed boys and left-handed girls aged 13 years, we are saying there is no relationship between reading habits and left-handedness. This type of statement is called the null hypothesis because it states there will be no statistical difference between the variables. We could measure the reading habits of the boys and girls and calculate the variance between the two sample groups, which would indicate (rather than strictly prove) whether the null hypothesis is acceptable or is to be rejected in favour of an alternative research hypothesis. Note that we are using samples with the intent to generalize to a larger population, and therefore need to know much more about sample selection techniques and the nature of generalization in order to use hypotheses. These are all parts of the research design, which we will look at later in this book. You should, as a matter of course, be thinking about samples and also about the elements of your hypothesis and research questions at this stage. This mainly involves looking to see how you can define your major concepts (sometimes called constructs) and what indicators, variables and values you will use to operationalize them. For example, if you were looking at poverty and ill-health you may hypothesize that poverty is a major cause of poor health and mortality among low-income families. Poverty, poor health and low income would all need careful consideration and recourse to the literature for definition, but for the sake of our example an initial design might look like the one shown in Table 3.6. Outlines such as the one shown in Table 3.6 can be useful starting points for all types of research, not just those using a hypothesis. They help to clarify what kind of data will be needed in terms of their relevance, amount and detail, and how they may be collected so as to be reliable and comparable.
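The Pearson product-moment correlation mentioned above can be computed directly. The following is a minimal sketch, not from the original text, in plain Python; the data (hours of library use against essay marks) are invented purely for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # covariance term and the two standard-deviation terms
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented data: hours per week in the library (x) against essay marks (y)
hours = [2, 4, 5, 7, 9]
marks = [50, 55, 60, 68, 72]
r = pearson_r(hours, marks)
# r near +1 or -1 suggests a strong relationship between the variables;
# r near 0 is consistent with a null hypothesis of no relationship
```

A real study would also report a significance test for r against the null hypothesis, which is where the sample-size and generalization issues discussed above come in.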


TABLE 3.6 OPERATIONALIZING THE HYPOTHESIS

Main concept (cannot be directly measured): Poverty
Indicators (category of persons or phenomena which indicates the existence of the concept): Registrar General's classification of social class
Variables (categories of activities or things which can be measured): diet, smoking, alcohol consumption
Value (the actual units, how much or how often, that can be measured): lung-cancer, heart disease, bronchitis rates
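Not part of the original, but an outline like Table 3.6 can also be kept as a simple data structure while you design the study; every field name and entry below is illustrative only:

```python
# A hypothetical sketch of the concept -> indicators -> variables -> values
# ladder used to operationalize a hypothesis (all labels are illustrative).
operationalization = {
    "concept": "Poverty",  # cannot be directly measured
    "indicators": ["Registrar General's classification of social class"],
    "variables": ["diet", "smoking", "alcohol consumption"],
    "values": ["lung-cancer rate", "heart disease rate", "bronchitis rate"],
}

# Such an outline makes it easy to check, before data collection begins,
# that every level of the ladder has at least one concrete entry.
for level in ("concept", "indicators", "variables", "values"):
    assert operationalization[level], f"no entry for {level}"
```

Keeping the outline in one place like this makes it straightforward to revise the indicators or variables as your reading of the literature sharpens the definitions.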

RESEARCH PROPOSITIONS

While hypotheses are usually associated with correlational and explanatory research, it is quite possible to use a form of hypothesis in other types of research and research approaches. For the sake of clarity and demarcation we will term these propositions rather than hypotheses. A proposition is a statement presented for consideration that seeks to confirm or deny the assumptions, methodology or methods used to define or apply a phenomenon. For example, we may propose that newly constructed university library buildings (say five) meet all the current building regulations but fail to meet the needs of students. We are proposing that there is a logical gap in the function of the building, and could go on to propose why we believe this to be the case. Propositions are statements based on an argument which can be investigated through a similar research design to that shown in Table 3.6. Our example here may include definitions of a library, usage statistics from before (old library) and after (new library) to indicate usefulness as a concept, and a questionnaire survey and interviews with users. This proposition could include the collection of a range of organizational statistics, quantitative responses and qualitative opinions. It would not result in any kind of strict correlation between the variables, but would fulfil the main purpose of raising a topic for critical discussion. My own research on the influences on library architecture uses such propositions combined with research questions (Hart, 1996). For example: What are the main conceptual influences on contemporary library design? How do these relate to the historic place and value of knowledge? How is the purpose of the library represented in its design? What role do librarians and users of libraries have in the design of libraries? Questions like


these can form the basis of a propositional argument that has several related propositions, such as: contemporary library architecture represents information access rather than knowledge collection; libraries are designed using the concepts of visibility, access and speed; the book is no longer valued because it is seen to represent elitism; hence the glass library building has replaced the stone one and computers have replaced books. This was investigated using images of recently built libraries.

SUMMARY OF THIS CHAPTER

This chapter has attempted to provide you with an overview of the initial stage of doing your master's dissertation. The focus has been on the general issues and techniques for finding a suitable topic for your research and on how to define a topic in terms of questions which are research questions. These ways of defining a topic have only been touched on, and you are advised to consult the literature, especially the further readings for this chapter, and your tutors for detailed advice. But now that you know about defining a topic, some time needs to be given to considering methodological traditions and approaches before the research project is finally formulated into a definite design. These are issues that will be dealt with in Chapters 10 and 11. The key points made in this chapter include the following:

The topic needs to be do-able in the time you have available. A do-able topic is one that has available data you can access and have the time to analyse.
There are many different ways of framing a topic, and most of these are as puzzles to be solved.
The initial or indicative search and review of the literature is an important part of topic analysis.
The earlier you start looking for a topic, the more time you will have to develop a clear puzzle and research design.
Once you have some candidate topics, define them using research questions, hypotheses and propositions.


Further reading
Alasuutari, P. (1995) Researching Culture: Qualitative Method and Cultural Studies. London: Sage. Chapter 11 introduces the idea of research being about 'unriddling'.
Blaxter, L., Hughes, C. and Tight, M. (1996) How to Research. Buckingham: Open University Press. A good starting point with some simple to-do exercises on topics.
Booth, W.C., Colomb, G.G. and Williamson, J.M. (1995) The Craft of Research. Chicago: University of Chicago Press. Has a section in Chapter 3 on moving from topics to research questions.
Clarke, G.M. (1992) A Basic Course in Statistics. 3rd edn. London: Edward Arnold. A solid introduction to statistical techniques relevant to hypothesis testing.
Van Dalen, D.B. (1979) Understanding Educational Research: An Introduction. New York: McGraw-Hill. A thorough introduction to hypotheses and related statistical techniques.
Dees, R. (1997) Starting Research: An Introduction to Academic Research and Dissertation Writing. New York: Pinter. See Chapter 3 on planning a focus for your research.
Kumar, R. (1999) Research Methodology: A Step-by-Step Guide for Beginners. London: Sage. Chapter 4 gives advice on formulating a research topic, including using hypotheses, and Chapter 5 on variables.
Lester, J.D. (1993) Writing Research Papers: A Complete Guide. New York: HarperCollins. Has advice in Chapter 1 on finding a topic.
Silverman, D. (2000) Doing Qualitative Research: A Practical Handbook. London: Sage. See Chapter 5, 'Selecting a topic', for advice on strategies to overcome some of the most common errors when looking for a topic and at qualitative hypotheses.
Trochim, B. (2002) Research Methods Knowledge Base. http://trochim.human.cornell.edu/ An Internet resource that includes a lot of advice on hypotheses, samples and statistics.
Walliman, N. (2001) Your Research Project: A Step-by-Step Guide for the First-time Researcher. London: Sage. Chapter 5 discusses hypotheses, research questions and propositions.

8603 Chapter 8 (164-174)

11/11/02

2:33 pm

Page 164

8 How to succeed in group work

Aims
To consider the role of group work in the academic environment and to focus on developing group work skills.

Learning outcomes
It is intended that after reading this chapter and engaging in the activities set you will have:
- developed an awareness of the role and potential of group work in the academic environment
- engaged in activities designed to develop your group work skills.

Introduction
Group work can be one of the most emotionally charged areas of a student's life. Many students see only the problems associated with working in a group. Perhaps because they never get heard in a group, perhaps because they usually do all the work: whatever the problem, group work makes them unhappy. If you have doubts about group work, or if you just want to get the most out of it, then this section is for you. Colleges and universities are increasingly building group activities into their programmes. Some because they have an ideological commitment to collaborative learning: they believe that we are inter-dependent beings and should recognise and build on that. Others feel that group work offers support to their students: tasks are easier when they are shared. Still others feel that they are pragmatically preparing students for the world of work, for if you cannot work with other people you are unlikely to be able to keep a job.


Whatever your college or university's reasons for asking you to engage in group work, we recommend that you try to get the most from it, and to help you we are going to explore the what, why and how of group work. In order to do that we have drawn very heavily on the group work theories of Business, for it is in Business that group work is particularly valued.

Group work made simple: the pyramid discussion


A simple group starting activity is the pyramid discussion. When asked to start a group project:
- think about the topic on your own
- discuss ideas in pairs
- build arguments in fours
- feed back your whole group thinking in a plenary.

What is group work?


A group has to have a membership of two or more people. Typically there should be a sense of shared identity: that is, you should all have a sense that you are a group and that you have shared or common goals. Further, within the group there should be a feeling of interaction and interdependence, a sense that you can achieve something together. Perhaps it is in these initial definitions of a group that we have hit upon some of the problems with academic groups. How many people in an academic group do feel that sense of identity and inter-dependence? How many embrace the task and the sense of shared goals? If this sounds like groups that you have been in, or that you are in now, what are you going to do to make your group feel and operate like a group?

Students in groups
In school, college or university there can be many forms of group work. For one thing, and as we keep reminding you, you can sort out a study partner or a study group to make study more collaborative and supportive. You can build your own groups to share the reading for assignments, to discuss assignment questions and to proofread your work (as the tutors did in Chapter 7).


There can also be group activities organised by the tutor from class discussions to the formal, assessed group project. Typically the best thing you can do is participate in these as positively as possible.

Academic groups
- Class discussion or activity: based around something that has come up in a class.
- Tutorials: two or more students meeting with a tutor and working together on a topic or task.
- Seminars: a group of students meeting with a tutor, typically working together on a topic covered in a lecture programme.
- Group assignments: where students are asked to produce something collectively. For example you may have to prepare a presentation or a seminar, or write and produce a report, magazine or a video. Perhaps you will be awarded a collective grade.

Tips:
- Group work is typically designed to reduce the workload whilst increasing the amount of active and interactive learning that takes place: take advantage of this.
- Sometimes the process, as well as the product, is assessed. Here, students will be asked to reflect on the whole group work process. That is, how you worked as a group, roles that were adopted, problems that occurred and how they were solved. Make notes as you go along!

Why groups?
Group work offers many advantages to students of all ages, yes, really! For one thing it offers an opportunity to share the workload. It really is easier to do all the reading for a module if you share it out. Further, group work at its best fosters active learning. You are expected to discuss things in a group, that reading for instance; in this way, everyone in the group will learn more than if they had just done something on their own. Not only this, you also get to refine your personal and inter-personal skills as you learn to discuss ideas and negotiate strategies with tact and diplomacy in your group: you do have to be assertive rather than aggressive in a group. Another advantage of group work is that a good group offers social support that can break down some of the isolation sometimes associated with being a student.


Disadvantages
Of course there can be disadvantages to group work. One disadvantage is linked to the fact that many group activities are now assessed. Students have become increasingly aware of the importance of good grades. Thus they are incredibly resentful of those in the group who do not pull their weight, who do not stay on track, who dominate or bully or distract. There can also be groups where members stay silent or groups where the same people always speak. None of this feels satisfactory and it causes much resentment.

Resolving conflict
But every disadvantage can become an advantage if you work out how to resolve the problems that you encounter. So notice what is happening in your groups. Notice how difficult situations are resolved. Notice how unmotivated people are encouraged to give of their best. And put these notes in your curriculum vitae file. When you apply for a job you will be able to prove that you are good at group work by giving examples from your time at college or university. It is the examples that you give and the way that you discuss them that will make all the difference in that vital job interview! And this does refer to another 'why' of group work: it can and does prepare you for your future employment.

How to do group work


The best way to get the most from group work is to approach it positively, determined to get the most from it. If you really dislike group work but have to engage in it, 'fake it to make it': role-play being an active, positive student. Another simple and very effective strategy is to choose your groups with care. Do not just team up with those people sitting next to you or those nice chatty people from the canteen! Group tasks normally involve hard work: choose people who are as motivated, positive and industrious as you when you are choosing your group.

SWOT your group work


SWOT stands for Strengths, Weaknesses, Opportunities and Threats.
- What are your group work strengths?
- What are your weaknesses?
- What opportunities are there for you in group work?
- What threats?


Once you have answered these questions, think about your answers and discuss them with your study partner:
- What do they tell you about yourself?
- What do they tell you about how you should approach group work?

A business-like approach to group work


Management theorists like Belbin and Adair (see boxes) have worked to de-mystify group work forms and processes so that businesses can run more effectively. Have a look at the information in the boxes and see what they tell you about how to make groups work. As always, ask yourself, 'How will knowing this make me a more successful student?' For, in the end, you must work out how knowing these things will help you to succeed in the group activities that you will be expected to undertake at college or university.

Tip: If you are expected to reflect on your group work experiences, using the following information will definitely improve your grade!

Belbin's group roles


There are eight key roles that management experts like Belbin (1981) have identified in group activities. We have listed these below, indicating the possible strengths and weaknesses involved:
- Company worker: a dutiful, organised person, who may tend to inflexibility
- Chair: a calm, open-minded person, who may not shine creatively
- Shaper: a dynamic person who may be impatient
- Creative thinker: one who may come up with brilliant ideas, though these may be unrealistic
- Resource investigator: an extrovert character who may respond well to the challenge but who may lose interest
- Monitor: a sober, hard-headed individual who keeps everything on track, but who may lack inspiration
- Team worker: a mild, social person with plenty of team spirit; may be indecisive
- Completer/Finisher: conscientious, a perfectionist, maybe a worrier.


8: How to succeed in group work 169

Let's pause here to go over the list.
- Which description most fits you?
- Are you happy with this?
- What are you going to do about it?

Tips
- Experiment with group work. Adopt different roles in different academic groups. Each time you vary your role in a group you will develop different aspects of your personality; this is a good thing.
- Decide to use your group work experiences to develop your c.v. and get you that job. So as you move through team worker, leader, information gatherer, creative thinker, completer, etc., make notes on your experiences for your c.v. folder.
- Whilst eight roles are indicated here, research indicates that academic groups work best if they contain only four or five people; any more and you start to get passengers.
- In a small group, allocate roles wisely, but make sure that you have a chairperson and that everyone knows what the task is, what they are doing and when it all has to be completed.

Adair's processes
As there is theory as to the roles adopted in group situations, so there are arguments as to the processes that groups go through. Adair argues that groups have distinct forms, or pass through distinct transformations, as they encounter the task, settle down to it and finally pull it off. These are known as forming, storming, norming and performing; some people also speak of a fifth stage, mourning.

Forming is where the group comes together and takes shape. This forming period is a time of high anxiety as people work out:
- who is in the group and what they are like
- what the assignment is and what it involves
- what the rules are: about behaviour, about the task, about assessment
- what they will have to do to get the job done, and who will be doing all the work.

Storming is where conflict arises as people sort out all the confusions highlighted above. This is where people seek to assert their authority, and get challenged. Typically this is a black and white phase: everything seems all


good or all bad: compromise is not seen. At this stage people are reacting emotionally against everything as they challenge:
- each other
- the value of the task
- the feasibility of the task ('you cannot be serious!').

Tip: If you do not like group work, ask yourself: is it because you do not like conflict? Perhaps you just find this phase uncomfortable? If this is so, remind yourself that this phase passes.
Norming, as the name suggests, is where the group begins to settle down. Here that sense of inter-dependence develops as:
- plans are made
- standards are laid down
- co-operation begins
- people are able to communicate their feelings more positively.

Performing is where the group gets on and does what it was asked to do. It is now that the task can be undertaken and completed, and success can be experienced! Here it is useful if:
- roles are accepted and understood
- deadlines are set and kept to
- communication is facilitated by good inter-personal skills.

Tip: Share your phone numbers and your e-mail addresses. Do have a group leader who will take responsibility for chivvying people along. Do set people tasks that they can do.
Mourning: The fifth stage, mourning, is supposed to follow a successful and intense group experience. As you work hard to complete an assignment with people, you develop links and bonds. Typically you enjoy the sense of mutual support and commitment. The feeling of inter-dependence is very satisfying. When all this ends as the task ends, there can be a real sense of loss.

Tips
- Be prepared for the sense of loss.
- Work to keep in contact with good team players; you may be able to work with them again.


Queries:
- Do you recognise any of these stages?
- Now that you know about them, think how you might use this knowledge to your advantage.
- How will you draw on this information in your next group activity?
- Make notes so that you do not forget.

And of course it can be like any other assignment!


There are many things that good groups have to do to work well; that is, for everyone involved to have a good time and for the task to be accomplished. We have referred to some of them above. Remember also to treat group work like any other assignment (see Chapter 9: how to prepare better assignments).

How to succeed in group work


It still helps to:
1 Prepare to research: open your research folders and analyse the task, making sure that you know exactly what you have been asked to do or make (essay, report, presentation, seminar, etc.). Then:
   - analyse the question, all of it
   - have the overview and fit the task to the module learning outcomes
   - use creative brainstorming and notemaking strategies
   - action plan: work out who is doing what, why, where and when!
2 Follow the action plan: undertake targeted research and active reading.
3 Review your findings.
4 Plan the outline of the report, seminar, presentation or whatever.
5 Prepare the first draft.
6 Leave a time lag.
7 Review, revise and edit; agree on a final draft.
8 Proof read, or rehearse if it involves a group presentation.
9 Hand work in on or before the deadline.
10 Review your progress!


Tip: If your group work review forms part of the formal assessment, ask your tutor exactly what it is that they are assessing before you even start the group activity. In this way you can note the relevant information as it arises and have it there ready for when you perform your formal review of your group project.

Conclusion
We have used this section of the book to explore group work in the academic setting. We have stressed that group work can be a positive, supportive and interactive learning experience, especially if you tackle group activities with enthusiasm and commitment and with the co-operation of committed group members. At the same time we stressed that you can benefit even from problem groups by noting how your problems were overcome, and that you can use such reflections in a formal group review and in your job applications. We stressed how an awareness of group roles and processes can help you understand and succeed in your group activities. Finally we compared success in group assignments with success in any assignment, making links with the ten-step plan, prepare and review strategy introduced in Chapter 9. Good luck with your group activities. Enjoy your group work; groups really can be supportive, exciting and productive.

Group building activities


There are management team building games that you might like to experiment with, to develop your group work skills and for the fun of it. We have included one below; you can search out others if you wish.

The Paper Tower
In this activity you will need to gather together some students who want to develop their group work skills, and some simple resources. The goal will be for groups to construct a paper tower with a given supply of resources. Variations on this include designing, producing and testing a non-breakable egg container, or balancing a spoon on a paper tower. The egg container is the more dramatic!

Aim
To develop group work skills through practical activity, observation and feedback.


Learning outcomes
By the end of this activity participants will have developed:
- a sense of the social support offered by group work
- an idea of their own approach to group work
- a sense of the fun of group work
- an idea of the positive benefits of undertaking tasks in a team rather than alone
- some strategies for successful group participation.

Resources
Large quantities of newspaper, cellotape, paper clips and rubber bands, sufficient for all participants.

The Paper Tower Exercise
1 Divide participants into groups of 5-6 people. Each group has to choose an observer, who will not participate but who will note how the other people do so. The participants have to build a tower with the resources to hand. Each group will present their tower to the other groups. Each observer will feed back how his or her group performed. (Allow 20-30 minutes tower building time.)
2 Whilst the students build their towers, the observer makes notes as to the roles adopted by individual members or the processes engaged in by the group. The observer notes how people engage in the group task.
3 Groups report back on the criteria they had chosen for their tower, the tower itself and how they felt the group performed. The observer feeds back (in constructive terms) on the roles and/or processes of the group.
4 Plenary: hold a plenary to discuss what the participants have learned from the activity and how they will draw on this in the future.

Review points
When reviewing this activity participants might note that they:
- enjoyed it; it was fun
- benefited from being part of a team
- have some idea of how they performed in a group activity
- have learned something useful about group work that they will build on in the future.


Review points
When thinking about what you have read and the activities that you have engaged in, you might feel that you have:
- developed an awareness of the forms and processes of group work, so that you are in a position to make the most of group activities in the future
- developed an awareness of the potential of group work in the academic environment
- developed an awareness of how to use your group work experiences at college or university to improve your job applications.

Further reading
If you are interested in this topic you may wish to have a look at the following:
Adair, J. (1983) Effective Leadership; (1987) Effective Team Building; (1987) Not Bosses but Leaders.
Belbin, R.M. (1981) Management Teams: Why They Succeed or Fail. London: Heinemann.

Chs-12.qxd

5/25/2004

11:59 AM

Page 145

12  What's All This About Ethics?

CHAPTER CONTENTS
12.1  Acknowledging other people's work      145
12.2  Respect for other people               149
12.3  Scientific honesty and subjectivity    153
12.4  What should I do now?                  154
12.5  References to more information         155

Ethics is about moral principles and rules of conduct. What have these got to do with writing a dissertation? Quite a lot, actually; they focus on your behaviour towards other people and their work. You are not producing your dissertation in a vacuum. You definitely will be basing your information and ideas on work done by other people, and you may well be interacting with other people in a more personal way during your study. It is therefore important to avoid unfairly usurping other people's work and knowledge, invading their privacy or hurting their feelings.

12.1  Acknowledging other people's work

An important part of a dissertation study is to find out what has already been written by other people on the chosen subject. You will be expected to collect and report on facts and ideas from a wide range of sources, so there is no need to feel that everything you write has to be original. Even the greatest thinkers have stood on the shoulders of giants in order to make their discoveries. Jean Renoir (1952), the French film director, expressed his views on this very strongly when he talked in a filmed interview about his 1930s film Une Partie de Campagne, which he based on a story by Guy de Maupassant:
Maupassant's story offered me an ideal framework on which to embroider. This notion of using a framework begs the question of plagiarism, something I whole-heartedly

146  YOUR UNDERGRADUATE DISSERTATION

approve of. To achieve a new Renaissance, the State should encourage plagiarism. Anyone guilty of plagiarism should be awarded the Legion of Honour! I'm not joking: plagiarism served the world's great writers as well. Shakespeare reworked stories from Italian authors, amongst others; Corneille took Le Cid from Guillén de Castro; Molière ransacked the classics; and all were right to do so.

This obviously has to be taken with a pinch of salt! There is, however, real truth in the view that other people's work can be an inspiration and guide to one's own. The point is that the sources of work on which you base your writing must be acknowledged. Renoir made sure of this in the title of his film: Une Partie de Campagne de Guy de Maupassant. In order to maintain an honest approach, there must therefore be a clear distinction between your own and other people's ideas and writings.

FIGURE 12.1  Other people's work can be an inspiration

Your university or college will have strict regulations covering the issues of plagiarism and syndication, and you should make yourself familiar with these. These extracts from the Oxford Brookes University Student Conduct Regulations provide a typical example:
Candidates must ensure that coursework submitted for assessment in fulfilment of course requirements is genuinely their own and is not plagiarised (borrowed, without specific acknowledgement, or stolen from other published or unpublished work). Quotations should be clearly identified and attributed, preferably by the use of one of the standard conventions for referencing. Assessed work should not be produced jointly unless the written instructions specify this. Such co-operation is cheating and any commonality of text is plagiarism.


The penalties which a module/subject leader may impose for plagiarism are:
1 a formal written warning; or
2 a reduction of marks for the piece of work; or
3 no marks for the piece of work; or
4 a fail grade for the module concerned.

You may think that you could easily get away with copying some chunks of text from the Internet; after all, there are millions of pages to choose from. However, the source can easily be tracked down by typing a string of four or five words from the text into a search engine like Google. The penalties for transgressing college regulations, even inadvertently, are heavy; so how can problems be avoided? The solution lies in a good system of referencing and acknowledgement. Credit will be given for evidence of wide reading of relevant texts, so there is no need to be shy of quoting your sources.

There are two ways of incorporating the work of others into your text: the first is by direct quotation, and the second by paraphrase. These can be referenced in several widely recognized systems, for example the Harvard system. Generally, all systems identify the sources in an abbreviated form within the text to pinpoint the relevant sections, and cross-reference these to a full description in a list at the end of the chapter or dissertation, or in some cases in a footnote at the bottom of the page. You should decide on one system and then use it consistently. There might be advice given in your course description as to which system is preferred. For a full account of the practical aspects of how to do your referencing, refer to Chapter 17.

How much referenced material should you use? This depends on the nature of your dissertation. Obviously, if you are making a commentary on someone's writings, or comparing the published works of several people, it will be appropriate to have numerous references. In other cases, say a report on a fieldwork project, only a few may be sufficient to set out the background to the study. You may be able to get advice on this issue from your tutor.

Where does the boundary lie between paraphrasing (which requires referencing) and your own writing based on the ideas of others (which does not)? This is a matter of judgement.
Substitution of a few words, reordering sentences, or cutting out a sentence here and there, are not enough to make it your own work. A sound method of avoiding accusations of plagiarism is to carefully read the source material, and then put it away out of sight. Rely on your memory and own words to describe and interpret the ideas. Here is a brief example in two parts to demonstrate skills in paraphrasing, using a quotation from Leedy's book Practical Research (1989). First the quotation: a word-for-word copy of a section of his text. Note the citation at the end.


Any research endeavour that employs human subjects may raise questions of propriety, create misunderstandings, or ask subjects to go beyond demands consistent with pure research objectivity. A statement signed by the subject, indicating a willingness to co-operate in the research and acknowledging that the purpose and procedure of the research project have been explained, may well be a safeguard for both researcher and subject. Such a statement should contain a clause indicating that if, at any time during the research procedure, the individual should not wish to continue to be associated with the research effort, he or she shall have the right to withdraw. If this situation occurs, the subject should notify the researcher in a written memorandum, in which is set forth the specific reason or reasons for the decision to withdraw. (Leedy, 1989, p. 96)

In the first example I have made a summary in my own words of the main points, which are attributed to the author. I kept the text in front of me so that I could make an accurate account. The length was reduced to a couple of sentences, and again there is the citation.
Leedy (1989, p. 96) states that research using human subjects may raise issues of propriety, misunderstandings and objectivity. To mitigate problems, a signed statement should be obtained from the subject indicating agreement to participate in the project, and containing the option for him or her to opt out of the research exercise on production of a written explanation of the reasons for withdrawal.

In the following example I put the text aside and wrote a commentary in my own words on the content, i.e. my interpretation of the issues raised. The source does not need to be cited in this case.
Using a signed agreement between researcher and subjects will help to reduce any misunderstandings and misgivings on the part of participants in research projects. An opt-out clause should be included to enable participants to terminate the agreement during the course of the project.

If your dissertation were to be published, there are strict limitations as to how much direct quotation or illustrative material you are allowed to use without asking permission from the original author or copyright holder. For example, all poetry or song lyrics, as well as illustrations and figures, need permission, as does the quotation of more than about 400 words from a single prose work. However, for an unpublished student academic work like yours, these limits do not apply. Even more reason, then, to acknowledge your sources, in gratitude that you do not need to go through the process of gaining permissions!


FIGURE 12.2  Getting information from other people

12.2  Respect for other people

Many dissertation subjects require the getting of information from people, whether they are experts or members of the general public. This data collection may be in the form of interviews or questionnaires, but could also be types of experiments. Whenever dealing with other people, you must be sensitive to issues of privacy, fairness, consent, safety, confidentiality of information, impartiality etc. This is actually quite a complex subject, and it requires real thought about how your plans for getting information or opinions from people can be carried out in a way that complies with all these ethical issues. Here are some of the main aspects to check.

INFORM PEOPLE
Participants have a right to know why you are asking them questions and to what use you will put the information that they give you. Explain briefly before interviewing and add an explanatory introduction to questionnaires. If you will be conducting some kind of test or experiment, you should explain what methods you will use.
Example: You are stopping people in the street to ask them where they have walked from and where they are going. Explain that you are conducting a college study to assess the pattern of pedestrian movements in the town centre.


ASK PERMISSION AND ALLOW REFUSAL TO PARTICIPATE


Do not assume that everyone is willing to help you in your research. Once they are informed about the project they should be clearly given the choice to take part or not. A more formal agreement like the one suggested in the previous section will be appropriate for extended projects or those of a sensitive or intimate nature.
Example: You want to test people's skills in balancing on a tightrope, depending on the tension of the rope. You will need to explain exactly what you wish them to do, safety measures taken, clothing and footwear required, time and place of the experiment, who will be observing, and other data required (e.g. age, weight, size etc.). This will enable the possible participants to judge if they want to take part.

RESPECT PRIVACY THROUGH ANONYMITY


Most surveys rely on the collection of data, the sources of which do not need to be personally identified. In fact, people are far more likely to give honest replies to questions if they remain anonymous. You should check that the way data are collected and stored ensures anonymity: omit names and addresses etc. Treat data as numbers wherever possible.
Example: You are distributing a questionnaire to households about vandalism and intimidation on a housing estate, asking questions about the levels and sources of the problems. To ensure anonymity, the questionnaires must not contain anything that may identify the respondent (e.g. even a family profile might do this). Delivery and collection of the questionnaires should also be considered, to ensure that the information cannot get into the wrong hands.

ATTRIBUTION
If anonymity is not desired or even possible, e.g. when obtaining particular views of named influential people, the information collected must be accurately attributed to the source. Agreement must be obtained that the opinions/information given can be used in your dissertation.
Example: You are interviewing a leader of a trade union organization and the manager of a firm about an industrial dispute relating to a pension scheme. There must be no confusion in your account of the interviews about who said what. Ask before the interviews if you will be allowed to quote them in your dissertation.


OBTAIN AUTHORIZATION
It is good practice to send a draft of the parts of your work containing the views or information given by named sources to those concerned, asking them to check that your statements are accurate and that they are allowed to be included in your dissertation.
Example: In the above example, if the interviews are lengthy, and the opinions are contentious in what is probably a sensitive situation, you will gain respect and cover yourself against problems if you get a signed copy of the drafts of your accounts of the individual interviews from the respective people. This is absolutely necessary if you quote people directly. If you say you will do this in advance, you will be likely to get a less cautious response during the interview, as there is an opportunity for the interviewee to check for accuracy.

FAIRNESS
In any tests or experiments, thought should be given to ensure that they are fair, and can be seen to be so. Participants will feel cheated if they feel that they are not treated equally or are put at some kind of disadvantage.
Example: You have devised a simple test to gauge people's manual dexterity on equipment that can only be used by the right hand. Left-handed people will feel justifiably disadvantaged.

AVOID SEXISM
The way language is used can often lead to sexism, particularly the use of masculine labels when the text should actually refer to both men and women. Bias, usually towards the male, is also to be avoided in your research.
Example: The use of words such as 'manpower' rather than 'labour power', 'one-man show' rather than 'one-person show', and the generic 'he' or 'his' when you are referring to a person of either sex. Research bias can occur when you devise a study that assumes the boss is a man, or that all primary school teachers are women.

BE PUNCTUAL, CONVENIENT AND BRIEF


Punctuality, brevity and courteousness are essential qualities to help your efforts to gain information. Appointments should be made and kept. Time is a valuable commodity for almost everybody, so it will be appreciated if you regard it as such.


Example: You need to get expert information on the intricacies of management procedure in a hotel reception. You turn up three-quarters of an hour late, just at a time when a large crowd of business people (note: not 'businessmen') normally arrive to check in. You have missed your slot and will cause real inconvenience if you start asking questions now.

BE DIPLOMATIC AND AVOID OFFENCE


On the whole, people are willing to help students in their studies. However, do not abuse this willingness by being arrogant and insensitive. You might be dealing with delicate issues, so try and get informed about the sensitivities and feelings of the participants. Above all, do not make people appear ridiculous or stupid!
Example: Don't regard yourself as the host of a chat show when, say, interviewing a group of elderly people in a residential home about their past lives. They may have very different views on what is proper to talk about, so avoid the pressure tactics and clever questions used to prise out information not willingly given.

FIGURE 12.3 Avoid causing offence

GIVE THANKS
Any help should be acknowledged with thanks, whether verbal or, in the case of questionnaires or letters asking for information, written.


Example: Adding a short paragraph at the end of the questionnaire thanking the person for answering the questions is simply done, as is a simple expression of thanks before leaving after an interview.

12.3  Scientific honesty and subjectivity

This refers back to some of the issues raised in Chapter 5 about philosophy. The main point I want to make here is that of being scrupulously honest about the nature of your findings, even (and especially) if they tend to contradict the main thrust of your argument. Good quality research is not achieved by using the techniques of a spin doctor. Politicians might want to put the right kind of gloss on data collected for them in order to bolster their arguments, but this is not tolerated in academic work. Data should speak for themselves. Your analysis should reveal the message behind the data, and not be used to select only the results that are convenient for you.

As with most things, this kind of honesty can be more complicated than at first glance. Consider the following scenario. A study is being carried out of the use of animals in experiments to develop new products, in this case an anti-ageing pill that may have useful properties for combating Alzheimer's disease. The data on the level of discomfort that the animals suffer, based on medical measurements and observations, are contradictory and difficult to quantify. The researcher carrying out the study feels that an anti-ageing pill is not really a medicine, so testing on animals is not justified. However, the experimenters argue that if many human lives can be prolonged by fighting off the horrible effects of Alzheimer's, then the slight suffering of some animals is justified.

How will the researcher present the data in an honest and balanced way? It would be easy to present one side of the argument and stress the amount of suffering caused to animals in the search for an elixir of youth. That the animals suffer can be derived from the data. By interpreting the data on the animals' discomfort level as demonstrating cruelty, and by ignoring the likely medical benefits of the pill, a strong case could be made for discontinuing the experiments. But such certainty is not inherent in this situation.
Much better, i.e. more honest, if the researcher discussed the issues driving the research, and the difficulty of gauging the level of suffering of the animals, and concentrated on assessing the strengths of the opposing arguments, taking into account the uncertainties of the data and of the eventual properties of the product. If you can achieve a balanced view, it is probably not necessary to specifically state your personal attitude to the issues. However, there are situations where it is impossible to rise above the events and be a detached observer. For example, if you are a committed and active supporter of a ban on hunting with dogs, and make a study of this sport, you should declare your interest. Your arguments may well be valid and based on good evidence, but you are unlikely to seek supporting evidence for the other side!


FIGURE 12.4

Another way to ensure that you will avoid being accused of spin or false interpretation of the evidence is to present all the data you have collected as fully and clearly as possible. This may be the results of a questionnaire, measurements of activities or any other records relevant to your study. You can then base your analysis on these data, and it is open to the reader to judge whether your analysis is correct and whether your conclusions are valid. All arguments are open to challenge, but if you present the raw materials on which your arguments are based then at least the discussion has a firm foundation.

12.4  What should I do now?

The issues of ethics in academic work pervade almost all aspects. Some of these issues are based on simple common sense and civilized behaviour, such as one's relationships with colleagues and other people. Others are more formal in character and require real organizational effort in order to fulfil the requirements, such as systematically employing a sound referencing system, and gaining permissions for use of information and activities. You should therefore:
Consider carefully how you will use the written work and ideas of other people in your dissertation. Will you be discussing and comparing their ideas, or will you be developing ideas of your own based on those of others? You will probably do some of both.


Consciously devise a method to differentiate between quotation, summary, paraphrase and commentary, so that you will be aware of which mode you are writing in at any time.

Examine your plans for getting information from other people. Systematically organize them to take account of all the relevant ethical issues. This will entail matters of procedure as well as content, in written and verbal form. You can use the list of aspects above as a checklist.

12.5  REFERENCES TO MORE INFORMATION

Although ethical behaviour should underlie all academic work, it is in the social sciences (as well as medicine etc.) that the really difficult issues arise. Researching people and society raises many ethical questions that are discussed in the books below. The first book has two sections that are short and useful. The other books on this list are far more detailed and really aimed at professional researchers, though the issues remain the same for whoever is doing it.

Robson, C. (1993) Real World Research: A Resource for Social Scientists and Practitioner-Researchers. Oxford: Blackwell. See pp. 29-34, 470-5.

Laine, M. de (2000) Fieldwork, Participation and Practice: Ethics and Dilemmas in Qualitative Research. London: Sage. The main purposes of this book are to promote an understanding of the harmful possibilities of fieldwork, and to provide ways of dealing with ethical problems and dilemmas. Examples of actual fieldwork are provided that address ethical problems and dilemmas, and show ways of dealing with them.

Mauthner, M. (ed.) (2002) Ethics in Qualitative Research. London: Sage. This book explores ethical issues in research from a range of angles, including: access and informed consent, negotiating participation, rapport, the intentions of feminist research, epistemology and data analysis, and tensions between being a professional researcher and a caring professional. The book includes practical guidelines to aid ethical decision-making, rooted in feminist ethics of care.

Geraldi, O. (ed.) (2000) Danger in the Field: Ethics and Risk in Social Research. London: Routledge. Read this if you are going into situations that might be hazardous.

Barnes, J.A. (1979) Who Should Know What? Social Science, Privacy and Ethics. Harmondsworth: Penguin. A good comprehensive guide, but probably too involved for your purposes.

Chs-12.qxd

5/25/2004

11:59 AM

Page 156

156 YOUR UNDERGRADUATE DISSERTATION

There are also books about ethics that specialize in certain fields. Here are some examples; perhaps you could search out some in your own subject.

Whitbeck, C. (1998) Ethics in Engineering Practice and Research. Cambridge: Cambridge University Press.

Graue, M.E. (1998) Studying Children in Context: Theories, Methods, and Ethics. London: Sage.

Royal College of Nursing (1993) Ethics Related to Research in Nursing. London: Royal College of Nursing, Research Advisory Group.

Burgess, R.G. (ed.) (1989) The Ethics of Educational Research. London: Falmer.

Rosnow, R.L. (1997) People Studying People: Artifacts and Ethics in Behavioral Research. New York: Freeman.

Mcllroy-09.qxd

3/4/2005

11:56 AM

Page 147

What Examiners Look For

OBJECTIVES
In this chapter you will learn how to:
- Ensure that your response to an exam question is on target
- Present your answer to impress the examiner
- Demonstrate a critical approach in addressing your topic
- Present arguments that use evidence and show independent learning
- Use problem-based learning to improve the quality of your responses
- Respond accurately to the shades of meaning in exam questions

9.1

Answering the set question!

It is wise to remember that it is possible to look without seeing: we can sometimes be primed to see what we want to see rather than what is actually there. Of course, if you are asked to write an appreciation of a piece of art or poetry, it may well be that you should project your personal interpretation on to the work. Outside this, it is more likely that you will be expected to produce relevant evidence and arguments that address the set exam question with a clear focus. Although the virtue of using past papers for revision has been extolled, the danger with this approach is that you may twist the meaning of a question into what you hope it is going to be. Another danger is that the question you wanted so much is right in front of your eyes, but you fail to see it because the wording differs from previous occasions, and in blind panic you de-select that one as an option! Therefore, the general advice at this point is: slow down, read carefully and make your question selection advisedly.


When you have made your choice, write the question out; this will be the final insurance that you have not misunderstood its intent. In a famous optical illusion, Rubin's vase can be seen either as a vase or as two faces looking towards each other. You can see it both ways, and it can change from one to the other. In contrast, exam questions are usually set so that they can be addressed in one way (although there is room for variety in structure, style and perhaps some substance).

A worked example
Question: Evaluate the important ingredients in the security, prosperity and happiness of a city.

Strategy for response: First, make a list of major city functions, such as:
- Health
- Education
- Transport
- Trade
- Security
- Infrastructure
- Housing/Property
- Finance/Banking
- Parks and Greens
- Traffic
- Crime
- Entertainment
- Leisure/Sport
- Art/Museums
- Employment

What you should not do is:
- Merely describe the function of each
- Focus only on the ones that are of interest to you
- Go off on a tangent, such as the effects of flooding on a city

What you might think of doing is:
- Describe each one in a brief sentence or two
- Highlight how the quality of life would be diminished if any of the above were missing
- Show that the various facets are dependent on other aspects of city life
- List essential and non-essential services and then rank each of these in turn


9.2

Initial evidence of focus

It is said that you never get a second chance to make a first impression, and in the first paragraph of your response to the exam question you have the opportunity to shape the initial impression in the examiner's mind. That does not mean that his or her final impression is sealed, but it does give you the opportunity to set up, and then confirm, a positive overall impression. If the first few sentences are good in quality, this will also help you to settle down and you will feel spurred on to do well.
PRACTICAL SCENARIO: A JOB INTERVIEW

To give a good initial impression when you go for a job interview, you could:
- Walk confidently into the room
- Be smartly and neatly dressed, trimmed and clean
- Smile and say hello to each member of the interview panel
- Briefly make eye contact with each panel member, but without a fixed gaze
- Shake hands if offered, and exchange courtesies diplomatically
- Sit when you are invited to

In an exam situation you do not have the non-verbal cues that you can use to create a good impression in an interview, but you have written cues you can use to demonstrate that you have purpose, focus, direction, knowledge and understanding. In the next section you will see how rough work can be used in shaping a good impression, but another tool is the preliminary use of key words and terms.

A worked example
Question: Discuss the essential elements that help in building good friendships that will last.


A good strategy is to list the important ideas that spring to mind, such as:
- Not being too demanding
- Inviting your friends to events that are important to you
- Overlooking faults
- Showing acts of kindness and generosity
- Being willing to listen when friends need someone to talk to

You may want to add a few of your own to the above list. It is also a good strategy to drop key words such as these into the opening sentences, to demonstrate that you know exactly where you are taking the examiner in your journey together. As an example of mapping out a strategy in advance, think of going for a walk in a country forest park: you may find maps at the beginning of the walk so that you can decide which routes you want to take and in what order. In the first couple of sentences of your exam question response you can, as it were, create a map for your examiner. You can tell her or him where you are going to take them. Be sure to give the impression that you know where you are leading them.

9.3

Rough work may be helpful

Some students prefer to use mind maps in drawing up plans for an essay or exam question. It is acceptable to draw out your own mind map design, and this is all you will be able to do if you opt for this method in your exams as you cannot resort to software. However, this approach may not be appealing to all and you may prefer to use a simple structure approach such as the use of headings and subheadings. When mind mapping is used with software packages you can achieve complexity by using colour codes, circles, squares, rectangles and ellipses, and you can set up pathways in which your variables are joined by direct or indirect routes. These may be very useful in your revision or even in a presentation, but in your exam you will not need all the decorative niceties. Your aim should be to draw a basic map as quickly as possible. The more complex your map is, the more difficult it will be to remember all the points and the longer it will take to draw out all the parts. Consider the following question and then see how the response can be briefly plotted out in a mind map (or even in the form of a flow chart).


Question: Outline the essential factors and applications of communication in a variety of human settings.

[Mind map: "Communication" branches into "Verbal" and "Non-verbal", and is then applied across three settings:
- Education: pupils, teachers, heads, governors, staff
- Health: doctors, nurses, patients, multidisciplinary teams, administrators
- Organisations: managers, workers, unions, reps, supervisors]

9.4

Balance, connection and fluency

On the point of balance, it is essential that you do justice to all aspects of an argument. In terms of length, for example, this suggests that paragraphs should be of approximately similar length, although there are no hard-and-fast rules. Some aspects of your subject may require a little more treatment than others, but if you alternate between very long and very short paragraphs your argument may appear lopsided. It is also vital that you do not suddenly introduce an argument that appears grossly out of place or sequence: there has to be some connection between your points, and you should not assume that your reader will always see these connections without you demonstrating them. Finally, aim to communicate the impression that your work flows from start to finish. If you achieve this, you will have integrated a variety of valid points into one coherent and convincing whole. What fluency will do for you is give your essay some life. Your response to the question should not be a mere list of hard, cold facts joined up by nothing more than punctuation and conjunctions. Use of illustrations and applications can add colour, spice and variety to your responses, unless these have been outlawed in your subject domain. However, illustrations should not become an end in themselves, and neither should they be irrelevant or forced.

A worked example
Question: What are the advantages and disadvantages of the widespread introduction of computers into higher education?

- An additional task that almost all students are now required to master
- Possible advantage to those from comfortable backgrounds
- Most universities are well equipped with modern computers
- Pressure on finding computer space at busy times in university libraries
- Students can still schedule time for off-peak periods
- Up-to-date electronic journals are readily available
- Library searches are much easier than before
- Word processing means work is easily modified
- Quality of presentations can be enhanced using computer graphics
- Advantages include spelling and grammar checks and word counts
- Anxious students avoid using computers and may therefore fall behind
- Economically disadvantaged students may not be put on an equal footing
- Computer skills are transferable (across modules and years)
- Computer skills are impressive on a CV
- Computers can become addictive: a time-wasting distraction for students

This is an applied topic and all students will have views on it, both from their own personal experience and from observation of other students. If this were your exam topic, you could go into the test armed with information from computer and educational studies, and this could be complemented by case studies and your own anecdotal experience and observations.


Exercise
See if you can condense each of the bullet points above into a brief word or two that you can use as memory joggers for your rough work in an exam. It may help to underline a key word or two in each. For example, just the use of "CV" could help you remember the penultimate point.

Checklist: Golden Rules
- Can't say everything about everything
- Must make selections
- Choose examples from each domain
- Find some major headings
- Cluster examples under appropriate headings
- Draw connections between major concepts
- Decide on an order for working through these step by step
- Decide if there is one or more central concept
- Avoid too much complexity in sketching the outline
- Balance the number of issues under each heading

9.5

Corroborate with evidence

Many academic subjects are driven by theory, research and empirical findings, and if this is the case then you must show that you know the relevant literature. The more evidence you can use, the better (provided you use it effectively). However, you cannot go into a detailed description of every relevant study you have read. Rather, you can summarise and show the relevance of a given study in a few brief sentences. Make sure you give the impression that you are using the evidence to support your arguments and to build up your case. Of course, you will want to come to some definitive conclusions in your exam essay, but on the journey there you will need to show that you have reached your conclusions in the light of (perhaps) conflicting evidence. It may be that your overall conclusion is that the balance of probability lies on one side of the argument, but you may conclude that further studies are needed to address some unresolved issues. An example of a question like this would be the MMR issue that was discussed in Chapter 7: can it be concluded with certainty that the triple vaccine for mumps, measles and rubella is now safe?
Checklist: Using evidence
- Describe findings accurately and succinctly
- Use relevant names (authors of theories or research)
- Use as many dates as you can remember
- Cover the development of the topic and incorporate up-to-date findings
- Present all sides of an argument
- Only make strong claims that are evidence-based
- Use a variety of evidence to build a case (showing convergence)
- Come to conclusions based on the balance of probabilities (if need be)
- Identify unresolved or inconclusive issues
- Map out where future research needs to go

9.6 Independent and problem-based learning


What your assessors will not be looking for is a verbatim account of what they delivered to you in a lecture or tutorial. You should show evidence that you have read from the sources they have directed you to in reading lists. Examiners also like to see that you have taken some initiative by delving into other sources that they had not highlighted. From the standpoint of a marker, it is most refreshing to assess students who have taken the time and trouble to bring some new facet of research to the subject under investigation. It is especially impressive if students can integrate some up-to-date sources, and this should not be too difficult given the plethora of electronic journals that are available in modern universities. Moreover, these sources are a great advantage when there are constraints upon your time. They are easily accessed and summaries of central findings are often available in abstract form. The main findings can be rapidly outlined, grasped and noted.


A form of learning that has been advocated in higher education circles and has gained popularity is the notion of problem-based learning. In this form of learning activity, a group of students is given a task by their tutor, and individual students go their separate ways to extract the information they need. When each has finished their task they come together and use their collated information to try to solve the problem that had been posed by the tutor. Instead of being taught directly, students endeavour to find answers for themselves, and it is believed that this can be an effective form of learning because it facilitates a deeper processing of information. Therefore, in order to prepare thoroughly for your exams, you may want to engage in some problem-solving activities either alone or with other students.

DIRECTIONS FOR PROBLEM-BASED LEARNING

- Find a question or problem that will get you engaged with your topic
- Trace relevant sources and read around the topic
- Make a full list of all the relevant ideas
- Rank in order the steps that will lead to the solution
- Draw these out in a mental map or a flow chart
- Judge if any step can be removed without making a difference
- Is there more than one route to your goal?
- Are there direct and indirect pathways that should be mapped out?
- Are there bi-directional pathways?
- Is there one answer, or multifaceted answers?

These steps will become clearer after you look at the worked example and the diagram presented below.


A worked example
Low birth weight babies: what is the cause?


- Smoking in pregnant women has been implicated as a possible cause
- Stress has also been highlighted
- Stressed women might be more likely to smoke during pregnancy
- The father's smoking habits have also been suggested
- Therefore stress in fathers may also be implicated
- Parents can either buffer or trigger stress in each other
- Poor nutrition may also have a large causal impact
- Both parents may be responsible for the mother's poor nutrition
- A genetic component might be implicated, and also inactivity
- A conceptual diagram can be drawn to suggest possible causal pathways

[Conceptual diagram: five variables (genetic, diet, stress, smoking, inactivity) act on the mother and the father, who in turn lead to the low birth weight baby.]

Like so many problems, there is a multifaceted explanation, and by presenting the problem in diagram form you will show the examiner that you are aware of all the direct, indirect and bi-directional effects. For example, the mother and father may both influence low birth weight babies through genetics. They may also influence each other through four of the five variables shown in the diagram. It is clear that to put the whole problem down to pregnant women smoking is rather naive.


Exercise
If you look at the five variables at the top of the diagram, you may be able to work out how some of them impact on each other, and you can draw linking lines with arrows to denote the pathways.

9.7

Characterised by critical thinking

It is possible to argue through issues in the world of academia without becoming vitriolic towards other colleagues or fellow researchers. Critical thinking goes on all the time, and academics constantly raise questions and problems in relation to each other's work. It is a violation of academic professionalism to run a vendetta against a colleague. Students sometimes find it a little difficult to make the transition from the secure world of comfortable thought and certainty to the real world of academia, where findings evolve and develop through critical thinking that may sometimes appear to be more like verbal sparring. You should understand clearly that it is critical thinking that will get you your best grades in your exams, but the critical thinking must be evidence-based and not driven by personal prejudices or hunches. It is often a difficult task to rise above our personal, subjective world in order to evaluate objectively the full range of evidence without giving the impression that we have a personal axe to grind. A writer can set up the alternatives to their preferred explanation only to knock them down again, in order to give the impression of being even-handed and objective. Emotional involvement with a given topic may not be a bad thing in driving the investigation, but it can lead to disguised distortions of reality. In order to illustrate critical thinking with an example, we will therefore address the issue of prejudice.

A worked example
Critical thinking about prejudice


Imagine that you have been asked the following question in an exam: "Discuss the assertion that it is impossible for a human being to be totally free from prejudice." Before examining the drafted response below, you might want to make some notes of your own in response.

It may help to start with a definition of prejudice: a prejudiced person can be defined as being like a jury that gives a verdict before it hears all the evidence.

- All grown-up human beings are likely to have, or have had, some degree of prejudice against others
- Prejudiced people attend to and select information that confirms their views
- Prejudiced people are likely to filter out information that does not support their views, or they will absorb it as an exception that proves the rule
- Prejudice is very resistant to change
- Prejudice may be intransigent because it is perceived as a mechanism for preservation and a buffer against insecurity
- Many will not admit to prejudice unless they are with trusted friends
- Some may not even realise that they are prejudiced
- Many may like to present the facade of fairness and objectivity
- The middle classes may be more cunning in their use of prejudice
- These complexities may make prejudice more difficult to address
- Getting to know the people you are prejudiced against may help to reduce prejudice
- Working models that have attempted to counteract prejudice are a useful starting point
- Courageous individuals who admit their prejudices (and the wrongness of them) may be more effective in challenging others than those who merely point the accusing finger
- The goal of eliminating prejudice entirely may be unrealistic
- It may be possible to reduce prejudice and bring it under control

A glance at the above points will clearly demonstrate that you are armed with a series of points and counterpoints that will form the heart of a good, critical essay.

9.8

Year one and beyond

If you are in the first year of a university course, it is likely that your results will not count towards your final degree classification. Therefore, all that will be required is to pass your exams at the stipulated level (typically 40 per cent in the UK for undergraduate programmes). That does not mean that you should content yourself with marginal passes, as this is not good for your confidence or anxiety! However, the knowledge that your degree classification is not at stake will give you the time and opportunities to develop the skills presented and advocated in this book. If you have already progressed beyond year one, then it is essential that you cultivate these skills now. The object of learning is not just about reproducing knowledge and demonstrating good memory skills. It is also about:
- Addressing the question directly
- Writing succinctly and with focus
- Using a critical thinking approach
- Using relevant and up-to-date evidence to support your claims
- Presenting balanced arguments

If what you have been doing to date has not been working for you in terms of the grades you are attaining, then it is time to do some diagnostic troubleshooting. Do not allow yourself to lapse into the thinking mode where you convince yourself that you cannot change. It is possible to change your thinking style and strategy into one that will produce dividends for you. The following summary checklist will also help you to focus your attention on your strengths and weaknesses.
Checklist: What examiners look for
- Ensure that you are addressing the question before you start writing
- A mind map or flow chart, or headings and subheadings, will aid your essay structure and help you plan and pace your answer
- Aim for a balanced structure that avoids padding and does justice to all facets of an argument
- Hit all the right notes in your introductory paragraph and include relevant key words
- Make a point of sprinkling your exam essay liberally with cited evidence
- It is useful to demonstrate that you have done some independent learning
- Evidence of critical thinking will demonstrate that you have learned at a deeper level than rote reproduction


- Problem-based learning will provide you with an impressive format for addressing exam issues
- Time management (highlighted in earlier chapters) will allow you to pace out all the issues you aim to tackle

9.9

The key words in the question

IF YOU ARE ASKED TO WRITE A DISCUSSION

Whenever you are engaged in discussion, you are examining possibilities and exploring various avenues of thought. There should be a tone of investigation and enquiry, but with the important proviso that there is an end product. The discussion should be going somewhere: it must have shape and direction. There is room for a discussion to be tentative, but no place for it to be vague. As an example, you could think of a television discussion show that includes a panel of experts and a person in the chair to guide the proceedings. If you were the person in the chair, you would be concerned to:
- Establish that the invited guests represent all shades of opinion on the issue
- Ensure that contributors have the opportunity to articulate their views
- Give contributors the opportunity to respond when their views are challenged
- Prevent any individual monopolising the discussion
- Control the interruptions that would make the discussion chaotic
- Summarise the conclusions in a fair and even-handed manner

In an exam you are to be the chairperson and it will be your responsibility to conduct the discussion in a well-ordered, fair and thorough manner. Although the tone of the discussion is different from, say, a critique, this does not mean that it should be tame in nature. There is room in the discussion for a vigorous exploration of evidence and counter arguments.


IF YOU ARE ASKED TO WRITE A CRITICAL REVIEW

The tone of a critique should be a little more adversarial than that of a discussion. In the discussion you are the chair who guides the panel, but in the critique you are the judge who guides the court proceedings. Imagine there is a defence team and a prosecution team, and your aim is to find the evidence that will stand up in a court of law. You should not be afraid to chop down claims that do not stand up in the light of the evidence. However, that does not mean (to change the metaphor) that you should be a knife-happy surgeon intent on operating on every condition. Do not criticise just for the sake of it, or give the impression that you have been baptised in lemon juice! What you really need to ascertain through your critique is this: what is left of the issue or claim when you hold it up and test it against the evidence? If the basic premise has been supported with evidence again and again, then you can argue that the evidence is robust. For example, you can pose questions such as those presented below:
- Has the claim, hypothesis or theory stood the test of time?
- Are large claims made on the basis of very flimsy evidence?
- Do various studies leave an impression of uncertainty and a need for further investigation?
- Does it appear that aspects of previous investigation have been driven by prejudice or vested interests?
- Is the evidence supporting the results weak, moderate or strong?
- Should you end by highlighting certainties and uncertainties?
- Can you identify issues that are no longer relevant to the debate (red herrings)?
- Can you earmark issues for further investigation?
- Are there issues that are going in the expected direction and are clearly promising?


Exercise
If you wish, use these questions to challenge the issues raised in the MMR example in Chapter 7.

Exercise
Write your own checklist on the essential steps in a critique (think of yourself as a judge in court presiding over the prosecution and defence lawyers). The exercise will be easier if you choose a theme such as "Should parents be allowed to smack their children?"

IF YOU ARE ASKED TO COMPARE AND CONTRAST

If you are asked in an exam or course work essay to compare and contrast two concepts, you will need to identify a range of issues that you can discuss within this context. You may begin by making a list of all the things that the two concepts have in common, and then list all the factors in which they differ. It is best to identify an equal number of issues (if possible) under each heading so that the conclusions are balanced.

A worked example
Compare and contrast popular and classical music


Similarities
- Both use the same music clefs
- Wide range of instruments used
- Performed with or without lyrics
- Live and recorded performances
- Listened to for pleasure, relaxation, inspiration and mood control
- Variety of styles
- Both used in films and advertisements

Differences
- Pop often associated with teenagers
- Pop often linked to lively parties and discos
- Pop may be louder and more shocking
- Classical often preferred by the middle classes
- Classical often has a more complex structure
- Classical pieces are of longer duration
- Classical pieces often have more musicians
- Different conventions for dress

In drawing this to a conclusion, you could say that:
- Some people listen to both
- Some listen exclusively to one or the other
- Performers have migrated from one to the other
- Some writers/composers have integrated both
- Both serve the needs of individuals and crowds
- Both have useful applications in advertisements, films, therapy, etc.

Exercise
Write your own summary checklist on the major factors involved in producing a well-rounded essay that compares and contrasts two issues.

Another variation on the comparing and contrasting approach is when you are asked to outline the advantages and disadvantages of an issue (see the worked example on the advantages and disadvantages of the widespread introduction of computers into higher education in section 9.4 above). You may want to look back at this example to see if you can classify the advantages and disadvantages, and add any points that are needed to balance the arguments. Alternatively, examine the advantages and disadvantages of small, street-corner stores and large supermarkets, or think up a new example of your own.

IF YOU ARE ASKED TO EVALUATE

Illustration
Antique objects and antiquated concepts


If an expert were asked to value a painting or a piece of sculpture, she would be keen to ascertain who the artist was and when the artwork was created. There is no doubt that some objects increase in value with the passage of time. If objects have stood the test of time, they may be very valuable, especially if they were painted or constructed by a master craftsperson.

And yet, in academia, the opposite can be true. Although a concept or theory may have been popular and widely accepted 30 years ago, more recent research findings may have chipped away at the foundations over the decades. Other aspects may now have been added to the original proposition, so that what is left now is a modified version of the original. Therefore, if you are asked to evaluate, you may want to consider the following:
- State the basic premise
- Show where findings have attacked aspects of this premise
- Highlight aspects that have been strong enough to endure
- Identify any new aspects that have been added to the original


- Present the "new animal" with its additions and subtractions
- Demonstrate the usefulness of the concept
- Map out the reasons why the premise is set to persist in the future

To wrap up this section, just read over the following points. The italicised words are the key words in exam questions. This will give you an idea of how exam questions can be spun from various angles. Remember, each key word requires a different kind of approach.
- Evaluate whether modern prisons achieve the aims of reducing crime and reforming criminals.
- Discuss other factors that might be run in parallel with the prison system that would be a positive complement to its work.
- Compare and contrast the work of prisons with rehabilitation day centres.
- Write a critique on whether there is value in exploring the criminal mind.

9.10 Attention to the qualifying words in a question


EXAMPLE 1 IF YOU ARE ASKED TO ADDRESS MORE THAN ONE ISSUE

A close inspection of an exam question may reveal that you are required to address more than one central issue. Unless you have been guided otherwise, you should, in general, try to give equal weighting to all the issues. Consider how you would address the following question: Why are some students prone to catch colds, and can anything be done to address this problem? The second part of the question should be as important as the first and clearly requires more than a yes or no answer. In the example provided here, you would be advised to link each potential cause with a corresponding prevention or cure. It is probable that examiners will award 50 per cent for each part of the question.


EXAMPLE 2 WHEN A FEW WORDS FILTER THE DIRECTION OF THE QUESTION

Some questions may direct you into a line of response in the last few words (or in the opening words). Therefore, read the question carefully so that you do not go off track. Consider the following question: Discuss the impact of airport noise on those that live near airports. The last part of the question excludes:
Those who work at airports
Those who work and travel on planes
Those who travel to and from airports
Those who live in low-fly zones away from airports

However, it includes the effects of noise:


During the daytime
During the nighttime
On health, quality of life, etc.
In airports generally (no one airport is specified)
On the value of houses in the vicinity

The question does not specify how close to the airports people should be living in order to be taken into account in your essay; it could be one, two or five miles. So you are probably expected to address the issue in general without specifying a distance.

EXAMPLE 3 WHEN THE QUESTION DOES NOT HAVE A KEY WORD

Another issue to bear in mind is that you may not be asked directly to write a critique, discussion or evaluation, but your tutors will probably have directed you to use a

critical approach in general when approaching all exam questions. Take the following two examples: 'Should sex education be given to children in primary education?' and 'Should Shakespeare's plays be left in their original Elizabethan language?' Although no key words such as discuss or evaluate are used here, it is evident that the questions have been designed to elicit an essay that includes points and counterpoints.

EXAMPLE 4 WHEN THE QUESTION LEAVES THE SCOPE OPEN-ENDED

Sometimes examiners will leave you to select issues to illustrate the broader principles in the question. This type of essay needs careful thought in order to find the correct balance between the inclusion of too much or too little material. Given that we covered motivation in an earlier chapter, we will use this as our example: Discuss with the use of examples the assertion that motivation is the dynamic behind human change. You could select examples on:
Attraction and reproduction
Power and promotion
Earnings and savings
Aggression and control
Status and education

The problem in the question is that you are not asked to discuss a specific number of issues. Rather, you must decide how many examples to include. To help you think through this issue, have a go at the following exercise.


Exercise

Suggest some problems that may be associated with the use of (a) too few examples and (b) too many examples in your exam responses. Some suggestions are provided after this exercise if you need to consult them.
(a) Potential problems with too few examples in exam responses: . . . (b) Potential problems with too many examples in exam responses: . . .

(a) Potential problems with too few examples in exam responses


Temptation to be too descriptive
Insufficient material to make adequate critical comparisons
Extending material to fill space
It may look as though you have not read widely enough

(b) Potential problems with too many examples in exam responses


It can read like a list and therefore distort essay form
It may look like a memory exercise
The common thread between examples may not be clear
It could appear as a shallow exercise


Exercise Write your own checklist of the factors that you would choose to guide you in deciding how many examples you would use in responding to an exam question (i.e. if the number of examples to use is not specified).
. . . . .

SUMMARY

Chapter 9 summary points:


Ensure you answer the question that has been set
Avoid rough work that takes up too much time
Present an introduction that shows focus and direction
Provide balanced arguments that demonstrate objectivity
Reinforce your arguments with references to evidence
Ensure your responses are more critical than descriptive
Show some evidence of independent learning
Build your answer around the key words in the question

2 WHAT TUTORS LOOK FOR WHEN MARKING ESSAYS


Marking schemes: criteria related to grade bands
Writing skills: 'introductory', 'intermediate' and 'advanced' essays

One of the most frequent and reasonable questions that students ask is: 'What should I be doing to get a better grade?' Of course, the answer to this question will depend on a number of factors. For example, what is required of an essay answer will clearly vary according to the precise question set. Equally, the standard expected of essay writing is likely to be higher on more advanced undergraduate courses than on those at entry level. Similarly, there may be higher expectations towards the end of a course than there were at its start. Having said this, it is possible to specify the various qualities (if only in general terms) that distinguish essays in the different grade bands, and what writing skills may be expected from essays at different levels.
Health warning


We have included this section to give you a broad indication of what may be expected, in general, for different grades. Increasingly, grade bands are defined in relation to 'learning outcomes' that draw on specified 'subject benchmarks' and 'key skills' (see, for example: http://www.qaa.ac.uk and http://www.qca.org.uk). Individual courses will have their own course-specific requirements for each of the grade ranges, and the requirements will also vary depending on whether the course is at a more or less advanced stage of undergraduate study. As a result, where they are available, you may want to look at the learning outcomes specified for your particular course of study. However, you should remember that grading an essay is always a matter of weighing up not only the structure, content and style of the essay, but the interplay between these, together with the interplay between any number of the different intellectual challenges built into the assignment. For all these reasons you should not expect the criteria for the gradings for a specific course to map exactly on to what we have set out here.

2.1 Marking schemes: criteria related to grade bands

Good Essay Writing

In this section you will find guidelines adapted from those produced by the British Psychological Society (BPS, 1994) in conjunction with the Association of Heads of Psychology Departments. They do not correspond to the specific policy of the Open University or any other UK higher education institution, but they should give you an idea of the sorts of general things markers are likely to consider for different grade ranges. Remember, what is expected for a particular course for a particular grade may differ from these guidelines. Remember, too, you won't have to do well in every area to get a particular grade. For example, your depth of insight into theoretical issues may compensate for slightly weaker coverage of the evidence, or your understanding of the material may compensate for weaknesses in the coherence of your argument. It may also be the case that some of these criteria will be more relevant to advanced courses of undergraduate study (see Section 2.2 below). Many of the terms used below (for example, 'developing an argument') are explored in greater detail in this guide. If you are unsure of their meaning, you may want to look them up.
Advice for OU students
Always remember to read the student notes for the specific assignment you are attempting and/or ask your tutor for guidance if you are unclear about what is expected of you.
Remember, particularly if you are new to the OU, that the University's marking scheme goes up to 100 and may be different from ones you have been used to in the past.

The criteria following the table indicate 'excellent', 'good pass', 'clear pass', 'bare pass', 'bare fail' and 'clear fail' essays. These categories broadly correspond to the following grade bands.

Grade bands:

Conventional university   OU        Other
1st                       85-100    70+    A
2:1                       70-84     60-69  B
2:2                       55-69     50-59  C
3rd                       40-54     40-49  D
Fail                      30-39     30-39  Fail
Fail                      0-29      0-29   Fail

An excellent pass is likely to:

provide a comprehensive and accurate response to the question, demonstrating a breadth and depth of reading and understanding of relevant arguments and issues;
show a sophisticated ability to synthesize a wide range of material;
show a sophisticated ability to outline, analyse and contrast complex competing positions and to evaluate their strengths and weaknesses effectively;
demonstrate clarity of argument and expression;
develop a sophisticated argument, demonstrating logical reasoning and the effective use of well selected examples and evidence;
where appropriate, demonstrate an ability to apply ideas to new material or in a new context;
demonstrate depth of insight into theoretical issues;
demonstrate an ability to write from 'within' a perspective or theory, including the ability to utilize appropriate social scientific concepts and vocabulary;
may show a more creative or original approach (within the constraints of academic rigour);
use a standard referencing system accurately.

A good pass is likely to:

provide a generally accurate and well-informed answer to the question;
be reasonably comprehensive;
draw on a range of sources;
be well organized and structured;
demonstrate an ability to develop a strong and logical line of argument, supported by appropriate examples and evidence;
show an ability to synthesize a wide range of material;
show an ability to outline, analyse and contrast more complex competing positions, and to evaluate their strengths and weaknesses effectively;
demonstrate the ability to work with theoretical material effectively and some confidence in handling social scientific concepts and vocabulary;
where appropriate, demonstrate an ability to apply ideas to new material or in a new context;
show a good understanding of the material;


be clearly presented;
use a standard referencing system accurately.


A clear pass is likely to:

give an adequate answer to the question, though one dependent on commentaries or a limited range of source material;
be generally accurate, although with some omissions and minor errors;
develop and communicate a basic logical argument with some use of appropriate supporting examples and evidence;
demonstrate an ability to synthesize a range of material;
demonstrate an ability to outline, analyse and contrast competing positions, and to begin to evaluate their strengths and weaknesses (although this may be derivative);
demonstrate a basic ability to address theoretical material and to use appropriate social scientific concepts and vocabulary;
be written in the author's own words;
show an understanding of standard referencing conventions, although containing some errors and omissions.

A bare pass is likely to:

demonstrate basic skills in the areas identified in the 'clear pass' band, but may also:

answer the question tangentially;
miss a key point;
contain a number of inaccuracies or omissions;


show only sparse coverage of relevant material;
fail to support arguments with adequate evidence;
be over-dependent on source material;
contain only limited references.

A bare fail is likely to:

fail to answer the question;
contain very little appropriate material;
show some evidence of relevant reading but provide only cursory coverage with numerous errors, omissions or irrelevances, such that the writer's understanding of fundamental points is in question;
be highly disorganized;
contain much inappropriate material;
lack any real argument or fail to support an argument with evidence;
demonstrate a lack of understanding of social scientific concepts and vocabulary and an inability to deploy social scientific writing skills such as skills of critical evaluation, synthesis, and so on;
be unacceptably dependent on sources;
be plagiarized (sometimes);
demonstrate problems in the use of appropriate writing conventions such that the essay's meaning is systematically obscured.

A clear fail is likely to:

show a profound misunderstanding of basic material;
show a complete failure to understand or answer the question;
provide totally inadequate information;
be incoherent;
be plagiarized (sometimes).


Essays are assessed not weighed

2.2 Writing skills: 'introductory', 'intermediate' and 'advanced' essays

As you move from entry level to more advanced undergraduate courses it is likely that you will be expected to develop and demonstrate an increasing range of essay writing skills. For example, you may be expected to write from 'within' a particular perspective, handle more complex theories or systematically interrogate original sources. A general guide of this kind cannot give you a full breakdown of the skills that will be relevant to every course that you may take. What it tries to do is provide an outline of 'core' skills. Individual courses may emphasize different parts of these core skills or may involve specific skills of their own (for example, project writing, employing


specific research methods, using graphs to present information). Individual essays may also require you to emphasize some 'core' skills more than others. As a result of these factors, you will need to adapt what we have set out below according to the demands of different questions and different courses. We look now in detail at the various criteria that may be expected to distinguish a 'basic' or 'introductory' undergraduate essay from 'intermediate' and 'advanced' essays. Once again, many of these points are developed in later sections, so if you are not sure what the points mean (e.g. 'signposting', writing from 'within' a perspective), you may want to look them up.
Advice for OU students
The following criteria broadly map onto the OU's undergraduate courses in the social sciences at Levels 1, 2 and 3. Thus, having completed your Level 1 course you could be expected to have developed the various essay writing skills identified as appropriate to an 'introductory' essay. Remember, you would not necessarily be expected to have these skills already in place on starting an OU Level 1 course in the social sciences. Having developed these skills in the course of your Level 1 studies, you should be ready to tackle essay writing on a Level 2 course, where you would learn the skills identified as appropriate to an 'intermediate' essay.

The 'introductory' essay

Introductions are likely to demonstrate:

a clear understanding of the scope of the question and what is required;
the ability to 'signpost' the shape of the essay's argument clearly and concisely;
a basic ability to define key terms.


Main sections are likely to demonstrate some or all of the following, depending on what the question requires:

an ability to construct a basic argument that engages with the question;
the ability to précis aspects of relevant material clearly and concisely, often relying on commentaries and other secondary sources;
the ability to outline the basics of relevant theories;
the ability to support arguments with appropriate evidence and examples drawn from different sources;
an understanding that different theories are in competition, the ability to outline the main similarities and differences between these, and a basic ability to evaluate their strengths and weaknesses;
an ability to utilize basic maps, diagrams and numerical data in a way that supports the discussion;
some familiarity with major perspectives in the social sciences;
some familiarity with relevant social scientific vocabulary.

Conclusions are likely to demonstrate:

the ability to summarize the content of the essay clearly and concisely and to come to a conclusion.

Quotations should be referenced, and 'pass' essays will always need to avoid plagiarism. Essays should 'flow' smoothly, use sentences, paragraphs and grammar correctly, and be written in clear English.
The 'intermediate' essay

In addition to skills in all the above areas, intermediate essays may also show the following. Introductions are likely to demonstrate:


a clear understanding of more complex essay questions;
a basic ability to 'signpost' the content as well as the shape or structure of the essay, but not in a laboured way;
a grasp of the major debates that lie 'behind' the question;
an ability to define key terms.

Main sections are likely to demonstrate some or all of the following, depending on what the question requires:

the ability to construct more complex arguments relevant to the question;
the ability to 'weight' different aspects of the material according to their significance within the overall argument;
an ability to identify and précis the key debates relevant to the question;
the ability to outline more complex theories in a basic form;
an ability to relate abstract ideas and theories to concrete detail;
an ability to support arguments with appropriate evidence and examples;
an ability to utilize information drawn from across a wide range of source materials;
the ability to make more complex evaluations of the strengths and weaknesses of competing positions and make a reasoned choice between these;
an ability to utilize more complex maps, diagrams and numerical data;
a preliminary ability to work from original texts and data without relying on commentaries on these;
increased familiarity with major social scientific perspectives and social scientific vocabulary and increased confidence in applying these to specific issues;
a preliminary ability to write from 'within' specific perspectives or theories;


an ability to pull together different aspects of the course and to apply these to the essay;
a basic ability in selecting and using appropriate quotations from, and making references to, key texts in the field.

Conclusions are likely to demonstrate:

an ability to highlight the essay's core argument;
the ability to provide a basic summary of the key debates raised by the question and the ability to provide an overview of 'current knowledge';
a preliminary ability to point to absences in the argument or areas worthy of future development.

Essays should also be properly referenced, be written in the author's own words, and utilize a more developed and fluent writing style (for example, by handling transitions effectively).
The 'advanced' essay

In addition to skills in all of the above areas, advanced essays may also show the following. Introductions are likely to demonstrate:

the ability to present a more sophisticated version of the essay's core argument;
the ability to summarize in more sophisticated forms the key debates raised by the question;
the ability to provide more sophisticated definitions of terms;
an ability to really interrogate the question by focusing on ideas or sub-questions prompted by the question in hand.

Main sections are likely to demonstrate some or all of the following:


the ability to construct complex arguments, 'weighting' each section according to its significance within the overall argument;
the ability to provide sophisticated outlines of complex theories;
the ability to support arguments with appropriate evidence and examples drawn from a wide range of sources, and to use evidence selectively in a way that supports central points;
the ability to evaluate competing positions and the confidence to write from 'within' a specific perspective or theory on the basis of a reasoned understanding of its strengths and weaknesses;
familiarity with, and confidence in, handling complex maps, diagrams and numerical data;
familiarity with, and confidence in, handling original texts and data without relying on commentaries;
familiarity with the major social scientific perspectives and social scientific vocabulary, and confidence in applying these to specific issues and to new contexts;
the ability to pull together different aspects of the course and apply these to the issues raised by a specific essay question;
the ability to use appropriate quotations and cite key texts in the field.

Conclusions are likely to demonstrate:

the ability to present a sophisticated summary of the essay's core argument;
the ability to provide an effective synthesis of the key debates raised by the question, or a sophisticated overview of the state of 'current knowledge';
a developed ability to point to absences in the argument or areas worthy of future development.

'Advanced' essays should be fully referenced and written in your own words. The best essays are likely to show a


significant depth of understanding of the issues raised by the question and may show a more creative or original approach (within the constraints of academic rigour).
Different skills, same writer

In thinking about the requirements of different levels of essay writing, it is important to realize that different levels of skills do not come neatly packaged. For instance, you may already have advanced essay writing skills even while working at an introductory level of undergraduate study. Alternatively, you may have advanced skills of analysis (such as the ability to break down a complex argument into its component parts and summarize these effectively), but be struggling with the handling of theoretical concepts and perspectives. Or you may be very effective at your essay introductions, but more shaky when it comes to putting the argument together in the main section. The important point is that what we have set out are indications of what may be expected at different levels across the whole range of abilities, not that you must be able to demonstrate the appropriate level of ability in all cases. Remember, too, that an essay is always greater than its component parts, and it is how you put all those parts together that is often as important as the parts themselves.
Summary

Essays are graded on the extent to which they demonstrate an understanding of relevant course content and of social scientific and writing skills. The exact mix of content and skills required will depend on the course and question. However, it is possible to specify in general terms what is expected for each grade band. As you become increasingly experienced, you should expect your understanding of social scientific arguments and your writing skills to increase in sophistication.


222 / THE INTERNATIONAL STUDENT'S GUIDE

Writing

AIMS

By studying and doing the activities in this chapter you should:


understand some of the different kinds of writing students do at university;
consider cultural differences in writing styles;
learn how arguments can be constructed in writing;
understand plagiarism, what it is and how to avoid it;
practise how to use information from your reading in your writing, through referencing and bibliography; and
develop strategies for editing and proofreading your writing.

GLOSSARY

These key words will be useful to you while reading this chapter:

Abbreviation: A shortened word or phrase using only the first letters of each word.
Categorical: Without any doubt; certain.
Contraction: A shortened form of a word or combination of words.
Criteria: Standards by which you judge something.
Dissertation: A long piece of writing sometimes done in the last year of a degree.
Format: The structure and design of a written document.
Indenting: To make a space at the edge of something.
Plagiarism: Using another person's idea or a part of a person's work as if it is your own.
Priority: Something that is very important.
Reputable: Respected and able to be trusted.

WRITING / 223

Substantial: Large in size, value or importance.
Suspended: Temporarily not allowed to take part in an activity because you have done something wrong.

Different kinds of writing


At university you will have to write often and in various formats, depending on the subjects you are studying. The most common form of writing is the essay, but you may also be asked to write reports, case studies, summaries and book reviews and, on some undergraduate programmes, there will be a dissertation in the final year. All pieces of writing will need to be structured in a particular way. Generally you will be given instructions by your tutors on how to structure your writing and what form to use. It is important to be clear about this so, if you are not sure what you should be doing, always ask the tutor. You will generally be expected to word-process your writing and you will be told how many words you should write, for example 2,000 to 3,000 words in your first year, sometimes 4,000 to 5,000 in the final year of a degree, and about 10,000 words for an undergraduate dissertation. If you do not write enough words you may lose marks and, if you write too many, the tutor may not mark the extra words. All pieces of writing have an introduction and a conclusion with sections of information in between, and all except summaries have references or a bibliography at the end.

Task 9.1

What kind of writing are you familiar with already? Which do you enjoy?

Kinds of writing I can do | Kinds of writing I enjoy


ESSAYS
The purpose of an essay is to show the tutor that you have done some reading on the topic, have understood and thought about it and are able to explain what you have understood to the reader. Reading is a very important part of essay writing, but it is also essential to explain what you have read in your own words as much as possible to show that you have understood it and applied it to the title. An essay is a way of helping you develop your thoughts and knowledge on a topic. A good essay depends on the following points.

What you say

It is important that your essay:
answers the question (or matches the title);
gives information from different sources;
gives more than one point of view; and
includes some of your own thoughts.

We explain this in more detail below.

How you say it

It is important that your essay:

is well organized;
is easy to read;
includes good grammar and spelling; and
is in a suitable style.

Again, we explain this in more detail below.

What you say

The most important aim here is to show that you have understood the topic and have done what you were asked to do, including answering the question if there was one. The next most important thing is that you have demonstrated your understanding of the topic or question and that you show you have looked at it from different points of view. The third most important thing is that you show you have read about the topic and have used your reading in your answer.


How you say it

Here the most important aim is for the reader to be able to understand the information and your thoughts on it. This means writing in clear sentences and well organized paragraphs. It also means checking your English very carefully to make sure you have used the right words to say what you mean and that your grammar and spelling are correct, so as not to cause misunderstandings.

The structure of an essay

An essay is structured into an introduction, a series of paragraphs covering the main points of the essay and a conclusion. The introduction (usually 7 or 8% of the whole essay) should:
comment on the title of the essay;
explain the meaning of any key terms in the title; and
explain how you are going to approach the topic.

Each paragraph in an essay follows its own plan, including:


a topic sentence which introduces the main idea;
an explanation of the topic sentence;
evidence to support what is said in the topic sentence;
a comment on the evidence; and
a conclusion which explains the implications of the evidence and links the paragraph to the next one.

Signpost
See also the section 'Argument' later in this chapter.

The conclusion (12-15% of the whole essay) should:
give a short summary of the main ideas in the essay;
refer back to the title and answer any question that was asked; and
make some general concluding remarks (you might give your own views here or discuss how the topic relates to wider issues, but you should not introduce any new information).


In an essay the paragraphs are not numbered. Headings may be used but these are not underlined.

Example: essay titles


Here are some examples of essay titles:
Compare and contrast the capital asset pricing model and the weighted average cost of capital as alternative ways of estimating the discount rate to be applied in investment appraisal (financial management).
Give a brief account of the importance of the concept of equality and the difference in the thought of radical and liberal feminists (women's studies).
Outline the similarities and differences between the organization of leisure and tourism policies of different states within Europe (leisure and tourism management).

REPORTS
Whereas an essay is something you only write at university or college, the report is a form of document which is used in many situations, particularly at work and in government. It is a practical document designed to achieve a task rather than a discussion to explore ideas, and it can be very short, just a few lines, or many volumes long. At university most reports will be between 1,500 and 5,000 words. A report is structured in short, numbered sections. This is so that, if a report is being discussed in a meeting, people can be directed easily to a particular section (for example, point 3.2, page 4). A long report will start with a separate title page and contents page. A report may also include an appendix or appendices. This is information which is not written by you but which you think will be useful to the reader (for example, a table of statistics to back up your argument, a map or diagram, or a section of a text written by another author). Appendices should always be referred to in the main body of the report. Below is an example pro forma for a report, that is, a model you can follow, but remember that different reports will have different layouts.


Example: report pro forma


TITLE
AUTHOR
DATE

1. INTRODUCTION

1.1 TERMS OF REFERENCE
This report is the result of an investigation into ...

1.2 PROCEDURE/METHODOLOGY
In order to investigate the ... the following procedures were adopted:
1.2.1 ...
1.2.2 ...
1.2.3 ...

2. FINDINGS
2.1 ...
2.2 ...
2.3 ...

3. CONCLUSIONS
The principal conclusions drawn are as follows:
3.1 ...
3.2 ...
3.3 ...

4. RECOMMENDATIONS
The following recommendations are proposed:
4.1 ...
4.2 ...
4.3 ...

5. REFERENCES

6. APPENDICES


As you can see, a report includes an introduction and a conclusion, like an essay, as well as references. However, it also includes 'Terms of reference', which state the reason why the report is being written, and it may include 'Recommendations' at the end, which are points for further action. The important skill with a report is to divide the information into short numbered sections in a logical way so that it is easy to understand. The headings of each section are numbered and in capital letters (as in the example) or underlined and, within each section, there are subsections, also numbered, as can be seen in the pro forma.

Example: report title


Here is an example of a report title:

    Read the case study 'An unmotivated building inspector' and, assuming you are the organization's human resource manager, write a report summarizing how motivation theories and practices could help to analyse, manage and improve the situation. (business studies)

Task 9.2


What is the difference between a report and an essay? What do they look like? How are they structured? What kind of information do they contain? Fill in your answers below:

Essay                              Report


Key

For suggested answers, see Key to tasks at the end of this chapter.

LABORATORY (LAB) REPORTS


Science students and researchers have to write regular reports on their lab work. There is a format for these and they are written using the past tense and passive voice (e.g. 'The mixture was heated'). A lab report usually includes the following:

1 A title.
2 An abstract, summarizing the aim, method and result.
3 An aim: the reason for carrying out the experiment.
4 An introduction in which the theoretical background is explained.
5 The method, where you describe the equipment and materials used. This may include diagrams.
6 The procedure, where you describe the steps you followed in carrying out the experiment, usually written in the past passive.
7 The results: these are usually presented in the form of tables and graphs, clearly labelled.
8 A discussion: here is the place to comment on the results and identify any questions not resolved by the experiment.
9 A conclusion, which describes any conclusions you can draw from the results, being cautious by using phrases such as 'This evidence suggests that ...' or 'One interpretation could be that ...'
10 References: these will take the same form as references in any other written work.

SUMMARIES
A summary is an exercise where the student is required to read a text, such as an article or a chapter, and write down the information from the text in a much shorter form, picking out the main points. The following rules apply when writing a summary:

- It is written in the student's words, not words copied from the text, although you may include a few short quotations.
- It includes only the most important information from the text.
- No information which is not in the text can be added.


- It does not include the student's opinion on anything.
- It should be within the word limit asked for.

This is quite a hard piece of writing to do as it is necessary to understand the text very well in order to pick out the important parts and write them in your own words.

Signpost
For an example of a summary, see Chapter 8, Task 8.4.

BOOK REVIEWS
This activity may be used on humanities courses, particularly literature, but students may also be asked to review books or articles on other courses. In a review you are usually asked to comment on a range of aspects of the text you have been asked to read, including the following:

- The content: what does it say? Is it interesting? Is it new information?
- The style: is it easy to understand? Is the structure logical?
- The reader: for whom is it intended?

You may be asked to give your opinion in a review.

Task 9.3
Write 'yes', 'no' or 'perhaps' in the boxes below, as appropriate:

                                     Essay   Report   Summary   Book review
You should give your opinion
You should write an introduction
You should include some quotation
You should write a conclusion
You should include references


Key

For suggested answers, see Key to tasks at the end of this chapter.

How to approach a piece of writing


With all pieces of writing it is a good idea to have a plan of action. One way to approach a task is as follows:

1 Analyse the title: you must be sure what you have to write about before you begin.
2 Brainstorm: this will help you to get ideas about the topic.
3 Make a plan: this will help you to find out what information you need.
4 Collect information.
5 Write a draft: you will always need to rewrite parts of your work at university.
6 Revise what you have written as many times as necessary.

The first priority is to be sure you understand the topic or question in the title, the form in which it has to be presented (e.g. essay, report, case study, etc.) and the criteria for marking. Once you are sure of this, try brainstorming ideas: take a large sheet of paper, write your title in the middle, then write down all the ideas that come into your head, all over the sheet. This will help you to get some ideas and to work out how to respond to the topic.

Signpost
For an example of brainstorming, see Chapter 8, page 216, 'Making notes'.

Task 9.4

Choose a title you have been given on one of your courses or use one from the examples above, and brainstorm ideas for it. Write the title in the middle of a blank piece of paper and then write all your thoughts on it all over the paper.



The next stage is to use the notes you have made of your ideas to work out a plan: that is, the order in which you are going to write your ideas, what you are going to include and what you will leave out.

Example: plan
The following is an example of a plan:

Title: The differences between still and moving images
Question: Do you need skill to understand an image?

Introduction

Main body:
    Introduce images (visual information)
    Images as language (the medium for communication)
    Why is the information in images important?
        1 Recording information without a human point of view
        2 Makes it easy to understand different experiences
    Still images and moving images as typical information
    What kind of information can you get from still and moving images?
    Understanding the information
        Comparing newspaper pictures and TV news
        The difference between watching and seeing
    How effective are they:
        In conveying knowledge?
        Quality and quantity?
        Different types of expectation?
    How to interpret information

Conclusion


Task 9.5


Make a plan for your piece of writing, using your ideas from Task 9.4.

After the plan, the next stage is to do the reading for your piece of writing. The plan will help you to know what you need to read and how much information you need. Only read what you need and make careful notes on your reading, making sure to write down the author, date, title, place of publication and publisher for your references, and page numbers if you are planning to make a quotation.

Signpost
See Chapter 8 for more information on making notes.

When you have finished reading, write the first draft of your work. Then put it aside for at least 24 hours, longer if you can, to rest your brain so that when you reread it you can judge whether it meets the criteria and is relevant to the topic. Finally edit and proofread (see below) your work carefully before submitting it. It is often necessary to reread and edit a piece of work several times.

Writing styles
If you have not studied at university before, you will be learning the writing styles that are used at higher education level. Of course, styles vary from subject to subject, and if you have already studied at university in another country you may find that the English style is different from what you are used to in some general ways, too. There are certain ways of writing that all students writing at university are expected to use, and some basic features of these are as follows.

ACADEMIC STYLE
There are five points to note about style in academic writing:

1 Do not use contractions (use 'it is' instead of 'it's' and 'was not' instead of 'wasn't') or abbreviations (use 'for example' instead of 'e.g.' and 'that is' instead of 'i.e.').


2 Do not use colloquial expressions such as 'As I was saying', 'As a matter of fact', 'By the way' or 'Anyway'.
3 Use formal vocabulary.
4 Try to avoid using personal pronouns such as 'I', 'we', 'you'.
5 Be careful not to make categorical statements.

These points are explained in more detail below.

Contractions

This is a simple rule but one which many students have difficulty remembering. Always use the full form of a word, not a shortened form, in your academic writing.

Colloquial language

It will probably be difficult for you to know which expressions are colloquial and which are formal when you first start using formal English. This is something you will learn with practice, but it would be a good idea to ask a tutor, a language teacher or a friend who has experience of academic English if you are in doubt about an expression.

Formal language

The use of more formal vocabulary is something you will gradually get used to while you are at university. You can use a dictionary to help you learn more formal words and expressions, but you should also practise using terminology from your lectures and from your reading, first making sure you understand it.

Signpost
See Chapter 4, 'Building vocabulary', for ways to extend your vocabulary.

Personal pronouns

The general rule about using 'I', 'we' or 'you' does not always apply but you should always check with the tutor before using them. If you have been asked specifically to give your opinion, in a book review for example, it may be appropriate to use 'I'. Otherwise it can be avoided by using forms such as:

- It seems that ...
- There is evidence that ...
- It can be said that ...


Categorical statements

It is not considered appropriate to say without doubt that something is right or wrong, true or false, in academic writing. The convention is to phrase sentences using verbs such as 'may' or 'might' or adverbs such as 'perhaps' or 'possibly'. For example:

Do not say: 'The Blair government is the best one since the Second World War.'
Say: 'The Blair government may be considered to be the most successful Labour government since Attlee's post-war administration.'

Do not say: 'AIDS came from monkeys.'
Say: 'There is a view that AIDS could possibly have been transmitted to humans from apes.'

Task 9.6

Look at the following sentences and indicate by circling either the F or the I whether they are written in formal or informal style:

1 We didn't finish the experiment because we ran out of time   F  I
2 Such a proposal would need careful consideration before any funding could be awarded   F  I
3 Infection by pathogenic parasites may be a symptom of ill-health   F  I
4 If you think globalization is always a good thing then you're wrong   F  I
5 In the mid-nineteenth century the average worker clocked up 75 hours of work a week   F  I

Key

For suggested answers, see Key to tasks at the end of this chapter.

CULTURAL DIFFERENCES IN WRITING


Different countries have different conventions. For example, Yamuna Kachru (1996) explains that in India it is appropriate to give much broader introductions to a topic and also to discuss more than one topic in a paragraph. The use of language can also be more ornate or flowery.


Cultural differences: example 1


Below is an example of an introduction to an essay on the dowry system in India:

    Growing up is a discarding of dreams and a realization of the various facts of life. A general awareness creeps in. It is a process of drinking deep the spring of knowledge and perceiving the different facets of life. Life is a panorama of events, moments of joys and sorrows. The world around us is manifested by both good and evil. Dowry system is one of the prevalent evils of today. Like a diabolic adder it stings the life of many innocent people and is the burning topic of discussion. (Cited in Kachru, 1996)

In a British university the first paragraph would not be considered relevant to the topic, and the last sentence comparing the dowry system to an adder would be considered too poetic for an academic essay. Another difference may be when a student adopts a very personal tone, which is almost never considered appropriate in academic writing in the UK.

Cultural differences: example 2


In the following example a Colombian student is writing a book review of Jane Eyre, a famous English novel:
This chapter almost overwhelmed me, but I liked it very much because, although it is fiction, during the time I spent reading it I was transported into another world. The image of Jane Eyre was vivid in my mind. I even nurtured a maternal love for her. Likewise, I created in my imagination the appearance of the other characters in the chapter, their facial expressions, gesticulations and so on.

Although it is sometimes acceptable to use 'I' in a book review, expressions such as 'overwhelmed', 'transported into another world' and 'nurtured a maternal love for her' would be considered much too dramatic and involved with personal feelings for this context. It is generally not good to look for very unusual words in your writing, but to use words that are often used in writing and discussion about the topic.

Hinds (1987) says that, whereas in English writing it is the responsibility of the writer to be clear, in Japan the reader has more responsibility for understanding and therefore writers may use more roundabout or circumlocutory ways of expressing their ideas.

Cultural differences: example 3


Here is an example of a Japanese students writing:
It is sometimes said that art works, especially modern art, have several interpretations. If the only correct way of interpreting is the one done by the artist, this is going to be a reason for ordinary people to hesitate about appreciating art, because this idea gives ordinary people the threatening concept that they must study about art works in text books before they go to a gallery to see them. It is more delightful for us to find an interpretation by ourselves. Encountering two or more different works brings about new interpretations just like a chemical reaction.

This paragraph is clear but it takes more effort to understand than English people are used to having to make. We would probably use simpler, more straightforward sentences such as: 'It is possible to interpret a work of art in several ways.' As one Japanese student said (cited in Fox, 1994: 8): 'Japanese is more vague than English. It's supposed to be that way. You don't say what you mean right away. You don't criticize directly.'

Kachru (1996) observes that, in many countries, including China and India, it may be considered polite to give a lot of background information which is not related to the topic, because this gives the reader more choices. In writing in Britain it is usual for the writer to argue their point of view rather than leaving the reader a choice. It is considered important in Britain to be clear about what you are saying and to build up to your conclusion point by point, presenting your arguments (see below) and backing them up with evidence:


    Style surely can be imitated but the more you imitate the more you lose your own style or your identity. At university I think our writing needs to be accurate, simplified and formal. Hence I should get used to writing this style of English. (Chinese student)

Style is also important in other types of writing, such as letters, email messages and notes. Each type of writing has its own appropriate style. It is important to choose the right level of formality in order to communicate effectively and to make the right impression. Sometimes students are unsure of how formal they should be when communicating with their teachers.

Task 9.7


Look at this extract from an email sent by a student to his or her tutor. There are some problems with the style. Can you find them and think how you could change the language used?

Dear Michael,

Once again, I admit my gratitude to you for your valuable advice and cooperation. Your enthusiasm towards solving my problems and effort to helping me out to get this grant has really enthralled me and therefore I salute your endeavour. So, I entreat you to write few lines from memory. I don't mind if you dash off a passage in the light of my undergraduate study. If you can, then please provide me with your postal address so that I can post the envelope with all documents within to you.

Key

For comments on this task, see Key to tasks at the end of this chapter.

Argument
The way a writer convinces the reader of his or her view is through using argument. When we use this word informally it means to disagree. However, in writing at university it means to build up your ideas point by point, considering points which prove the opposite as well as those which support your views, in order to convince the reader to think like you. It may not be necessary to build an argument in all pieces of writing: a report may just be reporting findings or giving explanations; a summary will only be giving a shorter version of an original text.

Can you distinguish between an argument and an explanation? In an explanation the writer is giving some information and the reasons behind it but is not trying to convince the reader of anything. For example:

    Foreign direct investment means a company from one country invests in another. This brings finance into the country which is receiving the investment and expands the business of the investing company.

In an argument the writer wants the reader to believe his or her conclusion. This means giving reasons and often evidence. These are called premises. For example:

    Foreign direct investment creates jobs in the host country and also contributes expertise to the host economy. Therefore countries which are seeking economic growth should welcome it.

In the example above the first sentence contains two premises and the second one is a conclusion. This is what the writer wants the reader to believe. Conclusions often start with words or phrases such as 'therefore', 'so' or 'as a result', which indicate that the writer is about to say what he or she wants the reader to believe. They may also contain verbs such as 'must', 'should' or 'have to'.

Task 9.8


Can you think of two premises you could use to make someone believe the following conclusion?

    As a result, in the future we will live in a more and more globalized world.


Key

For suggested answers, see Key to tasks at the end of this chapter.

When putting forward an argument you will have more chance of convincing the reader of your point of view if you show you have thought about the opposite point of view and any arguments that might be put forward to support it. These are called counter-arguments. For example:

    Countries wishing to expand their economies will welcome foreign direct investment; however, investing companies may provide their own staff, limiting the number of jobs available to local people, and a large proportion of profits may be taken out of the host country. However the contribution of expertise by the investor and their investment in property and other local facilities will still make foreign direct investment advantageous for the host country.

The writer here shows that some points against foreign direct investment have been considered but also more points in favour are given so that the conclusion is still that it is a good thing.

It is important to give convincing reasons for the premises in an argument. You should ask yourself why your reader should believe you. If you are using someone else's ideas to support your argument, check that that person is reputable and what you have said is true and represents accurately what he or she has said.

Plagiarism
Plagiarism is a concept based on the Western idea that, when somebody writes something, it belongs to him or her and not to anyone else. So if you write an essay it is yours and, although other people can read it, they must not copy it. In the same way if you read a book, newspaper article, web page or any other text you cannot use the exact same words in your writing unless you say where you copied them from and use quotation marks. This is why we use references and bibliographies.

The rule in all UK universities is that students must use their own words in their writing. You may use short (maximum 8-10 lines) quotations from books, websites and other sources, and you can use the ideas from these sources written in your own words, but you must always give a reference in your text and list all details of the source at the end.

Signpost
How to do this is explained later in this chapter, in the section 'Referencing and bibliography'.

In other cultures plagiarism is not viewed in this way. For example:

    Actually, the concept of plagiarism is not very familiar to us Chinese students due to the different teaching method and education attitude to an extent. We were encouraged in high school and in college to use more what famous people said before and what we learnt on the textbooks but we were not asked to note the name of the person and from which book we copied the words. (Chinese student)

    I never knew how strict the UK is about plagiarism until we spoke about this in class. This would be a major adjustment for me with writing essays, dissertations and projects for I have to keep in mind what I write. In the Philippines, rules on plagiarism are not that tough. I usually write long essays without giving references. (Filipino student)

Task 9.9


In your country, how were you taught to use information from books and other sources? Did you learn to use quotations and references or did you do it differently? Write down what you did.

WHAT HAPPENS IF YOU PLAGIARIZE?


There are quite severe punishments for plagiarism. This means that if the lecturer marking your work thinks that substantial sections of it are copied from books, the web or other sources and you have not given references, you may be given a mark of zero and fail your course. If you do this more than once, you may be suspended from the university.


HOW CAN THE TUTOR KNOW IF A STUDENT HAS PLAGIARIZED?


Students from other countries usually use a different variety of English. In addition, if they are learning English, they often make mistakes of a particular kind, such as verb tenses and prepositions. Because of this, it is sometimes easy for the reader to tell that some parts of a piece of writing have been written by the student and some parts have been copied.

Example: plagiarism
Look at the example below, which shows two paragraphs in a student's essay:

    Go shopping is one part of peoples life; the majority of people is still following the traditional way to go to the department store. Nowadays the variety of the technology is developed rapidly, there are so many ways to shopping and the most popular one is shopping online.

    Online shopping offers consumers a vast array of goods and services from companies around the world. You may be able to get things that are not available locally and you may even pay less than you normally would in conventional shops.

The first part of the writing has some English mistakes and, from the sentence structure, it seems that it has been written by a person who is learning English. In the second paragraph the English is perfect and the style is that of an English textbook. It looks as if the writer has copied the second paragraph from a textbook. As there is no reference given, this piece of work looks like an example of plagiarism.

When students download essays or parts of essays from the web, the tutor can easily find out by entering one or two sentences from the essay on a search engine like Google. Google will find the original text and the tutor will be able to see how much the student has copied. If a student copies a piece of work or part of a piece of work belonging to another student and the tutor finds out, both students will be punished equally.


HOW TO AVOID PLAGIARISM


The best way to avoid plagiarism is to plan your writing, read the material you need to get the information from and make notes as you read. Then write your essay or report using your notes and without looking at the original texts you read. Of course, this is more difficult if you are learning English but it will help you develop your own ideas and arguments and your writing skills. Using books or other sources by copying bits of text and just changing a few words is not acceptable.

Example: avoiding plagiarism


Look at the text below, from a newspaper, and how two students used it. The first one has plagiarized because he or she used too many of the same phrases; the second has expressed the ideas in his or her own words:

Original text

    The Royal Commission on Environmental Pollution told the Transport Secretary, Alistair Darling, that his expansion policy was deeply flawed. A scathing report said ministers showed little sign of having recognised the atmospheric damage caused by aircraft. It recommended a freeze on airport expansion, together with a tax of between £40 and £100 on every ticket, which would double the price of many journeys. Sir Tom Blundell, the commission's chairman, said: 'We believe we should restrict airport development, rather than just expand in response to demand.' (Guardian Unlimited, Guardian Newspapers Ltd 2002, accessed December 2002)

Student 1

    A scathing report believes ministers think the atmospheric damage is not mostly caused by aircraft, and recommended a slow down on airport expansion and a tax of between £40 and £100 on every single ticket.

Student 2

    In the excerpts from the Guardian newspaper article (December 2002) there are different views on putting tax on flights. According to Sir Tom Blundell, the commission's chairman, we must stop responding to the airport authorities about their developing programs. At the same time the commission also announced a tax of between £40 and £100 on every ticket, which means passengers will have to pay double the amount of money for their flights.

If we compare the original text from the Guardian newspaper, we can see that student 1 has taken a lot of phrases from the original text and used them without quotation marks. This is plagiarism. Student 2's summary, which uses his or her own words together with one quotation, signalled by quotation marks, is not plagiarism.

Task 9.10


Now read part of an article in a newspaper, journal or on the web, or a chapter in a book, make notes on it, put it away and then write a short paragraph on it without looking at the original. Make sure you use mainly your own words and put quotation marks round any phrases you use from the original text. If you can, ask a teacher to look at it.

Referencing and bibliography


This is a very important part of writing at university in the UK and it is difficult to get it right. Many English students find it hard to learn correct referencing. Referencing is important because it is the way you show that you have read enough material about your topic and have related your reading to the title of your work. Normally for an essay or report, you should have read four or five book chapters or articles in journals or on the web and all these should be referred to in your writing. The method of referencing we are going to describe here is widely used and is called the Harvard system. Some lecturers like students to use different methods. When you are given instructions for an assignment you will normally be told what method of referencing to use, but it is always a good idea to clarify with the lecturer what is expected.


REFERENCING WITHIN YOUR TEXT


In the Harvard method there are four ways of referring to an author or text. The following examples use information taken from an article in the Daily Telegraph newspaper on 2 April 1989, written by April Robinson, a policewoman:

1 Describe the idea or point that you are using from the source, without mentioning the author in your sentence, then put the author's name and the date of publication in brackets at the end of the sentence:

    Nowadays nobody is surprised when police officers are attacked (Robinson, 1989).

2 Use the author's name, followed by the date of publication in brackets in your sentence, when describing his or her ideas in your own words:

    Robinson (1989) believes that there is more and more violence in our society.

This can be done in a number of other ways, for example: 'As Robinson (1989) points out ...', 'According to Robinson (1989) ...', 'To quote from Robinson (1989) ...', 'Writing in the Daily Telegraph, Robinson (1989) explains that ...' or 'Writing in 1989, Robinson argues that ...'

3 Use some of the author's own words in your sentence, putting these in quotation marks (also called inverted commas), including the author's name and the date of publication in one of the two ways above:

    Robinson (1989) says that she really objects to 'people using the street as a rubbish bin'.

4 Use a long quotation from the source, but not more than 6-8 lines maximum, indenting this and using the author's name in one of the ways described in (1) and (2) above. When a quotation is indented it is not necessary to use quotation marks:

    Another thing that seems to have changed is people's attitudes to the things they own or want to own: They seem to put more emphasis on material possessions than they do on human values. Material things have become too important. (Robinson, 1989)

If you are quoting from a source on the web, you should give the author's name and date as above if they are given on the site. If no author is given, you should give the title of the article and the date.


Task 9.11


Read a short article in a newspaper or on the web and then write sentences using information from the article in the four different ways shown above.

All the authors and websites mentioned in your writing have to be listed with full details in your references at the end of your work. We call this list 'references' when it only includes texts you have mentioned. If you make a list including other texts you have read in connection with the work but have not mentioned, the list is called 'bibliography'.

REFERENCES AND BIBLIOGRAPHY


The Harvard system for listing references and bibliography is as follows.

For a book

Author's surname, initial(s), date of publication in brackets, title in italics, place of publication, publisher. For example:

    Fanon, F. (1986) Black Skin, White Masks. London: Pluto Press.

For a chapter in an edited book

Chapter author's surname, initial(s), date in brackets, title of the chapter in quotation marks, book editor's initial(s) and surname, (ed.) or (eds), title of book in italics, place of publication, publisher. For example:

    Castells, M. (2000) 'Information technology and global capitalism', in W. Hutton and A. Giddens (eds) On the Edge: Living with Global Capitalism. London: Jonathan Cape.

For a paper in a journal

Paper author's surname, initial(s), date in brackets, title of the paper in quotation marks, name of journal (in italics), volume and issue numbers, pages of paper. For example:

    Fang, Y. (2001) 'Reporting the same events? A critical analysis of Chinese print news media texts', Discourse and Society, Vol. 12, no. 5, pp. 585-613.


For a book, article or any other document on the web

The same rules as above apply but the web address and date the page was accessed are added. For example:

    Gilligan, E. (1998) Local Heroes [online], Friends of the Earth, http://www.foe.co.uk/local/rest.pdf (accessed 24 November 1998).

If there is no author given, write the title of the article, date and web address. For example:

    Media in Romania (1998) http://www.dds.nl/pressnow/dossier/romania.html

The full address of the page where the information was found should always be given. The list of references must be in alphabetical order by surname of the authors, or title where there is no author.

Task 9.12


Using the books and articles you have collected to write an essay or report, write a list of references following the examples above, making sure they are in alphabetical order. Try to include different kinds of references (for example, a book, a paper in a journal, an article from a website).

EDITING AND PROOFREADING


Editing and proofreading are the tasks you have to do when you have written your assignment and are ready to examine it again to make sure it is as good as possible before you hand it in. A good strategy is to write your essay, report or chapter, put it away for at least 24 hours, or a few days if possible, then read it again when you are rested and your brain is fresh. In this way you will be able to see if any changes are needed.

248 / THE INTERNATIONAL STUDENT'S GUIDE

EDITING
Editing means checking your work to be sure that you have said what you wanted to say, included all the important information, and done what you were asked to do. You can ask yourself the following questions:

- Is all the necessary information included?
- Is it all relevant to the title?
- Have I made my arguments clear?
- Have I given evidence by using examples and/or references to my sources?
- Do my paragraphs follow each other in a logical order?
- Have I explained what I am discussing in the introduction?
- Have I summed up and commented on the whole in the conclusion?
- Is my work too long or too short?

If you do not feel confident about your work you may be able to get help from tutors who specialize in writing at your university, but mainly your writing will improve with practice. However, even very experienced writers spend a lot of time editing and usually write several drafts of their work, so be prepared to rewrite your work as often as necessary.

Task 9.13


Reread a piece of writing you have done recently, asking yourself the questions listed above.

PROOFREADING
When you have finished editing, you need to start proofreading. This is where you read each sentence carefully to check details such as grammar, spelling, punctuation and vocabulary.

Spelling
You will probably be using a computer to write your assignments and will be using the spell check. However, you must remember that the computer will only tell you if the word as you have spelt it does not exist. It will not tell you if you have used the wrong word for that sentence. For example, "to", "too" and "two" are all correct words but only one is right in this sentence:


It is possible to place too much importance on testing in primary schools (not "to" or "two").

Grammar
You will need to check whether you have used the right tense (past, present, future) and whether your verb and subject agree. If you have used a singular subject, have you matched it with a singular verb? For example: The drops of water were collected in a filter dish (not: The drops of water was collected, because the subject is "the drops", not "water"). You will also need to check that you have used articles (the, a and an) and prepositions (of, to, from, etc.) correctly. A word of warning: do not rely on the grammar check tool that comes as part of some popular word-processing packages. Frequently the advice it gives is totally wrong! It will tell you that a perfectly well-formed sentence should be changed, and offer an alternative which is incorrect English!

Punctuation
Are your sentences very long, or are they only half sentences? Have you used full stops, capital letters and commas where necessary? Have you put question marks after questions and quotation marks where you have used someone else's words?

Vocabulary
Have you used the right word for what you want to say? Does it have the right meaning in this context? Is the word formal enough or is it colloquial?

Signpost
See Chapter 4, Building vocabulary, for help with this.

Paragraphing
Paragraphs should be between about 10 lines and three quarters of a page, and should have one main topic.


Proofreading can be difficult if you are not sure about some aspects of English. A good learner's dictionary of English can be a great help.

Signpost
See Chapter 4, Building vocabulary, for more information on using dictionaries.

"Proofreading is more difficult, especially for we foreign students, because it is hard for us to proofread our writing in terms of the language of English. Correcting is achievable but perfection is unreachable." (Chinese student)

A useful strategy is to be aware of your own particular weaknesses and to concentrate on them especially. For example, if you know you usually write very long sentences, check your sentences carefully and make them shorter if they are too long. If verb tenses are difficult for you, double-check the verb forms you have used in your work. You could use a checklist like the one below to go through your work and to help you find out where you are making the most errors. Fill in a column for each piece of work you do, and add any extra categories of error in the empty spaces on the left:

CHECKLIST: proofreading

Type of error      Coursework 1   Coursework 2   Coursework 3   Coursework 4
Sentence length
Commas
Capital letters
Verb tenses
Agreement
Articles
Prepositions
Spelling
Paragraphs


Task 9.14

"

Try proofreading this paragraph, written by a Business Studies undergraduate. You should find 20 mistakes: The effect of tariffs on consumers When a government of an importing nation imposes a tariff on imported goods, the consumer in the domestic market will reduce their consumption of that goods, as a result of the increases in the price of that goods (because tariff charges is passed into consumer), and in the case of the good been essential for the consumer, the consumption would not change very much, but the price will raise to include the tariff. In order to avoid demand collapsing altogether, suppliers have to absorb part of the tariff themselves, only if demand is totally inelastic will supplier be able to pass the entire Tariff on to the consumer. Therefore the extend to which the Tariff is successful in reducing demand for The imported products depend largely on the elasticity of demand for that product. When Tariff imposed by a government on their imported product tariff tends to make the customer worse off.

For suggested answers, see Key to tasks at the end of this chapter.

Feedback
After all your hard work preparing and writing a piece of work, you may feel that once you hand it in, the work is finished. However, perhaps the most important learning happens when you get your work back and receive feedback from your teacher.



Always hand in your work on time, otherwise it may not be marked, and when it is returned to you, read the tutor's comments carefully. If you are lucky enough to have a tutor who gives you detailed feedback, you can learn a lot about your strengths and weaknesses in writing, and about how far you manage to meet the requirements of your university. If your tutor has not given you enough information to understand why you got the mark you did, contact him or her and ask for more feedback. Remember, it is as important to understand what you did well as it is to understand anything you may have done less well. This is one of the most important ways for you to improve, because it has been shown that good-quality feedback, which is then acted upon by the learner, leads to exceptional progress. Your tutors are a valuable source of information about the standards you need to reach, and they are there to help you.

Conclusion
Writing is something we continue to develop throughout our lives: at university, at work and perhaps in our hobbies. It can be hard work writing in a language which is not your first language or the language you learnt at school, but the only answer is to read and write as much as possible and keep revising and improving. As one Taiwanese student said: "I not only write daily but also make notes and remember what is important and what I have learned. Meanwhile I can improve my writing by reading. I have to think what kind of book will help me and be relevant to my subject. I have to do it and encourage myself immediately." With practice and persistence, you will find you can make great progress, and you will discover that what you learn by writing will help you in many other aspects of your studies.

USEFUL RESOURCES
http://www.uefap.com/writing/writfram.htm
A good website on English for academic purposes.

http://www.macmillandictionary.com/MED/08-language awareness-academicUK.htm
This site is a magazine; numbers 08 and 09 look at aspects of academic writing.

http://www.ipl.org/div/aplus/linkswritingstyle.htm#logic
This site covers writing style and techniques.

http://www.urich.edu/~writing/argument/html
This site discusses how you can make effective arguments.

http://www.keele.ac.uk/depts/aa/handbook/section2/plagiarismanddishonesty.html
A very clear site about plagiarism.

http://www.hamilton.edu/academics/resource/wc/AvoidingPlagiarism.html
A site which tells you how to avoid plagiarism.

Creme, P. and Lea, M. (1997) Writing at University: A Guide for Students. Buckingham: Open University Press.
Fairbairn, G. and Winch, C. (1996) Reading, Writing and Reasoning. Buckingham: SRHE and Open University Press.
Jordan, R.R. (1998) Academic Writing Course. Harlow: Longman.
McCarter, S. (1997) A Book on Writing. Ford, Midlothian: IntelliGene.
Rees, A. Guide to Assignment Writing for International Students, available at www.bowbridgepublishing.com.
Swales, J.M. and Feak, C.B. (1994) Academic Writing for Graduate Students: Essential Tasks and Skills. Ann Arbor, MI: University of Michigan Press.
Williams, K. (1995) Writing Essays. Oxford: Oxford Centre for Staff Development.
Williams, K. (1995) Writing Reports. Oxford: Oxford Centre for Staff Development.


KEY TO TASKS
Task 9.2

Essay:
- Discusses a topic.
- Gives different points of view.
- Puts forward arguments.
- Is divided into paragraphs but does not include headings, numbers, underlining or bold.

Report:
- Gives an account of a situation.
- Makes recommendations.
- Seeks to inform rather than discuss.
- Is divided into numbered sections with headings, underlining, capitals and bold.

Task 9.3

                                     Essay    Report    Summary   Book review
You should give your opinion         Yes      Perhaps   No        Perhaps
You should write an introduction     Yes      Yes       Perhaps   Yes
You should include some quotations   Yes      Perhaps   Perhaps   Perhaps
You should write a conclusion        Yes      Yes       Perhaps   Yes
You should include references        Yes      Yes       Yes       Yes
Task 9.6
1 Informal: use of "we didn't", "we ran out of time".
2 Formal: use of "Such a proposal", "would", "funding", "could be".
3 Formal: formal vocabulary, use of "may be".
4 Informal: use of "If you", "then you're wrong".
5 Formal, but not entirely: the expression "clocked up" is informal.


Task 9.7

Dear Michael,
Once again, I admit my gratitude to you for your valuable advice and co-operation. Your enthusiasm towards solving my problems and effort to helping me out to get this grant has really enthralled me and therefore I salute your endeavour. So, I entreat you to write few lines from memory. I don't mind if you dash off a passage in the light of my undergraduate study. If you can, then please provide me with your postal address so that I can post the envelope with all documents within to you.

The problem with this email is that, although the student intends to be friendly and quite informal, much of the language is rather formal and old-fashioned, and so the tone is odd. Only the phrase "dash off" is quite informal, and in fact is not really suitable when talking about a reference, as it implies carelessness. The phrases highlighted in italics do not work well. More suitable phrases are suggested below:

Dear Michael,
Once again, thanks so much for your valuable advice and co-operation. I really appreciate your willingness to help me out. So, could you write a few lines from memory? I don't mind if you just write a short reference based on my undergraduate study. If you can, then please email me your postal address so that I can post the envelope with all documents to you.

Task 9.8
Some possible premises:
- Communication over long distances is becoming easier all the time.
- Trade agreements are facilitating business between different countries.
- The spread of information is very rapid nowadays.
- More and more people around the world are speaking English.
- Multinationals dominate the market in many sectors.

Task 9.14

The effect of tariffs on consumers
When a government of an importing nation imposes a tariff on imported goods, the


consumer in the domestic market will reduce their consumption of that goods, as a result of the increases in the price of that goods (because tariff charges is passed into consumer), and in the case of the good been essential for the consumer, the consumption would not change very much, but the price will raise to include the tariff. In order to avoid demand collapsing altogether, suppliers have to absorb part of the tariff themselves, only if demand is totally inelastic will supplier be able to pass the entire Tariff on to the consumer. Therefore the extend to which the Tariff is successful in reducing demand for The imported products depend largely on the elasticity of demand for that product. When tariff imposed by a government on their imported product tariff tends to make the consumer worse off.

The mistakes have been marked in bold. The insertion mark indicates where something is missing:
1 "those" not "that": goods is plural
2 full stop, new sentence
3 no "s": singular
4 not necessary; delete
5 delete brackets and put commas instead
6 "are" not "is": plural
7 "on to" not "into"
8 "the" needed
9 "goods"
10 "being" not "been"
11 "rise" not "raise"
12 full stop, new sentence
13 "the" needed
14 no capital letter
15 "extent" (noun) not "extend" (verb)
16 no capital letters
17 "depends": singular subject
18 "a" needed
19 "is" needed
20 "the" needed


Below is a correct version of the text:

The effect of tariffs on consumers
When a government of an importing nation imposes a tariff on imported goods, the consumer in the domestic market will reduce their consumption of those goods. As a result of the increase in the price, because tariff charges are passed on to the consumer, and in the case of the goods being essential for the consumer, the consumption would not change very much, but the price will rise to include the tariff. In order to avoid demand collapsing altogether, suppliers have to absorb part of the tariff themselves. Only if demand is totally inelastic will the supplier be able to pass the entire tariff on to the consumer. Therefore the extent to which the tariff is successful in reducing demand for the imported products depends largely on the elasticity of demand for that product. When a tariff is imposed by a government on their imported product, the tariff tends to make the consumer worse off.
