
Additional DSUS Material

Chapter 1

Self-Test Answers

Based on what you have read in this section, what qualities do you think a scientific theory should have?
A good theory should do the following: 1. Explain the existing data. 2. Explain a range of related observations. 3. Allow statements to be made about the state of the world. 4. Allow predictions about the future. 5. Have implications.

What is the difference between reliability and validity?

Validity is whether an instrument measures what it was designed to measure, whereas reliability is the ability of the instrument to produce the same results under the same conditions.


Why is randomization important?

It is important because it rules out confounding variables (factors that could influence the outcome variable other than the factor in which you're interested). For example, with groups of people, random allocation of people to groups should mean that factors such as intelligence, age, gender and so on are roughly equal in each group and so will not systematically affect the results of the experiment.
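As a rough illustration of random allocation, the sketch below shuffles a list of participants and deals them into groups of near-equal size. The function name `randomly_allocate`, the participant IDs and the seed are all hypothetical and are not part of the original material.

```python
import random


def randomly_allocate(participants, n_groups=2, seed=None):
    """Shuffle participants and deal them into n_groups groups of near-equal size."""
    rng = random.Random(seed)          # a seed makes the allocation reproducible
    shuffled = list(participants)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    # Deal the shuffled list out like cards: every n_groups-th person per group.
    return [shuffled[i::n_groups] for i in range(n_groups)]


participant_ids = list(range(1, 21))   # hypothetical IDs 1..20
groups = randomly_allocate(participant_ids, n_groups=2, seed=42)
for number, members in enumerate(groups, start=1):
    print(f"Group {number}: {sorted(members)}")
```

Because chance alone decides who lands in which group, characteristics such as age or intelligence have no systematic route into one group rather than the other.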

Compute the mean but excluding the score of 252.


The range is the lowest score (22) subtracted from the highest score (now 121), which gives us 121 − 22 = 99.

First, we add up all of the scores:

\[ \sum_{i=1}^{n} x_i = 22 + 40 + 53 + 57 + 93 + 98 + 103 + 108 + 116 + 121 = 811 \]

We then divide by the number of scores (in this case 10, because the score of 252 has been excluded):

\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{811}{10} = 81.1 \]

The mean is 81.1 friends.
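The same arithmetic can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are my own.

```python
from statistics import mean

# Number of Facebook friends for the 11 people in the example.
friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]

# Drop the outlying score of 252 before recomputing.
without_outlier = [score for score in friends if score != 252]

mean_all = mean(friends)
mean_trimmed = mean(without_outlier)
score_range = max(without_outlier) - min(without_outlier)

print(f"Mean with the outlier:    {mean_all:.2f}")
print(f"Mean without the outlier: {mean_trimmed:.2f}")
print(f"Range without the outlier: {score_range}")
```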

Twenty-one heavy smokers were put on a treadmill at the fastest setting. The time in seconds was measured until they fell off from exhaustion: 18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57. Compute the mode, median, upper and lower quartiles, range and interquartile range.
First, let's arrange the scores in ascending order: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57.

The Mode: The scores with frequencies in brackets are: 16 (1), 18 (2), 22 (2), 23 (2), 24 (1), 26 (1), 29 (1), 32 (1), 34 (2), 36 (2), 42 (1), 43 (1), 46 (2), 49 (1), 57 (1). Therefore, there are several modes because 18, 22, 23, 34, 36 and 46 seconds all have frequencies of 2, and 2 is the largest frequency. These data are multimodal (and the mode is, therefore, not particularly helpful to us).

The Median: The median will be the (n + 1)/2th score. There are 21 scores, so this will be the 22/2 = 11th score. The 11th score in our ordered list is 32 seconds.

The Mean: The mean is 32.19 seconds:

\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{16 + 18 + 18 + 22 + 22 + 23 + 23 + 24 + 26 + 29 + 32 + 34 + 34 + 36 + 36 + 42 + 43 + 46 + 46 + 49 + 57}{21} = \frac{676}{21} = 32.19 \]

The Lower Quartile: This is the median of the lower half of scores. If we split the data at 32 (not including this score), there are 10 scores below this value. The median of 10 scores is the 11/2 = 5.5th score. Therefore, we take the average of the 5th score and the 6th score. The 5th score is 22 and the 6th is 23; the lower quartile is therefore 22.5 seconds.

The Upper Quartile: This is the median of the upper half of scores. If we split the data at 32 (not including this score), there are 10 scores above this value. The median of 10 scores is the 11/2 = 5.5th score above the median. Therefore, we take the average of the 5th score above the median and the 6th score above the median. The 5th score above the median is 42 and the 6th is 43; the upper quartile is therefore 42.5 seconds.

The Range: This is the highest score (57) minus the lowest (16), i.e. 41 seconds.

The Interquartile Range: This is the difference between the upper and lower quartile: 42.5 − 22.5 = 20 seconds.
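The steps above can be sketched in Python. The quartiles here follow the method described in the answer (the median of each half, leaving out the middle score when the number of scores is odd); other software, including `statistics.quantiles`, may use a different interpolation rule and give slightly different values. Variable names are my own.

```python
from collections import Counter
from statistics import median

# Treadmill times (seconds) for the 21 smokers.
times = [18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
         34, 34, 36, 36, 43, 42, 49, 46, 46, 57]
ordered = sorted(times)

# Mode(s): every score that shares the largest frequency.
counts = Counter(ordered)
top_frequency = max(counts.values())
modes = sorted(score for score, freq in counts.items() if freq == top_frequency)

# Median, then quartiles as medians of the lower and upper halves.
med = median(ordered)
half = len(ordered) // 2
lower_half = ordered[:half]
# With an odd number of scores the middle score belongs to neither half.
upper_half = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
lower_quartile = median(lower_half)
upper_quartile = median(upper_half)

score_range = ordered[-1] - ordered[0]
interquartile_range = upper_quartile - lower_quartile

print("Modes:", modes)
print("Median:", med)
print("Lower quartile:", lower_quartile)
print("Upper quartile:", upper_quartile)
print("Range:", score_range)
print("Interquartile range:", interquartile_range)
```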

Assuming the same mean and standard deviation for the Beachy Head example above, what's the probability that someone who threw themselves off of Beachy Head was 30 or younger?

As in the example, we know that the mean of the suicide scores was 36, and the standard deviation 13. First we convert our value to a z-score: the 30 becomes (30 − 36)/13 = −0.46. We want the area below this value (because 30 is below the mean), but this value is not tabulated in the Appendix. However, because the distribution is symmetrical, we could instead ignore the minus sign and look up this value in the column labelled 'Smaller Portion' (i.e. the area above the value 0.46). You should find that the probability is .32276, or, put another way, a 32.28% chance that a suicide victim would be 30 years old or younger. By looking at the column labelled 'Bigger Portion' we can also see the probability that a suicide victim was aged 30 or more! This probability is .67724, or there's a 67.72% chance that a suicide victim was older than 30 years old!
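The table lookup can be reproduced with Python's standard library. This is a sketch that assumes the ages are normally distributed, as the example does; the z-score is rounded to two decimal places to mirror the printed table, so an unrounded calculation would differ slightly.

```python
from statistics import NormalDist

mean_age, sd_age = 36, 13

# Convert 30 to a z-score, rounded to two decimals as in the table.
z = round((30 - mean_age) / sd_age, 2)

# Area below z under the standard normal curve ('Smaller Portion').
p_30_or_younger = NormalDist().cdf(z)
# The remaining area ('Bigger Portion').
p_older_than_30 = 1 - p_30_or_younger

print(f"z = {z}")
print(f"P(30 or younger) = {p_30_or_younger:.5f}")
print(f"P(older than 30) = {p_older_than_30:.5f}")
```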

Chapter 2

Self-Test Answers

We came across some data about the number of friends that 11 people had on Facebook (22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252). We calculated the mean for these data as 96.64. Now calculate the sums of squares, variance and standard deviation.

To calculate the sum of squares, take the mean from each value, then square this difference. Finally, add up these squared values:

So, the sum of squared errors is a massive 37544.55. The variance is the sum of squared errors divided by the degrees of freedom (N − 1). There were 11 scores and so the degrees of freedom were 10. The variance is, therefore, 37544.55/10 = 3754.45.

Finally, the standard deviation is the square root of the variance:

\[ s = \sqrt{3754.45} = 61.27 \]

Calculate these values again but excluding the outlier (252).

To calculate the sum of squares, take the mean from each value (note that it has changed because the outlier is excluded), then square this difference. Finally, add up these squared values:

So, the sum of squared errors is 10992.90. The variance is the sum of squared errors divided by the degrees of freedom (N − 1). There were 10 scores and so the degrees of freedom were 9. The variance is, therefore, 10992.90/9 = 1221.43. Finally, the standard deviation is the square root of the variance:

\[ s = \sqrt{1221.43} = 34.95 \]

Note, then, that (like the mean itself) the standard deviation is hugely influenced by outliers: the removal of this one value has almost halved the standard deviation!
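Both sets of calculations can be reproduced with a short Python sketch. The helper name `describe_spread` is my own; it simply follows the steps in the answers (squared deviations from the mean, divided by N − 1, then the square root).

```python
from math import sqrt


def describe_spread(scores):
    """Return (sum of squared errors, variance, standard deviation) for a sample."""
    n = len(scores)
    sample_mean = sum(scores) / n
    sum_of_squares = sum((x - sample_mean) ** 2 for x in scores)
    variance = sum_of_squares / (n - 1)   # degrees of freedom are N - 1
    return sum_of_squares, variance, sqrt(variance)


friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]
ss_all, var_all, sd_all = describe_spread(friends)
ss_trim, var_trim, sd_trim = describe_spread([x for x in friends if x != 252])

print(f"With outlier:    SS = {ss_all:.2f}, variance = {var_all:.2f}, SD = {sd_all:.2f}")
print(f"Without outlier: SS = {ss_trim:.2f}, variance = {var_trim:.2f}, SD = {sd_trim:.2f}")
```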


We came across some data about the number of friends that 11 people had on Facebook. We calculated the mean for these data as 96.64 and standard deviation as 61.27. Calculate a 95% confidence interval for this mean.
First we need to calculate the standard error:

\[ SE = \sigma_{\bar{X}} = \frac{s}{\sqrt{N}} = \frac{61.27}{\sqrt{11}} = 18.47 \]

The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, N − 1. With 11 data points, the degrees of freedom are 10. For a 95% confidence interval we can look up the value in the column labelled 'Two-Tailed Test', '0.05' in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.23. The boundaries of the confidence interval are, therefore:

\[ \text{Lower boundary} = \bar{X} - (2.23 \times SE) = 96.64 - (2.23 \times 18.47) = 55.45 \]
\[ \text{Upper boundary} = \bar{X} + (2.23 \times SE) = 96.64 + (2.23 \times 18.47) = 137.83 \]

Recalculate the confidence interval assuming that the sample size was 56.

First we need to calculate the new standard error:

\[ SE = \sigma_{\bar{X}} = \frac{s}{\sqrt{N}} = \frac{61.27}{\sqrt{56}} = 8.19 \]

The sample is big now, so to calculate the confidence interval we can use the critical value of z for a 95% confidence interval (i.e. 1.96). The boundaries of the confidence interval are, therefore:

\[ \text{Lower boundary} = \bar{X} - (1.96 \times SE) = 96.64 - (1.96 \times 8.19) = 80.59 \]
\[ \text{Upper boundary} = \bar{X} + (1.96 \times SE) = 96.64 + (1.96 \times 8.19) = 112.69 \]
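The two confidence intervals follow the same recipe, so they can be sketched with one small function. The function name `confidence_interval` is hypothetical. Python's standard library has no t-distribution, so the critical value for 10 degrees of freedom is simply the tabulated 2.23 quoted in the answer; the z value is computed from the normal distribution. Because the code does not round the standard error before multiplying, its results can differ from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sd, n, critical_value):
    """Return (lower, upper, standard_error) for mean +/- critical_value * SE."""
    standard_error = sd / sqrt(n)
    margin = critical_value * standard_error
    return sample_mean - margin, sample_mean + margin, standard_error


# Small sample (N = 11): critical t for df = 10 taken from the printed table.
T_CRITICAL_DF10 = 2.23
lower_t, upper_t, se_t = confidence_interval(96.64, 61.27, 11, T_CRITICAL_DF10)

# Large sample (N = 56): critical z for a two-tailed 95% interval (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)
lower_z, upper_z, se_z = confidence_interval(96.64, 61.27, 56, z_critical)

print(f"N = 11: SE = {se_t:.2f}, 95% CI = [{lower_t:.2f}, {upper_t:.2f}]")
print(f"N = 56: SE = {se_z:.2f}, 95% CI = [{lower_z:.2f}, {upper_z:.2f}]")
```

Note how the larger sample shrinks the standard error and therefore narrows the interval, even though the mean and standard deviation are unchanged.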


What are the null and alternative hypotheses for the following questions:

Is there a relationship between the amount of gibberish that people speak and the amount of vodka jelly they've eaten?
o Null Hypothesis: There will be no relationship between the amount of gibberish that people speak and the amount of vodka jelly they've eaten.
o Alternative Hypothesis: There will be a relationship between the amount of gibberish that people speak and the amount of vodka jelly they've eaten.

Is the mean amount of chocolate eaten higher when writing statistics books than when not?
o Null Hypothesis: There will be no difference in the mean amount of chocolate eaten when writing statistics textbooks compared to when not writing them.
o Alternative Hypothesis: The mean amount of chocolate eaten when writing statistics textbooks will be higher than when not writing them.

Chapter 3

Self-Test Answers

Why is the number of friends variable a scale variable?


It is a scale variable because the numbers represent consistent intervals and ratios along the measurement scale: the difference between having (for example) 1 and 2 friends is the same as the difference between having (for example) 10 and 11 friends, and (for example) 20 friends is twice as many as 10.

Having created the first four variables with a bit of guidance, try to enter the rest of the variables in Table 2.1 yourself.
The finished data and variable views should look like those in the figure below (more or less!). You can also download this data file (Data with which to play.sav).


Chapter 4

Self-Test Answers

What does a histogram show?

A histogram is a graph in which values of observations are plotted on the horizontal axis, and the frequency with which each value occurs in the data set is plotted on the vertical axis.

Produce boxplots for the day 2 and day 3 hygiene scores and interpret them.
The boxplot for the day 2 data should look like this:


Note that as for day 1 the females are slightly more fragrant than males (look at the median line). However, if you compare these to the day 1 boxplots (in the book) scores are getting lower (i.e. people are getting less hygienic). In the males there are now more outliers (i.e. a rebellious few who have maintained their sanitary standards). The boxplot for the day 3 data should look like this:

Note that compared to day 1 and day 2 the females are getting more like the males (i.e. smelly). However, if you look at the top whisker, this is much longer for the females. In other words, the top 25% of females are more variable in how smelly they are compared to males. Also, the top score is higher than for males. So, at the top end females are better at maintaining their hygiene at the festival compared to males. Also, the box is longer for females, and although both boxes start at the same score, the top edge of the box is higher in females, again suggesting that above the median score more women are achieving higher levels of hygiene than men. Finally, note that for both days 2 and 3, the boxplots have become less symmetrical (the top whiskers are longer than the bottom whiskers). On day 1 (see the book chapter), which is symmetrical, the whiskers on either side of the box are of equal length (the range of the top and bottom 25% of scores is the same); however, on days 2 and 3 the whisker coming out of the top of the box is longer than that at the bottom, which shows that the distribution is skewed (i.e. the top 25% of scores is spread out over a wider range than the bottom 25%).

Use what you learnt earlier to add error bars to this graph and to label both the x- (I suggest Time) and y-axis (I suggest Mean Grammar Score (%)).

Simple Line Charts for Independent Means


To begin with, imagine that a film company director was interested in whether there was really such a thing as a chick flick (a film that typically appeals to women more than men). He took 20 men and 20 women and showed half of each sample a film that was supposed to be a chick flick (Bridget Jones's Diary), and the other half of each sample a film that didn't fall into the category of chick flick (Memento, a brilliant film by the way). In all cases he measured their physiological arousal as a measure of how much they enjoyed the film. The data are in a file called ChickFlick.sav on the companion website. Load this file now. First of all, let's just plot the mean rating of the two films. We have just one grouping variable (the film) and one outcome (the arousal); therefore, we want a simple line chart. In the Chart Builder, double-click on the icon for a simple line chart. On the canvas you will see a graph and two drop zones: one for the y-axis and one for the x-axis. The y-axis needs to be the dependent variable, or the thing you've measured, or more simply the thing for which you want to display the mean. In this case it would be arousal, so select arousal from the variable list and drag it into the y-axis drop zone. The x-axis should be the variable by which we want to split the arousal data. To plot the means for the two films, select the variable film from the variable list and drag it into the drop zone for the x-axis.

Dialog boxes for a simple line chart with error bars

The figure above shows some other options for the line chart. The main dialog box should appear when you select the type of graph you want, but if it doesn't, click on Element Properties in the Chart Builder. There are three important features of this dialog box. The first is that, by default, the lines will display the mean value. This is fine, but just note that you can plot other summary statistics such as the median or mode. Second, you can adjust the form of the line that you plot. The default is a straight line, but you can have others like a spline (curved line). Finally, we can ask SPSS to add error bars to our line chart by selecting the option to display error bars. We have a choice of what our error bars represent. Normally, error bars show the 95% confidence interval, and I have selected this option. Note, though, that you can change the width of the confidence interval displayed by changing the 95 to a different value. You can also display the standard error (the default is to show two standard errors, but you can change this to one) or standard deviation (again, the default is two but this could be changed to one or another value). It's important that when you change these properties you click on Apply: if you don't then the changes will not be applied to the Chart Builder. Click on OK to produce the graph.

Line chart of the mean arousal for each of the two films

The resulting line chart displays the mean (and the confidence interval of those means). This graph shows us that, on average, people were more aroused by Memento than they were by Bridget Jones's Diary. However, we originally wanted to look for gender effects, so this graph isn't really telling us what we need to know. The graph we need is a multiple line graph.


Multiple Line Charts for Independent Means


To do a multiple line chart for means that are independent (i.e. have come from different groups) we need to double-click on the multiple line chart icon in the Chart Builder (see the book chapter). On the canvas you will see a graph as with the simple line chart but there is now an extra drop zone for a grouping variable. All we need to do is to drag our second grouping variable into this drop zone. As with the previous example then, select arousal from the variable list and drag it into the y-axis drop zone, and select film from the variable list and drag it into the x-axis drop zone. In addition, though, we can now select the gender variable and drag it into the extra drop zone. This will mean that lines representing males and females will be displayed in different colours. As in the previous section, select error bars in the properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.


Dialog boxes for a multiple line chart with error bars


Line chart of the mean arousal for each of the two films

The resulting line chart tells us the same as the simple line graph: that is, arousal was overall higher for Memento than Bridget Jones's Diary, but it also splits this information by gender. Look first at the mean arousal for Bridget Jones's Diary; this shows that males were actually more aroused during this film than females. This indicates they enjoyed the film more than the women did! Contrast this with Memento, for which arousal levels are comparable in males and females. On the face of it, this contradicts the idea of a chick flick: it actually seems that men enjoy chick flicks more than the chicks (probably because it's the only help we get to understand the complex workings of the female mind!).

Simple Line Charts for Related Means


Hiccups can be a serious problem. Charles Osborne apparently got a case of hiccups while slaughtering a hog (well, who wouldn't?) that lasted 67 years. People have many methods for stopping hiccups (a surprise, holding your breath), but actually medical science has put its collective mind to the task too. The official treatment methods include tongue-pulling manoeuvres, massage of the carotid artery and, believe it or not, digital rectal massage (Fesmire, 1988). I don't know the details of what the digital rectal massage involved, but I can probably imagine. Let's say we wanted to put this to the test. We took 15 hiccup sufferers, and during a bout of hiccups administered each of the three procedures (in random order and at 5-minute intervals) after taking a baseline of how many hiccups they had per minute. We counted the number of hiccups in the minute after each procedure. Load the file Hiccups.sav. Note that these data are laid out in different columns; there is no grouping variable that specifies the interventions because each patient experienced all interventions. In the previous two examples we have used grouping variables to specify aspects of the graph (e.g. we used the grouping variable film to specify the x-axis). For repeated-measures data we will not have these grouping variables and so the process of building a graph is a little more complicated (but not a lot more).

To plot the mean number of hiccups go to the Chart Builder and double-click on the icon for a simple line chart. As before, you will see a graph on the canvas with drop zones for the x- and y-axes. Previously we specified the column in our data that contained data from our outcome measure on the y-axis, but for these data we have four columns containing data on the number of hiccups (the outcome variable). What we have to do then is to drag all four of these variables from the variable list into the y-axis drop zone. We have to do this simultaneously. To do this we first need to select multiple items in the variable list: select the first variable by clicking on it with the mouse. The variable will be highlighted in blue. Now, hold down the Ctrl key on the keyboard and click on a second variable. Both variables are now highlighted in blue. Again, hold down the Ctrl key and click on a third variable in the variable list and so on for the fourth. In cases in which you want to select a list of consecutive variables, you can do this very quickly by simply clicking on the first variable that you want to select (in this case baseline), then holding down the Shift key on the keyboard and clicking on the last variable that you want to select (in this case digital rectal massage); notice that all of the variables in between have been selected too. Once the four variables are selected you can drag them by clicking on any one of the variables and then dragging them into the y-axis drop zone as shown in the figure:


Specifying a simple line chart for repeated-measures data

Once you have dragged the four variables onto the y-axis drop zone a new dialog box appears. This box tells us that SPSS is creating two temporary variables. One is called Summary, which is going to be the outcome variable (i.e. what we measured: in this case, the number of hiccups per minute). The other is called Index, and this variable will represent our independent variable (i.e. what we manipulated: in this case, the type of intervention). Why does SPSS call them Index and Summary? It's just because it doesn't know what your particular variables represent, so these are just temporary names that we should change! Just click on OK to get rid of this dialog box.
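One way to picture what the two temporary variables do is as a restructuring of the data from one column per condition into one row per measurement. The sketch below illustrates that idea only; it is not how SPSS works internally. The column keys (`tongue`, `carotid`, `rectum`) and the three rows of values are hypothetical placeholders, not the contents of Hiccups.sav.

```python
from statistics import mean

# Placeholder values for three hypothetical patients (NOT the real data file).
# Each row holds hiccups per minute in the four repeated-measures columns.
wide_rows = [
    {"baseline": 15, "tongue": 9, "carotid": 7, "rectum": 2},
    {"baseline": 13, "tongue": 18, "carotid": 7, "rectum": 4},
    {"baseline": 9, "tongue": 17, "carotid": 5, "rectum": 4},
]
conditions = ["baseline", "tongue", "carotid", "rectum"]

# 'index' records which column a score came from (the independent variable);
# 'summary' holds the score itself (the outcome variable).
long_rows = [
    {"patient": patient, "index": condition, "summary": row[condition]}
    for patient, row in enumerate(wide_rows, start=1)
    for condition in conditions
]

# The line chart plots one mean of 'summary' per level of 'index'.
condition_means = {
    condition: mean(r["summary"] for r in long_rows if r["index"] == condition)
    for condition in conditions
}

for condition, value in condition_means.items():
    print(f"{condition}: mean = {value:.2f}")
```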

The Create Summary Group dialog box

Setting Element Properties for a repeated-measures graph

We need to edit some of the properties of the graph. The figure shows the options that need to be set: if you can't see this dialog box then click on Element Properties in the Chart Builder. In the left panel of the figure just note that I have selected to display error bars (see the previous two sections for more information). The middle panel is accessed by clicking on X-Axis1 (Line1) in the list labelled Edit Properties of, and this allows us to edit properties of the horizontal axis. The first thing we need to do is give the axis a title, and I have typed Intervention in the space labelled Axis Label. This label will appear on the graph. Also, we can change the order of our variables if we want to by selecting a variable in the list labelled Order and moving it up or down using the arrow buttons. If we change our mind about displaying one of our variables then we can also remove it from the list by selecting it and clicking on the delete button. Click on Apply for these changes to take effect.

The right panel is accessed by clicking on Y-Axis1 (Line1) in the list labelled Edit Properties of, and it allows us to edit properties of the vertical axis. The main change that I have made here is to give the axis a label so that the final graph has a useful description on the axis (by default it will just say Mean, which isn't very helpful); I have typed Mean Number of Hiccups Per Minute in the box labelled Axis Label. Also note that you can use this dialog box to set the scale of the vertical axis (the minimum value, maximum value and the major increment, which is how often a mark is made on the axis). Mostly you can let SPSS construct the scale automatically and it will be fairly sensible, and even if it's not you can edit it later. Click on Apply to apply the changes.


Completed Chart Builder for a repeated-measures graph

Click on OK to produce the graph. The resulting line chart displays the mean (and the confidence interval of those means) number of hiccups at baseline and after the three interventions. Note that the axis labels that we typed in have appeared on the graph. We can conclude that the amount of hiccups after tongue pulling was about the same as at baseline; however, carotid artery massage reduced hiccups, but not by as much as a good old-fashioned digital rectal massage. The moral here is: if you have hiccups, find something digital and go amuse yourself for a few minutes.


Line chart of the mean number of hiccups at baseline and after various interventions

Multiple Line Charts for Related Means


Just like bar charts, these, to the best of my knowledge, can't be done. I could be wrong though: I often am.

Multiple Line Charts for Mixed Designs


The Chart Builder might not be able to do charts for multiple repeated-measures variables, but it can graph what is known as a mixed design. This is a design in which you have one or more independent variables measured using different groups, and one or more independent variables measured using the same sample. Basically, the Chart Builder can produce a graph provided you have only one variable that was a repeated measure.


We all like to text-message (especially students in my lectures who feel the need to text-message the person next to them to say 'Bloody hell, this guy is so boring I need to poke out my own eyes'). What will happen to the children, though? Not only will they develop super-sized thumbs, but they might not learn correct written English. Imagine we conducted an experiment in which a group of 25 children was encouraged to send text messages on their mobile phones over a six-month period. A second group of 25 was forbidden from sending text messages for the same period. To ensure that kids in this latter group didn't use their phones, this group were given armbands that administered painful shocks in the presence of microwaves (like those emitted from phones). The outcome was a score on a grammatical test (as a percentage) that was measured both before and after the intervention. The first independent variable was, therefore, text message use (text messagers versus controls) and the second independent variable was the time at which grammatical ability was assessed (baseline or after six months). The data are in the file Text Messages.sav.

To graph these data we need to follow the general procedure for graphing related means. Our repeated-measures variable is time (whether grammar ability was measured at baseline or six months) and is represented in the data file by two columns, one for the baseline data and the other for the follow-up data. In the Chart Builder you need to select these two variables simultaneously by clicking on one and then holding down the Ctrl key on the keyboard and clicking on the other. When they are both highlighted, click on either one and drag it into the y-axis drop zone. The second variable (whether children text messaged or not) was measured using different children and so is represented in the data file by a grouping variable (group). This variable can be selected in the variable list and dragged into the grouping (colour) drop zone. The two groups will now be displayed as different coloured lines.


As I said, the procedure for producing line graphs is basically the same as for bar charts except that you get lines on your graphs instead of bars. Therefore, you should be able to follow the previous sections for bar charts but selecting a simple line chart instead of a simple bar chart, and selecting a multiple line chart instead of a clustered bar chart. I would like you to produce line charts of each of the bar charts in the previous section. In case you get stuck, the self-test answers that can be downloaded from the companion website will take you through it step by step.

The finished Chart Builder is below. Click on OK to produce the graph.


Selecting the repeated-measures variable in the Chart Builder


Completed dialog box for an error bar graph of a mixed design

Error bar graph of the mean grammar score over six months in children who were allowed to text message versus those who were forbidden


The resulting line chart shows that at baseline (before the intervention) the grammar scores were comparable in our two groups; however, after the intervention, the grammar scores were lower in the text messagers than in the controls. Also, if you compare the blue line with the green line you can see that text messagers' grammar scores have fallen over the six months, whereas the controls' grammar scores are fairly stable over time. We could, therefore, conclude that text messaging has a detrimental effect on children's understanding of English grammar and, therefore, civilization will crumble, and Abaddon will rise cackling from his bottomless pit to claim our wretched souls. Maybe.

Based on my minimal (and no doubt unhelpful) summary, produce a 3-D scatterplot of the data in Figure 4.37 but with the data split by gender. To make things a bit more tricky see if you can get SPSS to display different symbols for the two groups rather than two colours (see SPSS Tip 4.3). A full guided answer can be downloaded from the companion website.
To produce this graph, first double-click on the grouped 3-D scatterplot icon in the Chart Builder (see the book for how to access the Chart Builder). The graph preview on the canvas is the same as for a simple 3-D scatterplot except that our old friend the colour drop zone is back. First, we simply repeat what we have done for previous scatterplots; so, select Exam Performance (%) from the variable list and drag it into one of the axis drop zones, select Exam Anxiety and drag it into a second axis drop zone, and select Time Spent Revising in the variable list and drag it into the remaining axis drop zone. To split the data cloud by a categorical variable (in this case gender), we select this variable in the variable list and drag it into the colour drop zone. The completed dialog box is below.


Completed dialog box for a grouped 3-D scatterplot

However, I also asked you to display the different groups as different-shaped symbols. As it stands, we have asked SPSS to produce different-coloured symbols for males and females. To change this, we need to double-click on the colour drop zone to open a new dialog box that has a drop-down list in which Color will currently be selected. Click on this list to activate it and then select Pattern. Then click on OK to register this change. Back in the Chart Builder the drop zone will have been renamed Set pattern. Click on OK to plot the graph, which should look something like the one below.


Doing a simple dot plot in the Chart Builder is quite similar to drawing a histogram. Reload the DownloadFestival.sav data and see if you can produce a simple dot plot of the Download day 1 hygiene scores. Compare the resulting graph with the histogram of the same data.
First, make sure that you have loaded the DownloadFestival.sav file and that you open the Chart Builder from this data file. Once you have accessed the Chart Builder (see the book chapter) select Scatter/Dot in the chart gallery and then double-click on the icon for a simple dot plot (again, see the book chapter if you're unsure of what icon to click). The Chart Builder dialog box will now show a preview of the graph in the canvas area. At the moment it's not very exciting because we haven't told SPSS which variables we want to plot. Note that the variables in the data editor are listed on the left-hand side of the Chart Builder, and any of these variables can be dragged into any of the spaces surrounded by blue dotted lines (called drop zones). Like a histogram, a simple dot plot plots a single variable (x-axis) against the frequency of scores (y-axis), so there is just one drop zone (the one for the x-axis). All we need to do is select a variable from the list and drag it into this drop zone. To do a simple dot plot of the day 1 hygiene scores we click on this variable (Hygiene Day 1) in the variable list and drag it to the drop zone as shown below; you will now find the dot plot previewed on the canvas. To draw the dot plot click on OK.


Defining a simple dot plot (a.k.a. density plot) in the Chart Builder

The resulting density plot is shown below along with the original histogram from the book. The first thing that should leap out at you is that they are very similar (in terms of what they show): they both tell us about the distribution of scores, and they both show us the outlier that was discussed in the chapter. These graphs, therefore, are really just two ways of showing the same thing. The density plot gives us a little more detail than the histogram but essentially they show the same thing.


Density plot of the Download day 1 hygiene scores and the original histogram from the book


Doing a drop-line plot in the Chart Builder is quite similar to drawing a clustered bar chart. Reload the ChickFlick.sav data and see if you can produce a drop-line plot of the arousal scores. Compare the resulting graph with the earlier clustered bar chart of the same data.
To do a drop-line chart for means that are independent (i.e. have come from different groups) we need to double-click on the drop-line chart icon in the Chart Builder (see the book chapter if you're not sure what this icon looks like or how to access the Chart Builder). On the canvas you will see a graph with some dots and three drop zones that are the same as for a clustered bar chart: one for the y-axis, one for the x-axis and one for the grouping (colour) variable. As with the clustered bar chart example from the book, select arousal from the variable list and drag it into the y-axis drop zone, select film from the variable list and drag it into the x-axis drop zone, and select the gender variable and drag it into the grouping drop zone. This will mean that the dots representing males and females will be displayed in different colours, but if you want them displayed as different symbols then double-click in the grouping drop zone to bring up a new dialog box. Within this dialog box there is a drop-down list labelled Distinguish Groups by, and in this list you can select Color or Pattern. To change the default, select Pattern and then click on OK to make the change. Obviously you can switch back to displaying different groups in different colours in the same way. The completed Chart Builder is shown below; click on OK to produce the graph.


Using the Chart Builder to plot a drop-line graph

The resulting drop-line graph is shown below together with the clustered bar chart from the book. Hopefully it's clear that these graphs show the same information (although notice that the y-axis has been scaled differently by SPSS so that the differences between films look bigger on the drop-line graph than on the bar chart). In both graphs we can see that arousal was overall higher for Memento than Bridget Jones's Diary, that men and women differed very little in their arousal during Memento, and that men were more aroused than women during Bridget Jones's Diary. The fact that arousal in males and females differed more for Bridget Jones's Diary than Memento is possibly a little clearer in the drop-line graph than the bar chart, but it's really down to preference.


Drop-line graph of mean arousal scores during two films for men and women and the original clustered bar chart from the book


Now see if you can produce a drop-line plot of the Text Messages.sav data from earlier in this chapter. Compare the resulting graph with the earlier clustered bar chart of the same data.
To do a drop-line graph of these data we need to follow the general procedure for graphing related means. First, in the Chart Builder you need to double-click on the icon for a drop-line graph (see the book chapter for help with this if you need it). Our repeated-measures variable is time (whether grammar ability was measured at baseline or six months) and is represented in the data file by two columns, one for the baseline data and the other for the follow-up data. Select these two variables simultaneously by clicking on one and then holding down the Ctrl key on the keyboard and clicking on the other. When they are both highlighted, click on either one and drag it into the y-axis drop zone as shown below. The second variable (whether children text messaged or not) was measured using different children and so is represented in the data file by a grouping variable (group). This variable can be selected in the variable list and dragged into the cluster drop zone. The two groups will now be displayed as different-coloured dots. The finished Chart Builder is shown below. Click on OK to produce the graph.

The resulting drop-line graph is shown together with the bar chart from the book chapter. They both show that at baseline (before the intervention) the grammar scores were comparable in our two groups. On the drop-line graph this is particularly apparent because the two dots merge into one (you can't see the drop line because the means are so similar). After the intervention, the grammar scores were lower in the text messagers than in the controls. By comparing the two vertical lines it's clearer on the drop-line graph that the difference between text messagers and controls is bigger at six months than it is pre-intervention.


Selected repeated-measures variable in the Chart Builder


Completed dialog box for an error bar graph of a mixed design

Error bar graph of the mean grammar score over six months in children who were allowed to text message versus those who were forbidden


Additional Material

Oliver Twisted: Please Sir, Can I Have Some More Graphs?

As an exercise to get you using some of the graph editing facilities we're going to take one of the graphs from the chapter and change some of its properties to produce a graph that follows Tufte's guidelines (i.e. minimal ink, no chartjunk and so on). We'll use the graph for the arousal scores for the two films (Bridget Jones's Diary and Memento). The original graph looked like this:


To edit this graph double-click on it in the SPSS Viewer. This will open the chart in the SPSS Chart Editor:

Double-click anywhere on the graph to open it in the SPSS Chart Editor.


Editing the Chart background and border


First, let's get rid of the outside border and background colour of the graph; after all, it's just unnecessary ink! Select the border by double-clicking on it with the mouse. It will become highlighted in blue and a properties dialog box will appear:

If you select the Fill & Border tab (as above), you can change the background and border of the chart. At the bottom there are options to change the border style, such as how thick it is (Weight), the style (full, dotted etc.) and whether the lines end round, square or butted. We can change the background and border colour by selecting a colour from the palette. To change the background colour click on the square next to Fill and then click on any colour from the palette.


The square next to the word Fill will change colour. To change the border colour click on the square next to Border and then select a colour from the palette. Again, the square will change from black to this new colour. In this case I want us to get rid of the border and to make the background plain. Therefore, for both I want you to select no colour, which is represented by the transparent swatch in the palette. So, click on Fill, then click on the transparent swatch, then click on Border, and then click on the transparent swatch again. To apply these changes click on Apply; the border and background should disappear.

Editing the Axes


Now, let's get rid of the axis lines; they're just unnecessary ink too! Select the y-axis by double-clicking on it with the mouse. It will become highlighted in blue and a properties dialog box will appear much the same as before. This properties dialog box has many tabs that allow us to change aspects of the y-axis. We'll look at some of these in turn.


The Scale tab. This tab allows us to change the minimum, maximum and increments on the scale. Currently our graph is scaled from 0 to 40 and has a tick every 10 units (the major increment is, therefore, 10). However, there is a lot of space at the top of the graph. First switch off all of the autos and then change the Maximum from 40 to 35. In doing so we cannot have major increments of 10 (because 10 does not divide into 35 without a remainder). So, we need to change the Major Increment to a value that does divide into 35. Let's use 5. To make sure that SPSS doesn't rescale the minimum (it will do), also deselect auto for Minimum and make sure the value is set as 0. Click on Apply and the scale of the y-axis should change in the Chart Editor.

The Number Format tab allows us to change the number format used on the y-axis. The default is to have 2 decimal places, but because all of our ticks appear at values of whole numbers (0, 5, 10, 15, etc.) these decimal places are redundant. If we change the Decimal Places to 0 (see left) then we can get rid of these superfluous decimal places. Click on Apply and the decimal places on the y-axis should vanish in the Chart Editor.


The Lines tab allows us to change the properties of the axis itself. We don't really need to have a line there at all, so let's get rid of it in the same way as we did for the background border: set the line colour to transparent. Click on Apply and the y-axis line should vanish in the Chart Editor.

The Labels & Ticks tab allows us to change various aspects of the ticks on the axis. The major increment ticks are shown by default (you should leave them there), and labels for them (the numbers) are shown by default also. These numbers are important, so leave the defaults alone. You could choose to display minor ticks. Let's do this. Ask it to display the minor ticks. We have major ticks every 5, so it might be useful to have a minor tick every 1. To do this we need to set Number of minor ticks per major tick to be 4 (see left).
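The arithmetic behind the Scale and Labels & Ticks settings can be checked with a few lines of Python. This is only an illustrative sketch, not anything SPSS runs: the function name axis_ticks and its arguments are made up for the example, and it assumes whole-number axis limits.

```python
def axis_ticks(minimum, maximum, major_increment, minor_per_major):
    """Return (major, minor) tick positions for a linear axis with integer limits."""
    # The major increment has to divide the axis range without a remainder
    if (maximum - minimum) % major_increment != 0:
        raise ValueError("the major increment must divide the axis range exactly")
    major = list(range(minimum, maximum + 1, major_increment))
    # k minor ticks split each major interval into k + 1 equal steps
    step = major_increment / (minor_per_major + 1)
    minor = [m + j * step
             for m in major[:-1]
             for j in range(1, minor_per_major + 1)]
    return major, minor


major, minor = axis_ticks(0, 35, 5, 4)
print(major)       # major ticks from 0 to 35 in steps of 5
print(minor[:4])   # the four minor ticks between 0 and 5
```

With a maximum of 35, asking for a major increment of 10 raises an error in this sketch, which mirrors the reason for switching to 5 in the text. Four minor ticks per major tick give a spacing of 5/(4 + 1) = 1.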

Let's now edit the x-axis. To do this double-click on it in the Chart Editor. The axis will become highlighted in blue and the Properties dialog box will open. Some of the properties tabs are the same as for the y-axis so we'll just look at the ones that differ. Using what you have learnt already, set the line colour to be transparent so that it disappears (see the Lines tab above).


The Categories tab allows us to change the order of categories on this axis.


The Variables tab allows us to change properties of the variables. For one thing, if you don't want a bar chart then there is a drop-down list of alternatives from which you can choose. Also we have gender displayed by different colours, but we can change this so that genders are differentiated by other style differences (such as a pattern). See the drop-down list (left).

Editing the Bars


To edit the bars double-click on any of the bars to select them. They will become highlighted with a blue line. Let's first change the colour of the blue bars. To do this we first need to click once on the blue bars. Now instead of all of the bars being highlighted in blue, only the blue bars will be (see below). We can then use the Properties dialog box to change features of these bars.


The Depth & Angle tab allows us to change whether the bars have a drop shadow or a 3-D effect. As I tried to stress in the book, you shouldn't add this kind of chartjunk so you should leave your bars as flat. However, in case you want to ignore my advice, this is how you add chartjunk!

The Bar Options tab allows us to change the width of the bars (the default is to have bars within a cluster touching, but if you reduce the bar width below 100% then a gap will appear). You can also alter the gap between clusters. The default allows a small gap between clusters (which is sensible), but you can reduce the gap by increasing the value up to 100% (no gap between clusters) or widen it by decreasing the value. You can also select whether the bars are displayed as bars (the default) or if you want them to appear as a line (Whiskers) or a T-bar (T-bar). This kind of graph really looks best if you leave the bars as bars (otherwise the error bars look silly).


The Fill & Border tab allows us to change the colour of the bar and the style and colour of the bar's border. I want this bar to be black, so set the fill colour to black using the palette (see left). Click on Apply and the blue bars should turn black.

Now we will change the colour of the green bars. To do this we first need to click once on the green bars. Now only the green bars will be highlighted in blue (see below). We can then use the Properties dialog boxes described above to change the colour of these bars. I want you to colour these bars grey.


Adding Grid lines


You can add grid lines to a graph simply by clicking on the grid lines icon in the Chart Editor. If you do this you will see that, by default, SPSS adds some pretty hideous-looking lines to your graph:


First off, we don't really want grid lines on our x-axis (the vertical grid lines), so let's get rid of them. To do this select them so that they are highlighted in blue in the Chart Editor:


Then in the Properties dialog box we change the colour of these lines to be transparent (as we have done with the axis lines above):

Click on Apply and the vertical grid lines should vanish from the Chart Editor.


Now let's edit the horizontal grid lines. To do this click on any one of the horizontal grid lines in the Chart Editor so that they become highlighted in blue:

In the Properties dialog box select the Lines tab. You could change the grid lines to be dotted by selecting a dotted line from the Style drop-down list, but don't; leave them as solid:


Next, let's make the grid lines a bit thicker by selecting 1.5 from the Weight drop-down list:

Finally, let's change their colour from black to white. We've used the colour palette a few times now so you should be able to do this without any help (just click on the white square):


Click on Apply and the horizontal grid lines should become white and thicker (they stay solid because we left the Style alone).

Speaking of thick, you've probably noticed that you can no longer see them because we changed the colour to white and they are displayed on a white background. You're probably also thinking that I must be some kind of idiot for telling you to do that. You're probably right, but bear with me: there is method to the madness inside my rotting breadcrumb of a brain.

Changing the Order of Elements of a Graph


We've got white grid lines and we can't see them. That's a bit pointless, isn't it? However, we would be able to see them if they were in front of the bars. We can make this happen by again selecting the horizontal grid lines so that they are highlighted in blue; then if we click on one of them with the right mouse button a menu appears on which we can select Bring to Front. Select this option and, wow, the grid lines become visible on the bars themselves: pretty cool, I think you'll agree.


However, we still have a problem in that our error bars can be seen on top of the grey bars but not on top of the black bars. This looks a bit odd; it would be better if we could see them only poking out of the top of both bars. To do this, click on one of the error bars so that they become highlighted in blue. Then if we click on one of them with the right mouse button a menu appears on which we can select Send to Back. Select this option and the error bars move behind the bars (therefore we can only see the top half).


Saving a Chart template


You've done all of this hard work. What if you want to produce a similar-looking graph? Well, you can save these settings as a template. A template is just a file that contains a set of instructions telling SPSS how to format a graph (e.g. you want grid lines, you want the axes to be transparent, you want the bars to be coloured black and grey, and so on). To do this, in the Chart Editor go to the File menu and select Save Chart Template. You will get a dialog box, and you should select what parts of the formatting you want to save (and add a description also). Although it is tempting to just click on save all, this isn't wise because, for example, when we rescaled the y-axis we asked for a range of 0 to 35, and this is unlikely to be a sensible range for other graphs, so this is one aspect of the formatting that we would not want to save.


Confirm your selections and then type a name for your template (I've chosen Tufte Bar.sgt). By default SPSS saves the templates in a folder called Looks, but you can save it elsewhere if you like. Assuming you have saved a chart template, you can apply it when you run a new graph in the Chart Editor by opening the Options dialog box, choosing to add a template file, and then browsing your computer for your template file:


Chapter 5

Self-Test Answers

Using what you learnt before, plot histograms for the hygiene scores for the three days of the Download festival.
First, access the Chart Builder as in Chapter 4 of the book and then select Histogram in the list labelled Choose from to bring up the gallery, which has four icons representing different types of histogram. We want to do a simple histogram so double-click on the icon for a simple histogram. The Chart Builder dialog box will now show a preview of the graph in the canvas area. To plot the histogram of the day 1 hygiene scores select the Hygiene day 1 variable from the list and drag it into the x-axis drop zone. You will now find the histogram previewed on the canvas (see below). To draw the histogram click on OK.

Click on the Hygiene day 1 variable and drag it to this drop zone.

The resulting histogram is shown and explained in the book.


To plot the day 2 scores go back to the Chart Builder but this time select the Hygiene day 2 variable from the list, drag it into the x-axis drop zone and click on OK.

To plot the day 3 scores go back to the Chart Builder but this time select the Hygiene day 3 variable from the list, drag it into the x-axis drop zone and click on OK.


Calculate and interpret the z-scores for skewness of the other variables (computer literacy and percentage of lectures attended).

For computer literacy, the z-score of skewness is −0.174/0.241 = −0.72, which is non-significant, p > .05, because it lies between −1.96 and 1.96. For lectures attended, the z-score of skewness is −0.422/0.241 = −1.75, which is non-significant, p > .05, because it lies between −1.96 and 1.96.

Calculate and interpret the z-scores for kurtosis of all of the variables.
For SPSS exam scores, the z-score of kurtosis is −1.105/0.478 = −2.31, which is significant, p < .05, because it lies outside −1.96 and 1.96. For computer literacy, the z-score of kurtosis is 0.364/0.478 = 0.76, which is non-significant, p > .05, because it lies between −1.96 and 1.96. For lectures attended, the z-score of kurtosis is −0.179/0.478 = −0.37, which is non-significant, p > .05, because it lies between −1.96 and 1.96. For numeracy, the z-score of kurtosis is 0.946/0.478 = 1.98, which is significant, p < .05, because it lies outside −1.96 and 1.96.
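If you want to check these hand calculations, the following Python sketch divides each statistic by its standard error and compares the size of the result with 1.96. The function names are invented for the example. It works with the magnitudes of the kurtosis values quoted above, because only the absolute size of z matters for the comparison with 1.96.

```python
def z_score(statistic, standard_error):
    """Convert a skewness or kurtosis value to a z-score."""
    return statistic / standard_error


def is_significant(z, critical=1.96):
    """True if z lies outside the interval from -critical to +critical (p < .05)."""
    return abs(z) > critical


SE_KURTOSIS = 0.478  # standard error of kurtosis quoted in the answer above

# Magnitudes of the kurtosis values quoted in the answer above
kurtosis = {
    "SPSS exam": 1.105,
    "Computer literacy": 0.364,
    "Lectures attended": 0.179,
    "Numeracy": 0.946,
}

for variable, value in kurtosis.items():
    z = z_score(value, SE_KURTOSIS)
    verdict = "significant" if is_significant(z) else "non-significant"
    print(f"{variable}: |z| = {z:.2f} ({verdict})")
```

The same two functions can be reused for skewness by swapping in the skewness values and their standard error of 0.241.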

Repeat these analyses for the computer literacy and percentage of lectures attended and interpret the results.


The SPSS output is split into two sections: first, the results for students at Duncetown University, then the results for those attending Sussex University. From these tables it is clear that Sussex and Duncetown students scored similarly on computer literacy (both means are very similar). Sussex students attended slightly more lectures (63.27%) than their Duncetown counterparts (56.26%). The histograms are also split according to the university attended. All of the distributions look fairly normal. The only exception is the computer literacy scores for the Sussex students. This is a fairly flat distribution apart from a huge peak between 50 and 60%. It's slightly heavy tailed (right at the very ends of the curve the bars come above the line) and very pointy. This suggests positive kurtosis. If you examine the values of kurtosis you will find that there is significant (p < .05) positive kurtosis: 1.38/0.662 = 2.08, which falls outside of −1.96 and 1.96.

Duncetown University

Sussex University


Computer Literacy

Percentage of Lectures Attended


Use the explore command to see what effect a natural log transformation would have on the four variables measured in SPSSExam.sav.
The completed dialog boxes should look like this:

The SPSS output below shows Levene's test on the log-transformed scores. Compare this table to the one in the book (which was conducted on the untransformed SPSS exam scores and numeracy). To recap the book chapter, for the untransformed scores Levene's test was non-significant for the SPSS exam scores (the value in the column labelled Sig. was .111, more than .05) indicating that the variances were not significantly different (i.e. the homogeneity of variance assumption is tenable). However, for the numeracy scores, Levene's test was significant (the value in the column labelled Sig. was .008, less than .05) indicating that the variances were significantly different (i.e. the homogeneity of variance assumption was violated). For the log-transformed scores (below), the problem has been reversed: Levene's test is now significant for the SPSS exam scores (values in the column labelled Sig. are less than .05) but is no longer significant for the numeracy scores (values in the column labelled Sig. are more than .05). This reiterates my point from the book chapter that transformations are often not a magic solution to problems in the data!


Have a go at creating similar variables logday2 and logday3 for the day 2 and day 3 data. Plot histograms of the transformed scores for all three days.
The completed Compute Variable dialog boxes for day 2 and day 3 should look as below:


The histograms for days 1 and 2 are in the book, but for day 3 the histogram should look like this:

Repeat this process for day2 and day3 to create variables called sqrtday2 and sqrtday3. Plot histograms of the transformed scores for all three days.
The completed Compute Variable dialog boxes for day 2 and day 3 should look as below:


The histograms for days 1 and 2 are in the book, but for day 3 the histogram should look like this:

Repeat this process for day2 and day3. Plot histograms of the transformed scores for all three days.
The completed Compute Variable dialog boxes for day 2 and day 3 should look as below:


The histograms for days 1 and 2 are in the book, but for day 3 the histogram should look like this:
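For readers who want to see what the three Compute Variable exercises above do to the numbers, here is a small Python sketch. It assumes the transformations are log10(score + 1), the square root of the score, and 1/(score + 1); those formulas are an assumption taken to match the book chapter rather than something stated in these answers, and the scores used are hypothetical rather than taken from the Download data.

```python
import math


def log_transform(scores, constant=1):
    """log10(score + constant); the constant avoids taking the log of zero."""
    return [math.log10(x + constant) for x in scores]


def sqrt_transform(scores):
    """Square root of each score (scores must not be negative)."""
    return [math.sqrt(x) for x in scores]


def reciprocal_transform(scores, constant=1):
    """1 / (score + constant); the constant avoids dividing by zero."""
    return [1 / (x + constant) for x in scores]


# Hypothetical hygiene scores on the 0 to 4 scale, including one of zero
day2 = [0.0, 0.23, 0.85, 1.5, 3.44]

for name, transform in [("logday2", log_transform),
                        ("sqrtday2", sqrt_transform),
                        ("recday2", reciprocal_transform)]:
    print(name, [round(value, 3) for value in transform(day2)])
```

Note that the reciprocal transformation reverses the order of the scores (the largest raw score becomes the smallest transformed score), which is worth remembering when you read the histograms.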

Additional Material
Oliver Twisted: Please Sir, Can I Have Some More Frequencies?

In your SPSS output you will also see tabulated frequency distributions of each variable. These tables are reproduced in the additional online material along with a description.


In your SPSS output you will also see tabulated frequency distributions of each variable (below). These tables list each score and the number of times that it is found within the data set. In addition, each frequency value is expressed as a percentage of the sample. Also, the cumulative percentage is given, which tells us how many cases (as a percentage) fell at or below a certain score. So, for example, we can see that only 15.4% of hygiene scores were 1 or below on the first day of the festival. Compare this to the table for day 2: 63.3% of scores were 1 or less!
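To make the columns of these tables concrete, the sketch below builds the same kind of frequency table in Python. It is an illustration only: the function name and the scores are hypothetical, and it assumes there are no missing values, in which case the Percent and Valid Percent columns are identical.

```python
from collections import Counter


def frequency_table(scores):
    """Return rows of (score, frequency, percent, cumulative percent)."""
    n = len(scores)
    counts = Counter(scores)
    rows = []
    running_total = 0
    for value in sorted(counts):
        running_total += counts[value]
        rows.append((value,
                     counts[value],
                     100 * counts[value] / n,   # percent of the sample
                     100 * running_total / n))  # percent at or below this score
    return rows


# Hypothetical hygiene scores
scores = [0.5, 1.0, 1.0, 1.5, 2.0, 2.0, 2.0, 3.0]

for value, freq, percent, cumulative in frequency_table(scores):
    print(f"{value:5.2f} {freq:3d} {percent:6.1f} {cumulative:6.1f}")
```

The cumulative percentage in each row is the percentage of cases with that score or a lower one, which is how the 15.4% and 63.3% figures above are read from the tables.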

Hygiene (Day 1 of Download Festival)

Score   Frequency   Percent   Valid Percent   Cumulative Percent
0.02    1           .1        .1              .1
0.05    1           .1        .1              .2
0.11    2           .2        .2              .5
0.23    2           .2        .2              .7
0.26    1           .1        .1              .9
0.29    1           .1        .1              1.0
0.3     1           .1        .1              1.1
0.32    4           .5        .5              1.6
0.35    1           .1        .1              1.7


0.38 0.43 0.44 0.45 0.47 0.5 0.51 0.52 0.55 0.58 0.59 0.6 0.61 0.62 0.64 0.67 0.7 0.73

3 1 1 2 3 3 1 5 4 3 1 1 5 1 3 6 3 6

.4 .1 .1 .2 .4 .4 .1 .6 .5 .4 .1 .1 .6 .1 .4 .7 .4 .7

.4 .1 .1 .2 .4 .4 .1 .6 .5 .4 .1 .1 .6 .1 .4 .7 .4 .7

2.1 2.2 2.3 2.6 3.0 3.3 3.5 4.1 4.6 4.9 5.1 5.2 5.8 5.9 6.3 7.0 7.4 8.1


0.76 0.78 0.79 0.81 0.82 0.83 0.84 0.85 0.88 0.9 0.91 0.93 0.94 0.96 0.97 1 1.02 1.03

3 1 1 1 6 1 2 5 6 2 2 1 6 2 7 13 6 1

.4 .1 .1 .1 .7 .1 .2 .6 .7 .2 .2 .1 .7 .2 .9 1.6 .7 .1

.4 .1 .1 .1 .7 .1 .2 .6 .7 .2 .2 .1 .7 .2 .9 1.6 .7 .1

8.5 8.6 8.8 8.9 9.6 9.8 10.0 10.6 11.4 11.6 11.9 12.0 12.7 13.0 13.8 15.4 16.2 16.3


1.05 1.06 1.08 1.11 1.14 1.15 1.17 1.2 1.21 1.23 1.24 1.26 1.28 1.29 1.31 1.32 1.33 1.34

5 5 7 5 12 1 5 5 1 9 1 6 2 6 1 8 1 3

.6 .6 .9 .6 1.5 .1 .6 .6 .1 1.1 .1 .7 .2 .7 .1 1.0 .1 .4

.6 .6 .9 .6 1.5 .1 .6 .6 .1 1.1 .1 .7 .2 .7 .1 1.0 .1 .4

16.9 17.5 18.4 19.0 20.5 20.6 21.2 21.9 22.0 23.1 23.2 24.0 24.2 24.9 25.1 26.0 26.2 26.5


1.35 1.38 1.41 1.42 1.44 1.45 1.47 1.48 1.5 1.51 1.52 1.53 1.54 1.55 1.56 1.57 1.58 1.59

9 7 11 4 11 1 17 1 14 2 11 1 1 7 2 2 15 1

1.1 .9 1.4 .5 1.4 .1 2.1 .1 1.7 .2 1.4 .1 .1 .9 .2 .2 1.9 .1

1.1 .9 1.4 .5 1.4 .1 2.1 .1 1.7 .2 1.4 .1 .1 .9 .2 .2 1.9 .1

27.7 28.5 29.9 30.4 31.7 31.9 34.0 34.1 35.8 36.0 37.4 37.5 37.7 38.5 38.8 39.0 40.9 41.0


1.6 1.61 1.64 1.66 1.67 1.68 1.69 1.7 1.71 1.73 1.75 1.76 1.77 1.78 1.79 1.81 1.82 1.84

3 7 13 4 11 1 1 7 1 11 1 5 1 2 10 2 12 2

.4 .9 1.6 .5 1.4 .1 .1 .9 .1 1.4 .1 .6 .1 .2 1.2 .2 1.5 .2

.4 .9 1.6 .5 1.4 .1 .1 .9 .1 1.4 .1 .6 .1 .2 1.2 .2 1.5 .2

41.4 42.2 43.8 44.3 45.7 45.8 45.9 46.8 46.9 48.3 48.4 49.0 49.1 49.4 50.6 50.9 52.3 52.6


1.85 1.87 1.88 1.9 1.91 1.93 1.94 1.96 1.97 2 2.02 2.03 2.05 2.06 2.08 2.09 2.11 2.12

14 2 5 3 11 4 14 1 6 19 16 1 14 1 10 2 4 2

1.7 .2 .6 .4 1.4 .5 1.7 .1 .7 2.3 2.0 .1 1.7 .1 1.2 .2 .5 .2

1.7 .2 .6 .4 1.4 .5 1.7 .1 .7 2.3 2.0 .1 1.7 .1 1.2 .2 .5 .2

54.3 54.6 55.2 55.6 56.9 57.4 59.1 59.3 60.0 62.3 64.3 64.4 66.2 66.3 67.5 67.8 68.3 68.5


2.14 2.16 2.17 2.18 2.2 2.21 2.22 2.23 2.24 2.26 2.27 2.28 2.29 2.3 2.31 2.32 2.33 2.34

8 1 15 3 10 2 1 14 1 6 1 1 12 3 1 10 1 1

1.0 .1 1.9 .4 1.2 .2 .1 1.7 .1 .7 .1 .1 1.5 .4 .1 1.2 .1 .1

1.0 .1 1.9 .4 1.2 .2 .1 1.7 .1 .7 .1 .1 1.5 .4 .1 1.2 .1 .1

69.5 69.6 71.5 71.9 73.1 73.3 73.5 75.2 75.3 76.0 76.2 76.3 77.8 78.1 78.3 79.5 79.6 79.8


2.35 2.36 2.38 2.39 2.41 2.42 2.44 2.45 2.46 2.47 2.48 2.5 2.51 2.52 2.53 2.55 2.56 2.57

5 2 2 2 3 1 8 2 2 4 1 10 2 9 1 4 2 2

.6 .2 .2 .2 .4 .1 1.0 .2 .2 .5 .1 1.2 .2 1.1 .1 .5 .2 .2

.6 .2 .2 .2 .4 .1 1.0 .2 .2 .5 .1 1.2 .2 1.1 .1 .5 .2 .2

80.4 80.6 80.9 81.1 81.5 81.6 82.6 82.8 83.1 83.6 83.7 84.9 85.2 86.3 86.4 86.9 87.2 87.4


2.58 2.61 2.62 2.63 2.64 2.66 2.67 2.7 2.71 2.73 2.75 2.76 2.78 2.79 2.81 2.82 2.84 2.85

6 3 1 3 5 1 5 4 1 5 1 4 1 2 4 2 2 1

.7 .4 .1 .4 .6 .1 .6 .5 .1 .6 .1 .5 .1 .2 .5 .2 .2 .1

.7 .4 .1 .4 .6 .1 .6 .5 .1 .6 .1 .5 .1 .2 .5 .2 .2 .1

88.1 88.5 88.6 89.0 89.6 89.8 90.4 90.9 91.0 91.6 91.7 92.2 92.3 92.6 93.1 93.3 93.6 93.7


2.87 2.88 2.9 2.91 2.92 2.94 2.97 3 3.02 3.03 3.08 3.09 3.11 3.12 3.14 3.15 3.17 3.2

1 8 1 3 1 3 3 2 2 1 1 1 1 1 1 2 1 2

.1 1.0 .1 .4 .1 .4 .4 .2 .2 .1 .1 .1 .1 .1 .1 .2 .1 .2

.1 1.0 .1 .4 .1 .4 .4 .2 .2 .1 .1 .1 .1 .1 .1 .2 .1 .2

93.8 94.8 94.9 95.3 95.4 95.8 96.2 96.4 96.7 96.8 96.9 97.0 97.2 97.3 97.4 97.7 97.8 98.0


3.21 3.23 3.26 3.29 3.32 3.38 3.41 3.44 3.58 3.69 Total

3 1 1 2 3 2 1 1 1 1 810

.4 .1 .1 .2 .4 .2 .1 .1 .1 .1 100.0

.4 .1 .1 .2 .4 .2 .1 .1 .1 .1 100.0

98.4 98.5 98.6 98.9 99.3 99.5 99.6 99.8 99.9 100.0

Hygiene (Day 2 of Download Festival)

Score   Frequency   Percent   Valid Percent   Cumulative Percent
0       1           .1        .4              .4
0.02    2           .2        .8              1.1


0.05 0.06 0.08 0.11 0.14 0.17 0.2 0.23 0.26 0.28 0.29 0.32 0.35 0.38 0.41 0.44 0.45 0.47

1 1 2 3 8 6 8 10 5 1 2 5 4 6 4 5 1 4

.1 .1 .2 .4 1.0 .7 1.0 1.2 .6 .1 .2 .6 .5 .7 .5 .6 .1 .5

.4 .4 .8 1.1 3.0 2.3 3.0 3.8 1.9 .4 .8 1.9 1.5 2.3 1.5 1.9 .4 1.5

1.5 1.9 2.7 3.8 6.8 9.1 12.1 15.9 17.8 18.2 18.9 20.8 22.3 24.6 26.1 28.0 28.4 29.9


0.48 0.5 0.52 0.55 0.56 0.58 0.64 0.67 0.7 0.73 0.76 0.78 0.79 0.82 0.84 0.85 0.88 0.9

1 2 5 5 1 7 4 3 7 2 9 1 7 4 2 8 1 1

.1 .2 .6 .6 .1 .9 .5 .4 .9 .2 1.1 .1 .9 .5 .2 1.0 .1 .1

.4 .8 1.9 1.9 .4 2.7 1.5 1.1 2.7 .8 3.4 .4 2.7 1.5 .8 3.0 .4 .4

30.3 31.1 33.0 34.8 35.2 37.9 39.4 40.5 43.2 43.9 47.3 47.7 50.4 51.9 52.7 55.7 56.1 56.4


0.91 0.94 0.97 1 1.02 1.05 1.06 1.08 1.11 1.13 1.14 1.17 1.18 1.2 1.21 1.23 1.29 1.32

6 6 1 5 4 1 1 2 5 1 5 3 1 2 1 1 1 2

.7 .7 .1 .6 .5 .1 .1 .2 .6 .1 .6 .4 .1 .2 .1 .1 .1 .2

2.3 2.3 .4 1.9 1.5 .4 .4 .8 1.9 .4 1.9 1.1 .4 .8 .4 .4 .4 .8

58.7 61.0 61.4 63.3 64.8 65.2 65.5 66.3 68.2 68.6 70.5 71.6 72.0 72.7 73.1 73.5 73.9 74.6


1.35 1.38 1.41 1.44 1.45 1.5 1.52 1.54 1.55 1.58 1.64 1.7 1.73 1.75 1.76 1.78 1.79 1.82

4 3 2 3 1 1 2 1 1 3 2 4 1 1 1 1 1 1

.5 .4 .2 .4 .1 .1 .2 .1 .1 .4 .2 .5 .1 .1 .1 .1 .1 .1

1.5 1.1 .8 1.1 .4 .4 .8 .4 .4 1.1 .8 1.5 .4 .4 .4 .4 .4 .4

76.1 77.3 78.0 79.2 79.5 79.9 80.7 81.1 81.4 82.6 83.3 84.8 85.2 85.6 86.0 86.4 86.7 87.1


1.87 1.88 1.9 1.94 1.97 2.05 2.08 2.12 2.2 2.23 2.29 2.32 2.38 2.41 2.42 2.44 2.5 2.53

1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 1

.1 .1 .1 .2 .2 .2 .2 .1 .1 .1 .1 .1 .1 .1 .1 .2 .2 .1

.4 .4 .4 .8 .8 .8 .8 .4 .4 .4 .4 .4 .4 .4 .4 .8 .8 .4

87.5 87.9 88.3 89.0 89.8 90.5 91.3 91.7 92.0 92.4 92.8 93.2 93.6 93.9 94.3 95.1 95.8 96.2


2.55 2.61 2.7 2.72 2.85 2.91 3 3.21 3.35 3.44 Total Missing System Total

1 1 1 1 1 1 1 1 1 1 264 546 810

.1 .1 .1 .1 .1 .1 .1 .1 .1 .1 32.6 67.4 100.0

.4 .4 .4 .4 .4 .4 .4 .4 .4 .4 100.0

96.6 97.0 97.3 97.7 98.1 98.5 98.9 99.2 99.6 100.0

Hygiene (Day 3 of Download Festival)

Score   Frequency   Percent   Valid Percent   Cumulative Percent
0.02    2           .2        1.6             1.6


0.08 0.11 0.14 0.17 0.2 0.26 0.29 0.32 0.33 0.35 0.38 0.39 0.41 0.44 0.45 0.47 0.5 0.52

1 1 2 3 3 1 3 1 2 2 5 1 1 6 1 5 1 4

.1 .1 .2 .4 .4 .1 .4 .1 .2 .2 .6 .1 .1 .7 .1 .6 .1 .5

.8 .8 1.6 2.4 2.4 .8 2.4 .8 1.6 1.6 4.1 .8 .8 4.9 .8 4.1 .8 3.3

2.4 3.3 4.9 7.3 9.8 10.6 13.0 13.8 15.4 17.1 21.1 22.0 22.8 27.6 28.5 32.5 33.3 36.6


0.53 0.55 0.58 0.61 0.67 0.7 0.72 0.73 0.76 0.81 0.82 0.85 0.88 0.91 0.94 0.96 1.02 1.17

1 3 2 1 1 3 1 1 6 1 1 1 1 5 2 1 4 1

.1 .4 .2 .1 .1 .4 .1 .1 .7 .1 .1 .1 .1 .6 .2 .1 .5 .1

.8 2.4 1.6 .8 .8 2.4 .8 .8 4.9 .8 .8 .8 .8 4.1 1.6 .8 3.3 .8

37.4 39.8 41.5 42.3 43.1 45.5 46.3 47.2 52.0 52.8 53.7 54.5 55.3 59.3 61.0 61.8 65.0 65.9


1.18 1.19 1.2 1.26 1.29 1.32 1.38 1.44 1.5 1.55 1.58 1.61 1.66 1.67 1.7 1.73 1.76 1.85

1 1 2 1 1 1 1 1 2 1 2 1 1 3 3 2 2 1

.1 .1 .2 .1 .1 .1 .1 .1 .2 .1 .2 .1 .1 .4 .4 .2 .2 .1

.8 .8 1.6 .8 .8 .8 .8 .8 1.6 .8 1.6 .8 .8 2.4 2.4 1.6 1.6 .8

66.7 67.5 69.1 69.9 70.7 71.5 72.4 73.2 74.8 75.6 77.2 78.0 78.9 81.3 83.7 85.4 87.0 87.8


1.88 1.91 2 2.11 2.15 2.29 2.55 2.7 3.02 3.41 Total Missing System Total

2 3 1 2 1 1 1 1 2 1 123 687 810

.2 .4 .1 .2 .1 .1 .1 .1 .2 .1 15.2 84.8 100.0

1.6 2.4 .8 1.6 .8 .8 .8 .8 1.6 .8 100.0

89.4 91.9 92.7 94.3 95.1 95.9 96.7 97.6 99.2 100.0

Oliver Twisted: Please Sir, can I have some more normality tests?


The observant among you will see that there is another test reported in the table (the Shapiro-Wilk test). The even more eagle-eyed will also notice a footnote to the K-S test saying that Lilliefors significance correction has been applied. (You might find this especially confusing if you've ever done the K-S test through the nonparametric test menu rather than the explore menu because this correction is not applied.) What the hell is going on? In the additional material for this chapter on the companion website you can find out more about the K-S test, some information about the Lilliefors correction and the Shapiro-Wilk test. What are you waiting for?

If you want to test whether a model is a good fit of your data you can use a goodness-of-fit test (you can read about these in the chapter on categorical data analysis in the book), which has a chi-square test statistic (with the associated distribution). One problem with this test is that it needs a certain sample size to be accurate. The K-S test was developed as a test of whether a distribution of scores matches a hypothesized distribution (Massey, 1951). One good thing about the test is that the distribution of the K-S test statistic does not depend on the hypothesized distribution (in other words, the hypothesized distribution doesn't have to be a particular distribution). It is also what is known as an exact test, which means that it can be used on small samples. It also appears to have more power to detect deviations from the hypothesized distribution than the chi-square test (Lilliefors, 1967). However, one major limitation of the K-S test is that if location (i.e. the mean) and shape parameters (i.e. the standard deviation) are estimated from the data then the K-S test is very conservative, which means it fails to detect deviations from the distribution of interest (i.e. normal). What Lilliefors did was to adjust the critical values for significance for the K-S test to make it less conservative (Lilliefors, 1967) using Monte Carlo simulations (these new values were about two-thirds the size of the standard values). He also reported that this test was more powerful than a standard chi-square test (and obviously the standard K-S test).

Another test you'll use to test normality is the Shapiro-Wilk test (Shapiro & Wilk, 1965), which was developed specifically to test whether a distribution is normal (whereas the K-S test can be used to test against distributions other than normal). They concluded that their test was "comparatively quite sensitive to a wide range of non-normality, even with samples as small as n = 20. It seems to be especially sensitive to asymmetry, long-tailedness and to some degree to short-tailedness" (p. 608). To test the power of these tests they applied them to several samples (n = 20) from various non-normal distributions. In each case they took 500 samples, which allowed them to see how many times (in 500) the test correctly identified a deviation from normality (this is the power of the test). They show in these simulations (see table 7 in their paper) that the S-W test is considerably more powerful to detect deviations from normality than the K-S test. They verified this general conclusion in a much more extensive set of simulations as well (Shapiro, Wilk, & Chen, 1968).
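To give a feel for what the K-S test actually measures, the sketch below computes the K-S statistic D (the largest vertical gap between the cumulative distribution of the sample and that of a normal distribution) using only the Python standard library. The mean and standard deviation are estimated from the sample, which is exactly the situation the Lilliefors correction was designed for. The function name and the scores are hypothetical, and the sketch deliberately stops at D: turning it into a p-value requires the Lilliefors critical values, which are not reproduced here.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(scores):
    """Largest gap between the sample's cumulative distribution and that of a
    normal distribution whose mean and SD are estimated from the same sample."""
    xs = sorted(scores)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        expected = fitted.cdf(x)
        # The empirical distribution jumps from i/n to (i + 1)/n at x,
        # so check the gap on both sides of the jump
        d = max(d, (i + 1) / n - expected, expected - i / n)
    return d


# Hypothetical scores
sample = [1, 2, 3, 4, 5]
print(round(ks_statistic_normal(sample), 4))
```

Larger values of D indicate a bigger departure from the fitted normal curve; whether a given D is significant depends on the sample size and on which set of critical values (standard or Lilliefors) is used.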

Oliver Twisted: Please Sir, Can I Have Some More Hartley's FMax?

Oliver thinks that my graph of critical values is stupid. "Look at that graph," he laughed. "It's the most stupid thing I've ever seen since I was at Sussex Uni and I saw my statistics lecturer, Andy Fie". Well, go choke on your gruel you Dickensian bubo because the full table of critical values is in the additional material for this chapter on the companion website.

Critical values for Hartley's test (α = .05).
(n - 1)                            Number of variances compared
per group       2       3       4       5       6       7       8       9      10      11      12
2           39.00   87.50  142.00  202.00  266.00  333.00  403.00  475.00  550.00  626.00  704.00
3           15.40   27.80   39.20   50.70   62.00   72.90   83.50   93.90  104.00  114.00  124.00
4            9.60   15.50   20.60   25.20   29.50   33.60   37.50   41.40   44.60   48.00   51.40
5            7.15   10.80   13.70   16.30   18.70   20.80   22.90   24.70   26.50   28.20   29.90
6            5.82    8.38   10.40   12.10   13.70   15.00   16.30   17.50   18.60   19.70   20.70
7            4.99    6.94    8.44    9.70   10.80   11.80   12.70   13.50   14.30   15.10   15.80
8            4.43    6.00    7.18    8.12    9.03    9.80   10.50   11.10   11.70   12.20   12.70
9            4.03    5.34    6.31    7.11    7.80    8.41    8.95    9.45    9.91   10.30   10.70
10           3.72    4.85    5.67    6.34    6.92    7.42    7.87    8.28    8.66    9.01    9.34
12           3.28    4.16    4.79    5.30    5.72    6.09    6.42    6.72    7.00    7.25    7.48
15           2.86    3.54    4.01    4.37    4.68    4.95    5.19    5.40    5.59    5.77    5.93
20           2.46    2.95    3.29    3.54    3.76    3.94    4.10    4.24    4.37    4.49    4.59
30           2.07    2.40    2.61    2.78    2.91    3.02    3.12    3.21    3.29    3.36    3.39
60           1.67    1.85    1.96    2.04    2.11    2.17    2.22    2.26    2.30    2.33    2.36
∞            1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
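As a rough illustration of how these critical values are used: Hartley's FMax is the ratio of the largest group variance to the smallest, and the variances are treated as significantly different when that ratio exceeds the critical value. The Python sketch below is only that, a sketch. The scores and the function name hartley_fmax are made up for illustration, and the critical value of 5.34 is the one listed for n - 1 = 9 and 3 variances compared.

```python
from statistics import variance


def hartley_fmax(groups):
    """Return the ratio of the largest to the smallest sample variance."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical scores: three groups of 10 people (so n - 1 = 9 per group).
groups = [
    [4, 5, 6, 5, 4, 6, 5, 5, 4, 6],
    [2, 8, 5, 9, 1, 7, 3, 6, 4, 5],
    [5, 6, 4, 7, 3, 5, 6, 4, 5, 5],
]

# Critical value listed for n - 1 = 9 and 3 variances compared.
CRITICAL_VALUE = 5.34

f_max = hartley_fmax(groups)
heterogeneous = f_max > CRITICAL_VALUE
print(f"FMax = {f_max:.2f}; exceeds critical value: {heterogeneous}")
```

The lookup is done by hand here; with unequal group sizes the table does not strictly apply.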

Chapter 7

Self-Test Answers

How is the t in SPSS Output 7.3 calculated? Use the values in the table to see if you can get the same value as SPSS.
It is calculated using this equation:

$$t = \frac{b_{\text{observed}} - b_{\text{expected}}}{SE_b} = \frac{b_{\text{observed}}}{SE_b}$$



Using the values from SPSS Output 7.3 to calculate t for the constant, we get t = 134.140/7.537 = 17.80; for the advertising budget, we get 0.096/0.01 = 9.6. This value is different to the one in the output (t = 9.979) because SPSS rounds values in the output to 3 decimal places, but calculates t using unrounded values (usually this doesn't make too much difference but in this case it does!). In this case the rounding has had quite an effect on the standard error (its value is 0.009632 but it has been rounded to 0.01). To obtain the unrounded values, double-click the table in the SPSS output and then double-click the value that you wish to see in full. You should find that t = 0.096124/0.009632 = 9.979.
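If you want to check the effect of rounding for yourself, here is a small Python sketch (not SPSS, and the function name t_statistic is just an illustrative choice). It applies t = b/SE to the rounded and the unrounded values quoted above; b_expected is assumed to be 0, as in the equation.

```python
def t_statistic(b_observed, se_b, b_expected=0.0):
    """t = (b_observed - b_expected) / SE_b; b_expected is 0 under the null."""
    return (b_observed - b_expected) / se_b


# Advertising budget: values as displayed in the output (rounded).
t_rounded = t_statistic(0.096, 0.01)

# Advertising budget: unrounded values quoted in the text.
t_unrounded = t_statistic(0.096124, 0.009632)

print(f"rounded inputs:   t = {t_rounded:.2f}")
print(f"unrounded inputs: t = {t_unrounded:.2f}")
```

The unrounded inputs give a value close to the 9.979 in the output; any tiny remaining difference comes from the inputs themselves still being rounded to six decimal places.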

How many records would be sold if we spent £666,000 on advertising the latest CD by black metal band Abgott?

The band would sell 198,080 CDs:

$$\text{Record Sales} = 134.14 + (0.096 \times \text{Advertising Budget}) = 134.14 + (0.096 \times 666) = 198.08$$
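The same prediction can be sketched in a few lines of Python. This is only an illustration: the function name predict_sales is made up, and it assumes (as the answer above implies) that both the budget and the sales are measured in thousands.

```python
B0 = 134.14  # constant: predicted sales (in thousands) with no advertising
B1 = 0.096   # change in sales (thousands) per unit of advertising budget (thousands)


def predict_sales(advertising_budget):
    """Predicted record sales (thousands) for a given budget (thousands)."""
    return B0 + B1 * advertising_budget


sales = predict_sales(666)
print(f"Predicted sales: {sales:.2f} thousand CDs")
```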

Additional Material

Labcoat Leni's Real Research: Why do you like your lecturers?

Chamorro-Premuzic, T., et al. (2008). Personality and Individual Differences, 44, 965–976.

In the previous chapter we encountered a study by Chamorro-Premuzic et al. in which they measured students' personality characteristics and asked them to rate how much they wanted these same characteristics in their lecturers (see that chapter for a full description). In the last chapter we correlated these scores; however, we could go a step further and see whether students' personality characteristics predict the characteristics that they would like to see in their lecturers. The data from this study are in the file Chamorro-Premuzic.sav. Labcoat Leni wants you to carry out five multiple regression analyses: the outcome variables in each of the five analyses are the ratings of how much students want to see neuroticism, extroversion, openness to experience, agreeableness and conscientiousness. For each of these outcomes, force Age and Gender into the analysis in the first step of the hierarchy, then in the second block force in the five student personality traits (Neuroticism, Extroversion, Openness to experience, Agreeableness and Conscientiousness). For each analysis create a table of the results. Answers are in the additional material for this website (or look at Table 4 in the original article).

Lecturer Neuroticism
The first regression we'll do is whether students want lecturers to be neurotic. Define the two blocks as follows. In the first block put Age and Gender:


In the second, put all of the student personality variables (five variables in all):

Set the options as in the book chapter. The main output (I haven't reproduced it all, but you can find it in the file Charmorro-Premuzic.spv) is as follows:


You could report these results as:

                           B     SE B       β
Step 1
  Constant             28.22     2.59
  Age                   0.28     0.13    .11*
  Gender                2.42     1.02    .12*
Step 2
  Constant             16.77     5.30
  Age                   0.30     0.13    .12*
  Gender                1.90     1.09    .10
  Neuroticism           0.06     0.06    .06
  Extroversion          0.12     0.08    .08
  Openness             -0.17     0.07   -.12*
  Agreeableness         0.09     0.07    .07
  Conscientiousness    -0.20     0.08   -.16*

Note: R² = .03 for step 1; R² = .04 for step 2 (p < .05). * p < .05.

So basically, age, openness and conscientiousness were significant predictors of wanting a neurotic lecturer (note that for openness and conscientiousness the relationship is negative, i.e. the more a student scored on these characteristics, the less they wanted a neurotic lecturer).

Lecturer Extroversion
The second variable we want to predict is lecturer extroversion. I won't run through the analysis and output but you can find it in the file Charmorro-Premuzic.spv. You could report these results as:

                           B     SE B       β
Step 1
  Constant             12.13     2.43
  Age                    .03      .12    .01
  Gender                 .93      .94    .06
Step 2
  Constant              3.62     4.93
  Age                    .02      .12    .01
  Gender                1.31     1.00    .08
  Neuroticism            .00      .06    .01
  Extroversion           .15      .07    .14*
  Openness               .04      .07    .03
  Agreeableness          .00      .07    .00
  Conscientiousness      .10      .08    .10

Note: R² = .00 for step 1; R² = .03 for step 2 (p > .05). * p < .05.

So basically, student extroversion was the only significant predictor of wanting an extrovert lecturer; the model overall did not explain a significant amount of the variance in wanting an extroverted lecturer.

Lecturer Openness to Experience


The third variable we want to predict is lecturer openness to experience. As before, the SPSS output can be found in the file Charmorro-Premuzic.spv. You could report these results as:

                           B     SE B       β
Step 1
  Constant              9.41     2.37
  Age                    .04      .12    .02
  Gender                 .23      .92    .01
Step 2
  Constant              5.16     4.75
  Age                    .05      .12    .02
  Gender                 .09      .96    .01
  Neuroticism            .01      .05    .01
  Extroversion           .07      .07    .05
  Openness               .26      .07    .20***
  Agreeableness          .14      .06    .12*
  Conscientiousness      .03      .07    .03

Note: R² = .00 for step 1 (ns); R² = .06 for step 2 (p < .001). * p < .05, *** p < .001.

So basically, student openness to experience was the most significant predictor of wanting a lecturer who is open to experiences, but student agreeableness predicted this also.

Lecturer Agreeableness
The fourth variable we want to predict is lecturer agreeableness. As before, the SPSS output can be found in the file Charmorro-Premuzic.spv. You could report these results as:

                           B     SE B       β
Step 1
  Constant             18.30     2.77
  Age                   -.47      .14   -.17
  Gender                 .83     1.07    .04
Step 2
  Constant              8.76     5.51
  Age                   -.47      .14   -.17**
  Gender                 .78     1.11    .04
  Neuroticism            .14      .06    .13*
  Extroversion           .05      .08    .03
  Openness              -.22      .08   -.14**
  Agreeableness          .14      .07    .11
  Conscientiousness      .14      .09    .10

Note: R² = .03 for step 1 (p < .01); R² = .06 for step 2 (p < .001). * p < .05, ** p < .01.

Age, student openness to experience and student neuroticism significantly predicted wanting a lecturer who is agreeable. Age and openness to experience had negative relationships (the older and more open to experience you are, the less you want an agreeable lecturer), whereas as student neuroticism increases so does the desire for an agreeable lecturer (not surprisingly, because neurotics will lack confidence and probably feel more able to ask an agreeable lecturer questions).

Lecturer Conscientiousness
The final variable we want to predict is lecturer conscientiousness. As before, the SPSS output can be found in the file Charmorro-Premuzic.spv.


You could report these results as:

                           B     SE B       β
Step 1
  Constant             13.84     2.24
  Age                    .16      .11    .07
  Gender                2.33      .87    .14**
Step 2
  Constant              5.85     4.50
  Age                    .14      .11    .06
  Gender                1.65      .91    .10
  Neuroticism            .01      .05    .01
  Extroversion           .06      .07    .05
  Openness               .01      .06    .01
  Agreeableness          .12      .06    .12*
  Conscientiousness      .16      .07    .14*

Note: R² = .02 for step 1 (p < .05); R² = .05 for step 2 (p < .01). * p < .05, ** p < .01.

Student agreeableness and conscientiousness both predicted wanting a lecturer who is conscientious. Note also that gender predicted this in the first step, but its b became slightly non-significant (p = .07) when the student personality variables were forced in as well. However, gender is probably a variable that should be explored further within this context.


Compare your results to Table 4 in the actual article. I've highlighted the area of the table relating to our analyses (our five analyses are represented by the columns labelled N, E, O, A and C).


Oliver Twisted: Please Sir, Can I Have Some More Recode?

'Our data set has missing values,' worries Oliver. 'What do we do if we only want to recode cases for which we have data?' Well, we can set some other options at this point, that's what we can do. This is getting a little bit more involved so if you want to know more then the additional material for this chapter on the companion website will tell you. Stop worrying Oliver, everything will be OK.

One of the problems with the Glastonbury data is that we didn't have hygiene scores for all of the people at day 3. Therefore, when we calculated the change scores (day 3 minus day 1) we likewise only have data for a subset of our sample. When we come to recode the music variable, we should probably not recode the cases for which we don't have data for the change variable. This is fairly simple to do by setting an IF command. That is, we want to tell SPSS 'IF there is a value for the variable change then recode the variable music'. To do this, access the dialog box below:


By default SPSS will include all of the cases in the data, but we can use this dialog box to set conditions. So, we can tell SPSS 'recode these cases only if a certain condition is met'. The condition that we want to set is that we want to recode only cases for which there is a value for the variable change (i.e. we want to exclude cases for which there are missing values in the variable change). To specify this, first activate the white box below. Rather like the compute command (see Chapter 5) we can type commands in this box, and select built-in commands from the boxes labelled Function group and Functions and Special Variables. You can see in the diagram that I have selected a command in the category Missing Values called Missing. To be specific, the condition that I have set is (1 - MISSING(change)). MISSING is a built-in command that returns true (i.e. the value 1) for a case that has a system-missing or user-defined missing value for the specified variable; it returns false (i.e. the value 0) if a case has a value. Hence, MISSING(change) returns a value of 1 for cases that have a missing value for the variable change and 0 for cases that do have values. We want to recode the cases that do have a value for the variable change, therefore I have specified 1 - MISSING(change). This command reverses MISSING(change) so that it returns 1 (true) for cases that have a value for the variable change and 0 (false) for system- or user-defined missing values. To sum up, the DO IF (1 - MISSING(change)) tells SPSS 'Do the following RECODE commands if the case has a value for the variable change'.
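The logic of 1 - MISSING(change) can be mimicked outside SPSS. The Python sketch below is an illustration only: None stands in for a missing value, the cases are invented, and the recode rule (flagging code 2) is just an example of a recode that is applied only when change has a value.

```python
def has_value(change):
    """Mimic 1 - MISSING(change): 1 if a value is present, 0 if it is missing."""
    missing = 1 if change is None else 0
    return 1 - missing


# Hypothetical cases: (music code, change score); None marks a missing change score.
cases = [(2, -0.5), (1, None), (4, 1.2), (2, None)]

recoded = []
for music, change in cases:
    if has_value(change):
        # Plays the role of the RECODE inside DO IF (1 - MISSING(change)).
        recoded.append(1 if music == 2 else 0)
    else:
        # Cases without a change score are left unrecoded.
        recoded.append(None)

print(recoded)
```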

Try creating the remaining two dummy variables (call them Metaller and Indie_Kid) using the same principles.

Access the recode dialog box. Select the variable you want to recode (in this case music) and transfer it to the box labelled Numeric Variable -> Output Variable. You then need to name the new variable. Go to the part that says Output Variable and in the box below where it says Name write a name for your second dummy variable (call it Metaller). You can also give this variable a more descriptive name by typing something in the box labelled Label (for this dummy variable I've called it No Affiliation vs. Metaller). When you've done this, transfer this new variable to the box labelled Numeric Variable -> Output Variable (this box should now say music -> Metaller).


We need to tell SPSS how to recode the values of the variable music into the values that we want for the new variable, Metaller. To do this, access the dialog box below. This dialog box is used to change values of the original variable into different values for the new variable. For this dummy variable, we want anyone who was a metaller to get a code of 1 and everyone else to get a code of 0. Now, metaller was coded with the value 2 in the original variable, so you need to type the value 2 in the section labelled Old Value in the box labelled Value. The new value we want is 1, so we need to type the value 1 in the section labelled New Value in the box labelled Value. When you've done this, add this change to the list of changes. The next thing we need to do is to change the remaining groups to have a value of 0 for this dummy variable. To do this, select the option for all other values and type the value 0 in the section labelled New Value in the box labelled Value. When you've done this, add this change to the list of changes. Then return to the main dialog box and create the dummy variable. This variable will appear as a new column in the data editor, and you should notice that it will have a value of 1 for anyone originally classified as a metaller and a value of 0 for everyone else.

To create the final dummy variable, access the recode dialog box. Select music and drag it (or transfer it) to the box labelled Numeric Variable -> Output Variable. Go to the part that says Output Variable and in the box below where it says Name write a name for your final dummy variable (call it Indie_Kid). You can also give this variable a more descriptive name by typing something in the box labelled Label (for this dummy variable I've called it No Affiliation vs. Indie Kid). When you've done this, transfer this new variable to the box labelled Numeric Variable -> Output Variable (this box should now say music -> Indie_Kid).

We need to tell SPSS how to recode the values of the variable music into the values that we want for the new variable, Indie_Kid. To do this, access the dialog box below. For this dummy variable, we want anyone who was an indie kid to get a code of 1 and everyone else to get a code of 0. Now, indie kid was coded with the value 1 in the original variable, so you need to type the value 1 in the section labelled Old Value in the box labelled Value. The new value we want is 1, so we need to type the value 1 in the section labelled New Value in the box labelled Value. When you've done this, add this change to the list of changes. The next thing we need to do is to change the remaining groups to have a value of 0 for this dummy variable. To do this, select the option for all other values and type the value 0 in the section labelled New Value in the box labelled Value. When you've done this, add this change to the list of changes. Then return to the main dialog box and create the dummy variable. This variable will appear as a new column in the data editor, and you should notice that it will have a value of 1 for anyone originally classified as an indie kid and a value of 0 for everyone else.
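The recoding rule for both dummy variables is the same: the target group gets 1 and everyone else gets 0. Here is a minimal Python sketch of that rule. The codes 2 (metaller) and 1 (indie kid) come from the text; the codes 3 and 4 and the list of cases are assumptions made purely for illustration.

```python
METALLER = 2   # code for metallers in the original music variable
INDIE_KID = 1  # code for indie kids in the original music variable


def dummy(music_code, target_code):
    """Return 1 if the case belongs to the target group, otherwise 0."""
    return 1 if music_code == target_code else 0


# Hypothetical cases; 3 and 4 stand in for the remaining groups.
music = [1, 2, 3, 4, 2, 1]

metaller = [dummy(m, METALLER) for m in music]
indie_kid = [dummy(m, INDIE_KID) for m in music]

print("Metaller: ", metaller)
print("Indie_Kid:", indie_kid)
```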


Use what you've learnt in this chapter to run a multiple regression using the change scores as the outcome, and the three dummy variables (entered in the same block) as predictors.
Access the main dialog box for regression, which you should complete as below. Use the book chapter to determine what other options you want to select. The output and interpretation are in the book chapter.


Chapter 8

Self-Test Answers

Calculate the values of Cox and Snell's and Nagelkerke's R² reported by SPSS. [Hint: These equations use the log-likelihood, whereas SPSS reports -2 × log-likelihood. LL(New) is, therefore, 144.16/-2 = -72.08, and LL(Baseline) = 154.08/-2 = -77.04. The sample size, n, is 113.]
Cox and Snell's R² is calculated from this equation:

$$R^2_{CS} = 1 - e^{\left[-\frac{2}{n}\left(LL(\text{New}) - LL(\text{Baseline})\right)\right]}$$

Remember that this equation uses the log-likelihood, whereas SPSS reports -2 × log-likelihood. LL(New) is, therefore, 144.16/-2 = -72.08, and LL(Baseline) = 154.08/-2 = -77.04. The sample size, n, is 113:

$$R^2_{CS} = 1 - e^{\left[-\frac{2}{113}\left(-72.08 - (-77.04)\right)\right]} = 1 - e^{-0.0878} = 1 - 0.916 = 0.084$$


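Nagelkerke's adjustment is calculated from:

$$R^2_N = \frac{R^2_{CS}}{1 - e^{\left[\frac{2(LL(\text{Baseline}))}{n}\right]}} = \frac{0.084}{1 - e^{\left[\frac{2(-77.04)}{113}\right]}} = \frac{0.084}{1 - e^{-1.3635}} = \frac{0.084}{1 - 0.2558} = 0.113$$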
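If you would rather let a computer do the arithmetic, the two calculations can be sketched in Python as follows. The function names are illustrative choices; the inputs are the -2LL values and sample size given in the hint.

```python
import math


def cox_snell_r2(ll_new, ll_baseline, n):
    """Cox and Snell's R2 from the two log-likelihoods and the sample size."""
    return 1 - math.exp(-(2 / n) * (ll_new - ll_baseline))


def nagelkerke_r2(ll_new, ll_baseline, n):
    """Nagelkerke's adjustment of Cox and Snell's R2."""
    return cox_snell_r2(ll_new, ll_baseline, n) / (1 - math.exp(2 * ll_baseline / n))


# SPSS reports -2LL, so divide by -2 to recover the log-likelihoods.
ll_new = 144.16 / -2
ll_baseline = 154.08 / -2
n = 113

r2_cs = cox_snell_r2(ll_new, ll_baseline, n)
r2_n = nagelkerke_r2(ll_new, ll_baseline, n)
print(f"Cox and Snell R2 = {r2_cs:.3f}")
print(f"Nagelkerke R2    = {r2_n:.3f}")
```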

Use the case summaries function in SPSS to create a table for the first 15 cases in the file Eel.sav showing the values of Cured, Intervention, Duration, the predicted probability (PRE_1) and the predicted group membership (PGR_1) for each case.


The completed dialog box should look like this:

Rerun this analysis using the forced entry method of analysis. How do your conclusions differ?
I'm not going to run through the whole analysis, but essentially the main bit of the output that I'll look at is the Variables in the Equation table:


Essentially, when all variables are entered none of them are significant. It looks like our intervention (which we concluded was successful) was not. Puzzling, eh? Well, actually not. The reason is that Intervention and the Intervention × Duration interaction are very highly correlated. To prove this fact, I created a variable representing the interaction (this is easy to do: you use the compute command and multiply the two variables in the interaction together). The table is below. Note the correlation between the intervention and the interaction: it is r = .98, an almost perfect correlation. This means that these two variables are essentially the same, so when they are forced into the regression they are fighting over the same variance in the outcome variable. So, neither is significant. They are so highly correlated because there isn't a lot of variability in the variable Duration. Try rerunning the analysis now but without the interaction term.
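To see why low variability in Duration produces this near-perfect correlation, here is a small Python simulation. It does not use the Eel.sav data: the 0/1 intervention codes and the durations (drawn from 6, 7 or 8) are invented, and pearson_r is a hand-written helper, so treat it as a sketch of the principle only.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)

# Hypothetical data: 113 cases, alternating no intervention (0) and intervention (1).
intervention = [i % 2 for i in range(113)]

# Hypothetical durations with very little variability.
duration = [random.choice([6, 7, 8]) for _ in range(113)]

# The interaction is simply the product of the two variables.
interaction = [i * d for i, d in zip(intervention, duration)]

r = pearson_r(intervention, interaction)
print(f"r(intervention, interaction) = {r:.2f}")
```

Because the product is 0 whenever the intervention is 0, and roughly constant whenever it is 1, the interaction carries almost the same information as the intervention itself.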


If we rerun the analysis without the interaction, we get:

The intervention is no longer fighting over the same variance as the interaction term and so becomes significant again. We basically get the same results (in terms of significance) as we did from the stepwise method used in the chapter. [Incidentally, if you ran the analysis with the interaction term but not Intervention then you'd find the interaction term is significant. The reason why should be relatively obvious: it's because the two variables share so much variance.]


We learnt how to do hierarchical regression in the previous chapter. Try to conduct a hierarchical logistic regression analysis on these data. Enter Previous and PSWQ in the first block and Anxious in the second. There is a full guide on how to do the analysis and its interpretation in the additional material on the website.

Running the Analysis: Block Entry Regression
To run the analysis, we must first open the main Logistic Regression dialog box. In this example, we know of two previously established predictors and so it is a good idea to enter these predictors into the model in a single block. Then we can add the new predictor in a second block (by doing this we effectively examine an old model and then add a new variable to this model to see whether the model is improved). This method is known as block entry and the figure shows how it is specified. It is easy to do block entry regression. First you should use the mouse to select the variable scored from the variables list and then transfer it to the box labelled Dependent. Second, you should select the two previously established predictors. So, select pswq and previous from the variables list and transfer them to the box labelled Covariates. Our first block of variables is now specified. To specify the second block, move on to the next block; this clears the Covariates box, which should now be labelled Block 2 of 2. Now select anxious from the variables list and transfer it to the box labelled Covariates. We could at this stage select some interactions to be included in the model, but unless there is a sound theoretical reason for believing that the predictors should interact there is no need. Make sure that Enter is selected as the method of regression (this method is the default and so should be selected already).


Once the variables have been specified, you should select the options described in the chapter, but because none of the predictors are categorical there is no need to use the option for categorical predictors. When you have selected the options and residuals that you want, you can return to the main Logistic Regression dialog box and run the analysis.


Interpreting Output
The output of the logistic regression will be arranged in terms of the blocks that were specified. In other words, SPSS will produce a regression model for the variables specified in block 1, and then produce a second model that contains the variables from both blocks 1 and 2. First, the output shows the results from block 0: the output tells us that 75 cases have been accepted and that the dependent variable has been coded 0 and 1 (because this variable was coded as 0 and 1 in the data editor, these codings correspond exactly to the data in SPSS). We are then told about the variables that are in and out of the equation. At this point only the constant is included in the model, and so to be perfectly honest none of this information is particularly interesting!

Dependent Variable Encoding

Original Value      Internal Value
Missed Penalty      0
Scored Penalty      1

Block 0: Beginning Block

Classification Table (a, b)

                                   Predicted: Result of Penalty Kick
Observed (Step 0)                  Missed Penalty   Scored Penalty   Percentage Correct
Result of Penalty Kick
  Missed Penalty                         0               35                  .0
  Scored Penalty                         0               40               100.0
Overall Percentage                                                         53.3

a. Constant is included in the model.
b. The cut value is .500

Variables in the Equation

                         B     S.E.    Wald    df    Sig.   Exp(B)
Step 0   Constant     .134     .231    .333     1    .564    1.143

Variables not in the Equation

                                      Score    df    Sig.
Step 0   Variables    PREVIOUS       34.109     1    .000
                      PSWQ           34.193     1    .000
         Overall Statistics          41.558     2    .000


The results from block 1 are shown next and in this analysis we forced SPSS to enter previous and pswq into the regression model. Therefore, this part of the output provides information about the model after the variables previous and pswq have been added. The first thing to note is that the -2LL is 48.66, which is a change of 54.98 (which is the value given by the model chi-square). This value tells us about the model as a whole whereas the block tells us how the model has improved since the last block. The change in the amount of information explained by the model is significant (p < .0001) and so using previous experience and worry as predictors significantly improves our ability to predict penalty success. A bit further down, the classification table shows us that 84% of cases can be correctly classified using pswq and previous. In the intervention example, Hosmer and Lemeshow's goodness-of-fit test was 0. The reason is that this test can't be calculated when there is only one predictor and that predictor is a categorical dichotomy! However, for this example the test can be calculated. The important part of this test is the test statistic itself (7.93) and the significance value (.3388). This statistic tests the hypothesis that the observed data are significantly different from the predicted values from the model. So, in effect, we want a non-significant value for this test (because this would indicate that the model does not differ significantly from the observed data). We have a non-significant value here, which is indicative of a model that is predicting the real-world data fairly well.

The part of the output labelled Variables in the Equation then tells us the parameters of the model when previous and pswq are used as predictors. The significance values of the Wald statistics for each predictor indicate that both pswq and previous significantly predict penalty success (p < .01). The value of the odds ratio (Exp(B)) for previous indicates that if the percentage of previous penalties scored goes up by one, then the odds of scoring a penalty also increase (because the odds ratio is greater than 1). The confidence interval for this value ranges from 1.02 to 1.11 so we can be very confident that the value of the odds ratio in the population lies somewhere between these two values. What's more, because both values are greater than 1 we can also be confident that the relationship between previous and penalty success found in this sample is true of the whole population of footballers. The odds ratio for pswq indicates that if the level of worry increases by one point along the Penn State worry scale, then the odds of scoring a penalty decrease (because it is less than 1). The confidence interval for this value ranges from .68 to .93 so we can be very confident that the value of the odds ratio in the population lies somewhere between these two values. In addition, because both values are less than 1 we can be confident that the relationship between pswq and penalty success found in this sample is true of the whole population of footballers. If we had found that the confidence interval ranged from less than 1 to more than 1, then this would limit the generalizability of our findings because the odds ratio in the population could indicate either a positive (odds ratio > 1) or negative (odds ratio < 1) relationship.

A glance at the classification plot also brings us good news because most cases are clustered at the ends of the plot and few cases lie in the middle of the plot. This reiterates what we know already: that the model is correctly classifying most cases. We can, at this point, also calculate R² by dividing the model chi-square by the original value of -2LL. The result is:

$$R^2 = \frac{\text{model chi-square}}{\text{original } {-2LL}} = \frac{54.977}{103.6385} = 0.53$$

We can interpret the result as meaning that the model can account for 53% of the variance in penalty success (so, roughly half of what makes a penalty kick successful is still unknown).
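The arithmetic behind this R² is easy to check. The Python sketch below (with illustrative function and variable names) recovers the model chi-square as the drop in -2LL from the original model to block 1 and then divides it by the original -2LL, using the values quoted above.

```python
def model_chi_square(neg2ll_baseline, neg2ll_new):
    """The model chi-square is the reduction in -2LL relative to the baseline."""
    return neg2ll_baseline - neg2ll_new


original_neg2ll = 103.6385  # -2LL with only the constant in the model
block1_neg2ll = 48.662      # -2LL after entering previous and pswq

chi_square = model_chi_square(original_neg2ll, block1_neg2ll)
r_squared = chi_square / original_neg2ll

print(f"Model chi-square = {chi_square:.3f}")
print(f"R2 = {r_squared:.2f}")
```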


Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                    Chi-square    df    Sig.
Step 1   Step         54.977       2    .000
         Block        54.977       2    .000
         Model        54.977       2    .000

Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1            48.662                 .520                    .694

Hosmer and Lemeshow Test

Step    Chi-square    df    Sig.
1          7.931       7    .339

Contingency Table for Hosmer and Lemeshow Test

               Result of Penalty Kick      Result of Penalty Kick
               = Missed Penalty            = Scored Penalty
Step 1         Observed    Expected        Observed    Expected        Total
  1                8         7.904             0          .096            8
  2                8         7.779             0          .221            8
  3                8         6.705             0         1.295            8
  4                4         5.438             4         2.562            8
  5                2         3.945             6         4.055            8
  6                2         1.820             6         6.180            8
  7                2         1.004             6         6.996            8
  8                1          .298             7         7.702            8
  9                0          .108            11        10.892           11

Classification Table (a)

                                   Predicted: Result of Penalty Kick
Observed (Step 1)                  Missed Penalty   Scored Penalty   Percentage Correct
Result of Penalty Kick
  Missed Penalty                        30                5                85.7
  Scored Penalty                         7               33                82.5
Overall Percentage                                                         84.0

a. The cut value is .500

Variables in the Equation

                                                                      95.0% C.I. for EXP(B)
                          B      S.E.     Wald    df    Sig.   Exp(B)    Lower     Upper
Step 1 (a)  PREVIOUS    .065     .022    8.609     1    .003    1.067    1.022     1.114
            PSWQ       -.230     .080    8.309     1    .004     .794     .679      .929
            Constant   1.280    1.670     .588     1    .443    3.598

a. Variable(s) entered on step 1: PREVIOUS, PSWQ.
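The odds ratios and confidence intervals in the Variables in the Equation table can be reproduced (approximately) from B and S.E. The Python sketch below assumes the usual interval of exp(B ± 1.96 × S.E.); the function name odds_ratio_ci is made up, and because the inputs are the rounded values from the table, the results may differ from SPSS in the third decimal place.

```python
import math


def odds_ratio_ci(b, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logistic coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B and S.E. for the two predictors, as displayed in the block 1 table.
coefficients = {"PREVIOUS": (0.065, 0.022), "PSWQ": (-0.230, 0.080)}

results = {name: odds_ratio_ci(b, se) for name, (b, se) in coefficients.items()}

for name, (odds_ratio, lower, upper) in results.items():
    crosses_one = lower < 1 < upper
    print(f"{name}: Exp(B) = {odds_ratio:.3f}, "
          f"95% CI [{lower:.3f}, {upper:.3f}], crosses 1: {crosses_one}")
```

An interval that lies entirely above 1 (previous) or entirely below 1 (pswq) is what lets us be confident about the direction of the relationship.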


The output for block 2 shows what happens to the model when our new predictor is added (anxious). So, we begin with the model that we had in block 1 and we then add anxious to it. The effect of adding anxious to the model is to reduce the -2LL to 47.416 (a reduction of 1.246 from the model in block 1 as shown in the model chi-square and block statistics). This improvement is non-significant, which tells us that including anxious in the model has not significantly improved our ability to predict whether a penalty will be scored or missed. The classification table tells us that the model is now correctly classifying 85.33% of cases. Remember that in block 1 there were 84% correctly classified and so an extra 1.33% of cases are now classified (not a great deal more; in fact, examining the table shows us that only one extra case has now been correctly classified). The table labelled Variables in the Equation now contains all three predictors and something very interesting has happened: pswq is still a significant predictor of penalty success; however, previous experience no longer significantly predicts penalty success. In addition, state anxiety appears not to make a significant contribution to the prediction of penalty success. How can it be that previous experience no longer predicts penalty success, and neither does anxiety, yet the ability of the model to predict penalty success has improved slightly?

Block 2: Method = Enter

Omnibus Tests of Model Coefficients

                    Chi-square    df    Sig.
Step 1   Step          1.246       1    .264
         Block         1.246       1    .264
         Model        56.223       3    .000

Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1            47.416                 .527                    .704

Hosmer and Lemeshow Test

Step    Chi-square    df    Sig.
1          9.937       7    .192

Contingency Table for Hosmer and Lemeshow Test

               Result of Penalty Kick      Result of Penalty Kick
               = Missed Penalty            = Scored Penalty
Step 1         Observed    Expected        Observed    Expected        Total
  1                8         7.926             0          .074            8
  2                8         7.769             0          .231            8
  3                9         7.649             0         1.351            9
  4                4         5.425             4         2.575            8
  5                1         3.210             7         4.790            8
  6                4         1.684             4         6.316            8
  7                1         1.049             7         6.951            8
  8                0          .222             8         7.778            8
  9                0          .067            10         9.933           10

Classification Table (a)

                                   Predicted: Result of Penalty Kick
Observed (Step 1)                  Missed Penalty   Scored Penalty   Percentage Correct
Result of Penalty Kick
  Missed Penalty                        30                5                85.7
  Scored Penalty                         6               34                85.0
Overall Percentage                                                         85.3

a. The cut value is .500
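The overall percentages in the classification tables are just the correctly classified cases divided by the total. The Python sketch below (function name chosen for illustration) recomputes them from the cell counts in the block 0, block 1 and block 2 tables and shows that block 2 classifies exactly one more case correctly than block 1.

```python
def percent_correct(missed_as_missed, missed_as_scored, scored_as_missed, scored_as_scored):
    """Overall percentage of cases on the diagonal of a 2 x 2 classification table."""
    total = missed_as_missed + missed_as_scored + scored_as_missed + scored_as_scored
    return 100 * (missed_as_missed + scored_as_scored) / total


block0 = percent_correct(0, 35, 0, 40)   # constant only: everyone predicted to score
block1 = percent_correct(30, 5, 7, 33)   # previous and pswq
block2 = percent_correct(30, 5, 6, 34)   # previous, pswq and anxious

# Number of additional cases classified correctly in block 2 (75 cases in total).
extra_cases = round((block2 - block1) * 75 / 100)

print(f"Block 0: {block0:.1f}%  Block 1: {block1:.1f}%  Block 2: {block2:.1f}%")
print(f"Extra cases correctly classified in block 2: {extra_cases}")
```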


Variables in the Equation

                                                                        95.0% C.I. for EXP(B)
                           B       S.E.     Wald    df    Sig.   Exp(B)    Lower     Upper
Step 1 (a)  PREVIOUS     .203      .129    2.454     1    .117    1.225     .950     1.578
            PSWQ        -.251      .084    8.954     1    .003     .778     .660      .917
            ANXIOUS      .276      .253    1.193     1    .275    1.318     .803     2.162
            Constant  -11.493    11.802     .948     1    .330     .000

a. Variable(s) entered on step 1: ANXIOUS.

The classification plot is similar to before and the contribution of pswq to predicting penalty success is relatively unchanged. What has changed is the contribution of previous experience. If we examine the values of the odds ratio for both previous and anxious it is clear that they both potentially have a positive relationship to penalty success (i.e. as they increase by a unit, the odds of scoring improve). However, the confidence intervals for these values cross 1, which indicates that the direction of this relationship may be unstable in the population as a whole (i.e. the value of the odds ratio in our sample may be quite different to the value if we had data from the entire population).

You may be tempted to use this final model to say that, although worry is a significant predictor of penalty success, the previous finding that experience plays a role is incorrect. This would be a dangerous conclusion to make and if you read the section on multicollinearity in the book you'll see why!

Try creating two new variables that are the natural log of Anxious and Previous.
First of all, the completed dialog box for PSWQ is given below to give you some idea of how this variable is created (following the instructions in the chapter):

For Anxious, create a new variable called LnAnxious by entering this name into the box labelled Target Variable and then give the variable a more descriptive name such as Ln(anxiety). In the list box labelled Function group, click on Arithmetic and then in the box labelled Functions and Special Variables click on Ln and transfer it to the command area. Replace the question mark with the variable Anxious by either selecting the variable in the list and transferring it or just typing Anxious where the question mark is. Then create the variable.

For Previous, create a new variable called LnPrevious by entering this name into the box labelled Target Variable and then give the variable a more descriptive name such as Ln(previous performance). In the list box labelled Function group, click on Arithmetic and then in the box labelled Functions and Special Variables click on Ln (this is the natural log transformation) and transfer it to the command area. Replace the question mark with the variable Previous by either selecting the variable in the list and transferring it or just typing Previous where the question mark is. Then create the variable.

Alternatively, you can create all three variables in one go using the following syntax:

Using what you learned in Chapter 6, carry out a Pearson correlation between all of the variables in this analysis. Can you work out why we have a problem with collinearity?
The results of your analysis should look like this:


Correlations (Pearson Correlation)

                                          Result of      State      Percentage of previous   Penn State Worry
                                          Penalty Kick   Anxiety    penalties scored         Questionnaire
Result of Penalty Kick                    1.000          -.668**    .674**                   -.675**
State Anxiety                             -.668**        1.000      -.993**                  .652**
Percentage of previous penalties scored   .674**         -.993**    1.000                    -.644**
Penn State Worry Questionnaire            -.675**        .652**     -.644**                  1.000

For every pair of variables, Sig. (2-tailed) = .000 and N = 75.

**. Correlation is significant at the 0.01 level (2-tailed).

From this output we can see that Anxious and Previous are highly negatively correlated (r = −.99); in fact they are nearly perfectly correlated. Both Previous and Anxious correlate with penalty success (see footnote 1), but because they are correlated so highly with each other, it is unclear which of the two variables predicts penalty success in the regression. As such, our multicollinearity stems from the near-perfect correlation between Anxious and Previous.

What does the log-likelihood measure?

The log-likelihood statistic is analogous to the residual sum of squares in multiple regression in the sense that it is an indicator of how much unexplained information there is after the model has been fitted. It therefore follows that large values of the log-likelihood statistic indicate poorly fitting statistical models, because the larger the value of the log-likelihood, the more unexplained observations there are.
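To make the idea concrete, here is a minimal Python sketch (not SPSS output) of how the log-likelihood is built up from observed outcomes and predicted probabilities. The four cases and both sets of probabilities are invented for illustration. Because the raw sum is negative, the sketch also prints it multiplied by −2, which is the positive form that SPSS labels "-2 Log likelihood"; it is in this form that bigger values mean more unexplained information.

```python
import math

def log_likelihood(outcomes, predicted_probs):
    """Sum of Y*ln(P) + (1 - Y)*ln(1 - P) over all cases."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(outcomes, predicted_probs)
    )

# Hypothetical data: four observed outcomes and two sets of predicted probabilities
outcomes = [1, 0, 1, 0]
good_model = [0.9, 0.1, 0.8, 0.2]   # predictions close to what actually happened
poor_model = [0.6, 0.5, 0.5, 0.4]   # predictions barely better than guessing

ll_good = log_likelihood(outcomes, good_model)
ll_poor = log_likelihood(outcomes, poor_model)

# -2LL is positive; the poorer model should give the larger value
print(round(-2 * ll_good, 3), round(-2 * ll_poor, 3))
```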

Footnote 1: If you think back to Chapter 6, these correlations with penalty success (a dichotomous variable) are point-biserial correlations.


Use what you learnt earlier in this chapter to check the assumptions of multicollinearity and linearity of the logit.

Testing for Linearity of the Logit


In this example we have three continuous variables (Funny, Sex, Good_Mate), so we have to check that each one is linearly related to the log of the outcome variable (Success). To test this assumption we need to run the logistic regression but include predictors that are the interaction between each predictor and the log of itself. For each variable, create a new variable that is the log of the original variable. For example, for Funny, create a new variable called LnFunny by entering this name into the box labelled Target Variable, then click on the Type & Label button and give the variable a more descriptive name such as Ln(Funny).

In the list box labelled Function group, click on Arithmetic, and then in the box labelled Functions and Special Variables click on Ln (this is the natural log transformation) and transfer it to the command area by clicking on the transfer arrow. When the command is transferred, it appears in the command area as LN(?), and the question mark should be replaced with a variable name (which can be typed manually or transferred from the variables list). So replace the question mark with the variable Funny by either selecting the variable in the list and clicking on the transfer arrow, or just typing Funny where the question mark is. Click on OK to create the variable.


Repeat this process for Sex and Good_Mate. Alternatively, do all three at once using this syntax:

COMPUTE LnFunny=LN(Funny).
COMPUTE LnSex=LN(Sex).
COMPUTE LnGood_Mate=LN(Good_Mate).
EXECUTE.
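For readers who want to see what the transformation produces outside SPSS, the following Python sketch mirrors the COMPUTE commands and then forms the predictor-by-log-predictor terms that the linearity test relies on. The three cases and their scores are hypothetical (the real data file is not reproduced here), and the name pattern used for the interaction terms is an illustrative choice, not something SPSS creates.

```python
import math

# Hypothetical scores for three cases; the real data file is not reproduced here
cases = [
    {"Funny": 3.0, "Sex": 7.0, "Good_Mate": 51.0},
    {"Funny": 5.0, "Sex": 4.0, "Good_Mate": 60.0},
    {"Funny": 1.0, "Sex": 9.0, "Good_Mate": 43.0},
]

predictors = ["Funny", "Sex", "Good_Mate"]

for case in cases:
    for name in predictors:
        # Same job as COMPUTE LnFunny=LN(Funny): the natural log of the predictor
        # (only defined for values greater than zero)
        case["Ln" + name] = math.log(case[name])
        # The term that tests linearity of the logit: predictor x ln(predictor)
        case[name + "_x_Ln" + name] = case[name] * case["Ln" + name]

print(cases[0])
```

In SPSS itself the interaction terms are not computed as new variables; they are specified in the custom model, as described next.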

To test the assumption we need to redo the analysis but putting in our three covariates, and also the interactions of these covariates with their natural logs. So, as with the main example in the chapter we need to specify a custom model:


Note that (1) we need to enter the log variables in the first screen so that they are listed in the second dialog box, and (2) in the second dialog box we have only included the main effects of Sex, Funny and Good_Mate and their interactions with their log values.


The output above is all that we need to look at because it tells us about whether any of our predictors significantly predict the outcome categories (generally). The assumption of linearity of the logit is tested by the three interaction terms, all of which are significant (p < .05). This means that all three predictors have violated the assumption.

Testing for Multicollinearity


You can obtain statistics such as the tolerance and VIF by simply running a linear regression analysis using the same outcome and predictors as the logistic regression. It is essential that you click on Statistics and then select Collinearity diagnostics in the dialog box. Once you have selected Collinearity diagnostics, switch off all of the default options, click on Continue to return you to the Linear Regression dialog box, and then click on OK to run the analysis.


Menard (1995; see book references) suggests that a tolerance value less than 0.1 almost certainly indicates a serious collinearity problem. Myers (1990; see book references) also suggests that a VIF value greater than 10 is cause for concern and in these data all of the VIFs are well below 10 (and tolerances above 0.1). It seems from these values that there is not an issue of collinearity between the predictor variables. We can investigate this issue further by examining the collinearity diagnostics.
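Tolerance and VIF are two views of the same quantity: each predictor is regressed on the others, tolerance is 1 − R² from that regression, and VIF is the reciprocal of tolerance. The Python sketch below applies this to the simplest case of two predictors, where R² is just the squared correlation. It uses the correlation of −.993 between Anxious and Previous from the penalty example as a contrast to the present data; because that model also contained a third predictor, the values SPSS reports for it would differ somewhat, so treat the numbers as illustrative only. The second, weaker correlation is hypothetical.

```python
def tolerance_and_vif(r_squared):
    """Tolerance is 1 - R^2 and VIF is its reciprocal."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance

# With only two predictors, R^2 for one predictor regressed on the other is r^2
r = -0.993                       # correlation between Anxious and Previous
tol, vif = tolerance_and_vif(r ** 2)
print(round(tol, 3), round(vif, 1))

# Hypothetical weaker relationship: comfortably inside both rules of thumb
tol_ok, vif_ok = tolerance_and_vif(0.30 ** 2)
print(round(tol_ok, 3), round(vif_ok, 2))
```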


The table labelled Collinearity Diagnostics gives the eigenvalues of the scaled, uncentred cross-products matrix, the condition index and the variance proportions for each predictor. If the eigenvalues are fairly similar then the derived model is likely to be unchanged by small changes in the measured variables. The condition indexes are another way of expressing these eigenvalues and represent the square root of the ratio of the largest eigenvalue to the eigenvalue of interest (so, for the dimension with the largest eigenvalue, the condition index will always be 1). For these data the final dimension has a condition index of 15.03, which is nearly twice as large as the previous one. Although there are no hard and fast rules about how much larger a condition index needs to be to indicate collinearity problems, this could indicate a problem. For the variance proportions we are looking for predictors that have high proportions on the same small eigenvalue, because this would indicate that the variances of their regression coefficients are dependent. So we are interested mainly in the bottom few rows of the table (which represent small eigenvalues). In this example, 40–57% of the variance in the regression coefficients of both Sex and Good_Mate is associated with eigenvalue number 4 and 34–39% with eigenvalue number 5 (the smallest eigenvalue), which indicates some dependency between these variables. So, there is some dependency between Sex and Good_Mate, but given the VIF we can probably assume that this dependency is not problematic.
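The definition of the condition index given above is easy to check by hand. This Python sketch applies it to a set of hypothetical eigenvalues (they are not the values from this analysis, which appear only in the SPSS output) and shows that the first dimension always gets an index of 1.

```python
import math

def condition_indexes(eigenvalues):
    """Square root of (largest eigenvalue / each eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / value) for value in eigenvalues]

# Hypothetical eigenvalues, ordered from largest to smallest as in the SPSS table
eigenvalues = [3.60, 0.25, 0.10, 0.05]
indexes = condition_indexes(eigenvalues)
print([round(ci, 2) for ci in indexes])
```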


Additional Material

Diagnostics for the Eel.sav analysis


Labcoat Leni's Real Research: Mandatory suicide?

Lacourse, E. et al. (2001). Journal of Youth and Adolescence, 30, 321–332.

As you might have noticed by now, although I have fairly eclectic tastes in music, my favourite kind of music is heavy metal. One thing that is mildly irritating about liking heavy music is that everyone assumes that you're a miserable or aggressive bastard. When not listening (and often while listening) to heavy metal, I spend most of my time researching clinical psychology: I research how anxiety develops in children. Therefore, I was literally beside myself with excitement when a few years back I stumbled on a paper that combined these two interests. Lacourse, Claes, and Villeneuve (2001) carried out a study to see whether a love of heavy metal could predict suicide risk. Fabulous stuff! Eric Lacourse and his colleagues used questionnaires to measure several background variables: suicide risk (yes or no), marital status of parents (together or divorced/separated), the extent to which the person's mother and father were neglectful, self-estrangement/powerlessness (adolescents who have negative self-perceptions, are bored with life, etc.), social isolation (feelings of a lack of support), normlessness (beliefs that socially disapproved behaviours can be used to achieve certain goals), meaninglessness (doubting that school is relevant to gain employment), and drug use. In addition, they measured liking of different categories of music. For heavy metal they included classic bands (Black Sabbath, Iron Maiden), thrash metal bands (Slayer, Metallica), death/black metal bands (Obituary, Burzum) and gothic bands (Marilyn Manson, Sisters of Mercy). As well as liking, they measured behavioural manifestations of worshipping these bands (hanging posters, hanging out with other metal fans), and vicarious music listening (whether music was used when angry or to bring out aggressive moods).

They carried out a logistic regression predicting suicide risk from all of these predictors for males and females separately. The data for the female sample are in the file Lacourse et al. (2001) Females.sav. Labcoat Leni wants you to carry out a logistic regression predicting Suicide_Risk from all of the other predictors (forced entry). (To make it easier to compare to the published results I suggest you enter the predictors in the same order as Table 3 in the paper: Age, Marital_Status, Mother_Negligence, Father_Negligence, Self_Estrangement, Isolation, Normlessness, Meaninglessness, Drug_Use, Metal, Worshipping, Vicarious.) Create a table of the results; does listening to heavy metal make girls suicidal? If not, what does? Answers are in the additional material for this website (or look at Table 3 in the original article).

The main analysis is fairly simple to specify because we're just forcing all predictors in at the same time. Therefore, the completed main dialog box should look like this (note that I have ordered the predictors as suggested by Labcoat Leni, and that you won't see all of them in the dialog box because the list is too long!):


We also need to specify our categorical variables. We have only one, Marital_Status:

I have chosen an indicator contrast with the first category (Together) as the reference category. It actually doesn't matter whether you select first or last because there are only two categories. However, it will affect the sign of the beta coefficient. I have chosen the first category as the reference category purely because it gives us a positive beta as in Lacourse et al.'s table. If you choose last (the default), the resulting coefficient will be the same magnitude but a negative value instead. You can select whatever other options you see fit based on the chapter (the CI for Exp(B) will need to be selected to get the same output as below). The main output is as follows:


We can present these results in the following table:

                                                    95% CI for Odds Ratio
                                  B        SE      Lower   Odds Ratio   Upper
Constant                          6.21     6.21
Age                               0.69*    0.32    1.06    2.00         3.77
Marital status                    0.18     0.68    0.32    1.20         4.53
Mother negligence                 −0.02    0.05    0.88    0.98         1.09
Father negligence                 0.09*    0.05    0.99    1.09         1.20
Self-estrangement/powerlessness   0.15*    0.06    1.03    1.17         1.33
Social isolation                  −0.01    0.08    0.86    0.99         1.15
Normlessness                      0.19*    0.11    0.98    1.21         1.50
Meaninglessness                   −0.07    0.06    0.83    0.94         1.05
Drug use                          0.32**   0.10    1.12    1.37         1.68
Metal                             0.14     0.09    0.96    1.15         1.37
Worshipping                       0.16*    0.13    0.91    1.17         1.51
Vicarious listening               −0.34    0.20    0.48    0.71         1.04

*p < .05, ** p < .01; one-tailed
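The odds ratio and its confidence interval in a table like this follow directly from B and SE: the odds ratio is exp(B), and an approximate 95% interval is exp(B ± 1.96 × SE). The Python sketch below is a quick way to sanity-check a row. It uses the B and SE for Drug use as rounded in the table, so the results can differ from the tabled values in the second decimal place, since SPSS works from unrounded estimates.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Return (lower, odds ratio, upper) for a logistic regression coefficient."""
    return math.exp(b - z * se), math.exp(b), math.exp(b + z * se)

# B and SE for Drug use, as rounded in the table above
lower, odds_ratio, upper = odds_ratio_with_ci(0.32, 0.10)
print(round(lower, 2), round(odds_ratio, 2), round(upper, 2))
```

An interval that lies entirely above 1, as here, indicates that the odds of being in the at-risk category increase with the predictor.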

I've reported one-tailed significances (because Lacourse et al. do and it makes it easier to compare our results to Table 3 in their paper). We can conclude that listening to heavy metal did not significantly predict suicide risk in women (of course not; anyone I've ever met who likes metal does not conform to the stereotype). However, in case you're interested, listening to country music apparently does (Stack & Gundlach, 1992). The factors that did predict suicide risk were age (risk increased with age), father negligence (although this was significant only one-tailed, it showed that as negligence increased so did suicide risk), self-estrangement (basically low self-esteem predicted suicide risk, as you might expect), normlessness (again, only one-tailed), drug use (the more drugs used, the more likely a person was to be in the at-risk category), and worshipping (the more the person showed signs of worshipping bands, the more likely they were to be in the at-risk group). The most significant predictor was drug use.


So, this shows you that for girls, listening to metal was not a risk factor for suicide, but drug use was. To find out what happens for boys, you'll just have to read the article! This is scientific proof that metal isn't bad for your health, so download some Deathspell Omega and enjoy!


Chapter 9

Self-Test Answers

Enter these data into SPSS. Plot an error bar graph of the spider data.

You can check your data entry against the file spiderBG.sav. The completed graph editor window should look like this:


Enter these data into SPSS. Plot an error bar graph of the spider data.
You can check your data entry against the file spiderRM.sav. The completed graph editor window should look like this:

Create an error bar chart of the mean of the adjusted values that you have just made (Real_Adjusted and Picture_Adjusted).
The completed graph editor window should look like this:


Using the spiderRM.sav data, compute the differences between the picture and real condition and check the assumption of normality for these differences.
First compute the differences using the compute function:


Next, use the Explore command to get some plots and the K–S test:

The output shows that the distribution of differences is not significantly different from normal, D(12) = 0.13, p > .05. The Q–Q plot also shows that the quantiles fall pretty much on the diagonal line (indicating normality). As such, it looks as though we can assume that our differences are normal and that, therefore, the sampling distribution of these differences is normal too. Happy days!


Additional Material

Labcoat Leni's Real Research: You don't have to be mad here, but it helps

Board, B. J., & Fritzon, K. (2005). Psychology, Crime & Law, 11, 17–32.


In the UK you often see the "humorous" slogan "You don't have to be mad to work here, but it helps" stuck up in workplaces. Well, Board and Fritzon (2005) took this a step further by measuring whether 39 senior business managers and chief executives from leading UK companies had personality disorders (PDs). They gave them The Minnesota Multiphasic Personality Inventory Scales for DSM III Personality Disorders (MMPI-PD), which is a well-validated measure of 11 personality disorders: Histrionic, Narcissistic, Antisocial, Borderline, Dependent, Compulsive, Passive-aggressive, Paranoid, Schizotypal, Schizoid and Avoidant. They needed a comparison group, and what better one to choose than 317 legally classified psychopaths at Broadmoor Hospital (a famous high-security psychiatric hospital in the UK). The authors report the means and SDs for these two groups in Table 2 of their paper. Using these values and the syntax file Independent t from means.sps we can run t-tests on these means. The data from Board and Fritzon's (2005) Table 2 are in the file Board and Fritzon 2005.sav. Use this file and the syntax file to run t-tests to see whether managers score higher on personality disorder questionnaires than legally classified psychopaths. Report these results. What do you conclude?

The data look like this:


The columns represent the following:

Outcome: a string variable that tells us which personality disorder the numbers in each row relate to.
X1: Mean of the managers group.
X2: Mean of the psychopaths group.
sd1: Standard deviation of the managers group.
sd2: Standard deviation of the psychopaths group.
n1: The number of managers tested.
n2: The number of psychopaths tested.

The syntax file looks like this:


We can run the syntax by selecting Run and then All in the syntax window. The output looks like this:

We can report that managers scored significantly higher than psychopaths on histrionic personality disorder, t(354) = 7.18, p < .001, d = 1.22. There were no significant differences between groups on narcissistic personality disorder, t(354) = 1.41, p > .05, d = 0.24, or compulsive personality disorder, t(354) = 0.77, p > .05, d = 0.13. On all other measures, psychopaths scored significantly higher than managers: antisocial personality disorder, t(354) = 5.23, p < .001, d = 0.89; borderline personality disorder, t(354) = 10.01, p < .001, d = 1.70; dependent personality disorder, t(354) = 3.83, p < .001, d = 0.65; passive-aggressive personality disorder, t(354) = 9.80, p < .001, d = 1.67; paranoid personality disorder, t(354) = 8.73, p < .001, d = 1.48; schizotypal personality disorder, t(354) = 10.76, p < .001, d = 1.83; schizoid personality disorder, t(354) = 8.18, p < .001, d = 1.39; avoidant personality disorder, t(354) = 6.31, p < .001, d = 1.07.
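If you want to see how a t-test can be run from means, standard deviations and sample sizes alone, the Python sketch below shows one standard way to do it: a pooled-variance independent t with Cohen's d based on the pooled standard deviation. This is an assumption about the approach, not a transcription of the syntax file, which may compute things differently. It is at least consistent with the results reported here: the degrees of freedom are 39 + 317 − 2 = 354, and the reported d values match t multiplied by the square root of (1/n1 + 1/n2). The means and SDs in the example are hypothetical; only the group sizes come from the text.

```python
import math

def t_from_summary(x1, x2, sd1, sd2, n1, n2):
    """Pooled-variance independent t, its df, and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (x1 - x2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    d = (x1 - x2) / math.sqrt(pooled_var)
    return t, df, d

# Hypothetical means and SDs; only the group sizes (39 and 317) come from the text
t, df, d = t_from_summary(x1=12.0, x2=10.0, sd1=3.0, sd2=4.0, n1=39, n2=317)
print(round(t, 2), df, round(d, 2))
```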

The results show the presence of elements of PD in the senior business manager sample, especially those most associated with psychopathic PD. The senior business manager group showed significantly higher levels of traits associated with histrionic PD than psychopaths. They also did not significantly differ from psychopaths in narcissistic and compulsive PD traits. These findings could be an issue of power (effects were not detected but are present). The effect sizes d can help us out here and these are quite small (0.24 and 0.13), which can give us confidence that there really isn't a difference between psychopaths and managers on these traits. Board and Fritzon (2005) conclude that: "At a descriptive level this translates to: superficial charm, insincerity, egocentricity, manipulativeness (histrionic), grandiosity, lack of empathy, exploitativeness, independence (narcissistic), perfectionism, excessive devotion to work, rigidity, stubbornness, and dictatorial tendencies (compulsive). Conversely, the senior business manager group is less likely to demonstrate physical aggression, consistent irresponsibility with work and finances, lack of remorse (antisocial), impulsivity, suicidal gestures, affective instability (borderline), mistrust (paranoid), and hostile defiance alternated with contrition (passive/aggressive)." And these people are in charge of large companies like Sage Publications Ltd. Hmm, suddenly a lot of things make sense.


Chapter 10

Self-Test

To illustrate exactly what is going on I have created a file called dummy.sav. This file contains the Viagra data but with two additional variables (dummy1 and dummy2) that specify to which group a data point belongs (as in Table 10.2). Access this file and run multiple regression analysis using libido as the outcome and dummy1 and dummy2 as the predictors. If you're stuck on how to run the regression then read Chapter 7 again (see, these chapters are ordered for a reason)!
The dialog box for the regression should look like this:

To illustrate these principles, I have created a file called Contrast.sav in which the Viagra data are coded using the contrast coding scheme used in this section. Run multiple regression analyses on these data using libido as the outcome and using dummy1 and dummy2 as the predictor variables (leave all default options).
Your completed regression dialog box should look like this:

Produce a line chart with error bars for the Viagra data.

Your complete Chart Builder should look like this:


Additional Material
Oliver Twisted: Please Sir, Can I Have Some More Levene's Test?

"Liar! Liar! Pants on fire," screams Oliver, his cheeks red and eyes about to explode. "You promised to explain Levene's test properly and you haven't, you spatula head." True enough, Oliver, I do have a spatula for a head. I also have a very nifty little demonstration of Levene's test in the additional material for this chapter on the companion website. It will tell you more than you could possibly want to know. Let's go fry an egg.


Levene's test is basically an ANOVA conducted on the absolute differences between the observed data and the mean from which the data came. To see what I mean, let's do a sort of manual Levene's test on the Viagra data. First we need to create a new variable called Difference (short for Difference from group mean), which is each score minus the mean of the group to which that score belongs. Remember that means for the placebo, low-dose and high-dose groups were 2.2, 3.2 and 5 respectively, and the groups were coded 1, 2 and 3. We can compute this new variable using syntax:

IF (dose = 1) Difference=libido - 2.2.
IF (dose = 2) Difference=libido - 3.2.
IF (dose = 3) Difference=libido - 5.
VARIABLE LABELS Difference 'Difference from Group Mean'.
EXECUTE.

The first line just says that if dose = 1 (i.e. placebo) then the difference is the value of libido minus 2.2 (the mean of the placebo group). The next two lines do the same thing for the low- and high-dose groups. The resulting data look like this:


Note that for person 1, the difference score is 3 − 2.2 = 0.8; for person 2 it is 2 − 2.2 = −0.20. As we move into the low-dose group we subtract the mean of that group, so person 6's difference is 5 − 3.2 = 1.8, and person 7's is 2 − 3.2 = −1.20. In the high-dose group, the group mean is 5, so for person 11 we get a difference of 7 − 5 = 2, and so on. Think about what these differences are; they are deviations from the mean, the same deviations that we calculate when we compute the sums of squares and variance and standard deviation. They represent variation from the mean. When we compute the variance we square the values to get rid of the plus and minus signs (otherwise the positive and negative deviations will cancel out). Levene's test doesn't do this (because we don't want to change the units of measurement by squaring the values), but instead simply takes the absolute values; that is, it pretends that all of the deviations are positive. To get the absolute values of these differences (i.e. to make them all positive values), again we can use syntax:

COMPUTE Difference = abs(Difference).
VARIABLE LABELS Difference 'Absolute Difference from Group Mean'.
EXECUTE.

The first line just changes the variable difference to be the absolute value of itself. The second line renames the variable to reflect the fact that it now contains absolute values. The data now look like this:

Note that the difference scores are the same magnitude; it's just that the minus signs have gone. These values still represent deviations from the mean, or variance; we just now don't have the problem of positive and negative deviations cancelling each other out. Now, using what you learnt in the book, conduct a one-way ANOVA on these difference scores: dose is the independent variable and Difference is the dependent variable (don't select any special options, just run a basic analysis). The main dialog box should look like this:


You'll find that the F-ratio for this analysis is 0.092, which has a significance value of p = .913 (i.e. it is not significant); that is, the same values as Levene's test in the book!

Levene's test is, therefore, testing whether the average absolute deviation from the mean is the same in the three groups. Clever, eh?
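The whole demonstration can be reproduced in a few lines outside SPSS. The Python sketch below takes absolute deviations from each group mean and then runs an ordinary one-way ANOVA on them by hand. The fifteen libido scores are assumed to be the book's Viagra data (they give the group means of 2.2, 3.2 and 5 quoted above); check them against your own data file before relying on the result.

```python
# Viagra libido scores by group, assumed to match the book's example data;
# check them against your own data file
groups = {
    "placebo": [3, 2, 1, 1, 4],
    "low dose": [5, 2, 4, 2, 3],
    "high dose": [7, 4, 5, 3, 6],
}

def mean(values):
    return sum(values) / len(values)

# Step 1: absolute deviation of each score from its own group mean
abs_dev = {name: [abs(x - mean(scores)) for x in scores] for name, scores in groups.items()}

# Step 2: ordinary one-way ANOVA on those absolute deviations
all_dev = [v for devs in abs_dev.values() for v in devs]
grand_mean = mean(all_dev)
ss_model = sum(len(devs) * (mean(devs) - grand_mean) ** 2 for devs in abs_dev.values())
ss_resid = sum((v - mean(devs)) ** 2 for devs in abs_dev.values() for v in devs)
df_model = len(abs_dev) - 1
df_resid = len(all_dev) - len(abs_dev)
f_ratio = (ss_model / df_model) / (ss_resid / df_resid)

print(round(f_ratio, 3))
```

With these scores the F-ratio rounds to the 0.092 reported above, on 2 and 12 degrees of freedom.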

Labcoat Leni's Real Research 10.1: Scraping the barrel?


Gallup, G.G.J. et al. (2003). Evolution and Human Behavior, 24, 277–289.

Evolution has endowed us with many beautiful things (cats, dolphins, the Great Barrier Reef, etc.), all selected to fit their ecological niche. Given evolution's seemingly limitless capacity to produce beauty, it's something of a wonder how it managed to produce such a monstrosity as the human penis. One theory is that the penis evolved into the shape that it is because of sperm competition. Specifically, the human penis has an unusually large glans (the "bell-end" as it's affectionately known) compared to other primates, and this may have evolved so that the penis can displace seminal fluid from other males by "scooping it out" during intercourse. To put this idea to the test, Gordon Gallup and his colleagues came up with an ingenious study (Gallup et al., 2003). Armed with various female masturbatory devices from Hollywood Exotic Novelties, an artificial vagina from California Exotic Novelties, and some water and cornstarch to make fake sperm, they loaded the artificial vagina with 2.6 ml of fake sperm and inserted one of three female sex toys into it before withdrawing it. Over several trials, three different female sex toys were used: a control phallus that had no coronal ridge (i.e. no bell-end), a phallus with a minimal coronal ridge (small bell-end) and a phallus with a coronal ridge. They measured sperm displacement as a percentage using the following equation (included here because it is more interesting than all of the other equations in this book):

$$\text{sperm displaced (\%)} = \frac{\text{weight of vagina with semen} - \text{weight of vagina following insertion and removal of phallus}}{\text{weight of vagina with semen} - \text{weight of empty vagina}} \times 100$$

As such, 100% means that all of the sperm was displaced by the phallus, and 0% means that none of the sperm was displaced. If the human penis evolved as a sperm displacement device then we predict: (1) that having a bell-end will displace more sperm than not; and (2) the phallus with the larger coronal ridge will displace more sperm than the phallus with the minimal coronal ridge. The conditions are ordered (no ridge, minimal ridge, normal ridge) so we might also predict a linear trend. The data can be found in the file Gallup et al.sav. Conduct a one-way ANOVA with planned comparisons to test the two hypotheses above.
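The displacement measure is simple enough to check with a couple of lines of Python. In this sketch the function and argument names are illustrative, and the weights are invented purely to show the arithmetic at the two extremes and in between; they are not values from the study.

```python
def percent_displaced(with_semen, after_phallus, empty):
    """Percentage of the simulated semen removed by the phallus."""
    return 100 * (with_semen - after_phallus) / (with_semen - empty)

# Hypothetical weights in grams, invented purely to show the arithmetic
print(percent_displaced(with_semen=102.6, after_phallus=100.26, empty=100.0))
```

If the weight after removal equals the empty weight the function returns 100, and if nothing is removed it returns 0, matching the interpretation given above.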

OK, let's do the graph first. There are two variables in the data editor: Phallus (the independent variable that has three levels: no ridge, minimal ridge and normal ridge) and Displacement (the dependent variable, the percentage of sperm displaced). The graph should therefore plot Phallus on the x-axis and Displacement on the y-axis. The completed dialog box should look like this:


The final graph looks like this (I have edited mine, you can edit yours too to get some practice):

This graph shows that having a coronal ridge results in more sperm displacement than not having one. The size of ridge made very little difference. For the ANOVA the dialog box should look like this:

To test our hypotheses we need to enter the following codes:

Group                  Contrast 1   Contrast 2
No Ridge (Control)     −2           0
Minimal Ridge          1            −1
Coronal Ridge          1            1

Contrast 1 tests hypothesis 1: (1) that having a bell-end will displace more sperm than not. To test this we compare the two conditions with a ridge against the control condition (no ridge). So we compare chunk 1 (no ridge) to chunk 2 (minimal ridge, coronal ridge). The numbers assigned to the groups are the number of groups in the opposite chunk, and then we randomly assigned one chunk to be a negative value (the codes 2, −1, −1 would work fine as well).

Contrast 2 tests hypothesis 2: (2) the phallus with the larger coronal ridge will displace more sperm than the phallus with the minimal coronal ridge. First we get rid of the control phallus by assigning a code of 0; next we compare chunk 1 (minimal ridge) to chunk 2 (coronal ridge). The numbers assigned to the groups are the number of groups in the opposite chunk, and then we randomly assigned one chunk to be a negative value (the codes 0, 1, −1 would work fine as well). We enter these codes into SPSS as below:
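Before typing weights into SPSS it is worth checking two properties: each contrast's weights should sum to zero, and (with equal group sizes) the products of the two contrasts' weights should also sum to zero if the contrasts are to be independent. The Python sketch below does both checks for the weights assumed here, namely −2, 1, 1 for contrast 1 and 0, −1, 1 for contrast 2, in the order no ridge, minimal ridge, coronal ridge.

```python
# Weights in the order: no ridge, minimal ridge, coronal ridge
contrast_1 = [-2, 1, 1]   # ridge (either kind) versus no ridge
contrast_2 = [0, -1, 1]   # coronal ridge versus minimal ridge

def sums_to_zero(weights):
    return sum(weights) == 0

def orthogonal(a, b):
    # Two contrasts are orthogonal when their cross-products sum to zero
    return sum(x * y for x, y in zip(a, b)) == 0

print(sums_to_zero(contrast_1), sums_to_zero(contrast_2), orthogonal(contrast_1, contrast_2))
```

Flipping every sign within a contrast leaves both checks unchanged, which is why the alternative codes mentioned above work just as well.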


We should also ask for homogeneity tests and corrections:

This tells us that Levene's test is not significant, F(2, 12) = 1.12, p > .05, so we can assume that variances are equal.


The main ANOVA tells us that there was a significant effect of the type of phallus, F(2, 12) = 41.56, p < .001. (This is exactly the same result as reported in the paper on page 280.) There is also a significant linear trend, F(1, 12) = 62.47, p < .001, indicating that more sperm was displaced as the ridge increased (however, note from the graph that this effect reflects the increase in displacement as we go from no ridge to having a ridge; there is no extra increase from minimal ridge to coronal ridge).

This table tells us that we entered our weights correctly;

Contrast 1 tells us that hypothesis 1 is supported: having some kind of ridge led to greater sperm displacement than not having a ridge, t(12) = 9.12, p < .001. Contrast 2 shows that hypothesis 2 is not supported: the amount of sperm displaced by the normal coronal ridge was not significantly different from the amount displaced by a minimal coronal ridge, t(12) = 0.02, p = .99.

Chapter 11

Self-Test Answers

Use SPSS to find out the mean and standard deviation of both the participant's libido and that of their partner in the three groups.

The easiest way to get these values is to select because this allows us to split the analysis by group; however, this is not the only way (we could, for example, split the file and then run the descriptive command, we can also select ; although we don't use this command in the book it is fairly self-evident how to use it!). Complete the dialog box as follows and you'll get a beautiful (?) table of descriptive statistics for both variables, split by each group:


Conduct an ANOVA to test whether partner's libido (our covariate) is independent of the dose of Viagra (our independent variable).

We can do this analysis using the one-way ANOVA dialog box, but we can also do the analysis using the same dialog box that we use for ANCOVA. If we do the latter then we can follow the example in the chapter but simply exclude the covariate. Therefore, the completed dialog box would look like this:


Run a one-way ANOVA to see whether the three groups differ in their levels of libido.

We can do this analysis using the one-way ANOVA dialog box, but we can also do the analysis using the same dialog box that we use for ANCOVA. If we do the latter then we can follow the example in the chapter but simply exclude the covariate. Therefore, the completed dialog box would look like this:

Why do you think that the results of the post hoc test differ to the contrasts for the comparison of the low-dose and placebo group?

This contradiction might result from a loss of power in the post hoc tests (remember that planned comparisons have greater power to detect effects than post hoc procedures).


However, there could be other reasons why these comparisons are non-significant and we should be very cautious in our interpretation of the significant ANCOVA and subsequent comparisons.

Add two dummy variables to the file ViagraCovariate.sav that compare the low dose to the placebo (Low_Placebo) and the high dose to the placebo (High_Placebo). If you get stuck then download ViagraCovariateDummy.sav.
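As a reminder of what the two dummy variables should contain, here is a small Python sketch of the coding. It assumes dose is coded 1 = placebo, 2 = low dose and 3 = high dose, and uses six made-up cases rather than the real file; the lower-case variable names are just Python stand-ins for Low_Placebo and High_Placebo.

```python
# Hypothetical coding of dose: 1 = placebo, 2 = low dose, 3 = high dose
dose = [1, 1, 2, 2, 3, 3]

# Each dummy is 1 for the named group and 0 otherwise; placebo is 0 on both,
# which makes it the baseline category
low_placebo = [1 if d == 2 else 0 for d in dose]
high_placebo = [1 if d == 3 else 0 for d in dose]

for row in zip(dose, low_placebo, high_placebo):
    print(row)
```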

Run a hierarchical regression analysis with Libido as the outcome. In the first block enter partner's libido (Partner_Libido) as a predictor, and then in the second block enter both dummy variables (forced entry).

To get to the main regression dialog box select Analyze, then Regression, then Linear. Select the outcome variable (Libido) and then drag it to the box labelled Dependent or click on the transfer arrow. To specify the predictor variable for the first block we select Partner_Libido and drag it to the box labelled Independent(s) or click on the transfer arrow. Underneath the Independent(s) box, there is a drop-down menu for specifying the Method of regression. The default option is forced entry, and this is the option we want. Having specified the first block in the hierarchy, we need to move on to the second. To tell the computer that you want to specify a new block of predictors you must click on Next. This process clears the Independent(s) box so that you can enter the new predictors (you should also note that above this box it now reads Block 2 of 2, indicating that you are in the second block of the two that you have so far specified). The second block must contain both of the dummy variables, so you should click on Low_Placebo and High_Placebo in the variables list and transfer them to the Independent(s) box by clicking on the transfer arrow. We also want to leave the method of regression set to Enter. The dialog boxes for the two stages in the hierarchy are shown below:

We just want to run a basic analysis, so we can leave all of the default options as they are and click on OK.

Rerun the ANCOVA but select the option to display estimates of effect size. Do the values of partial eta squared match the ones we have just calculated?

You should get the following output:

This table is the same as the main ANCOVA that we did in the chapter except that there is an extra column at the end with the values of partial eta squared. For Dose, partial eta squared is .24, and for Partner_Libido it is .16, both of which are the same as we calculated by hand in the chapter.

Additional Material

Labcoat Leni's Real Research: Space Invaders


Muris, P. et al. (2008). Child Psychiatry and Human Development, XX, XXXXXX.

Anxious people tend to interpret ambiguous information in a negative way. For example, being highly anxious myself, if I overheard a student saying 'Andy Field's lectures are really different' I would assume that 'different' meant rubbish, but it could also mean 'refreshing' or 'innovative'. One current mystery is how these interpretational biases develop in children. Peter Muris and his colleagues addressed this issue in an ingenious study. Children did a computerized task in which they imagined that they were astronauts who had discovered a new planet. Although the planet was similar to Earth, some things were different. They were given some scenarios about their time on the planet (e.g. 'On the street, you encounter a spaceman. He has a sort of toy handgun and he fires at you') and the child had to decide which of two outcomes occurred. One outcome was positive ('You are laughing: it is a water pistol and the weather is fine anyway') and the other negative ('Oops, this hurts! The pistol produces a red beam which burns your skin!'). After each response the child was told whether their choice was correct. Half of the children were always told that the negative interpretation was correct, and the remainder were always told that the positive interpretation was correct. As such, over 30 scenarios children were trained to interpret their experiences on the planet as negative or positive. Muris et al. then gave children a standard measure of interpretational biases in everyday life to see whether the training had created a bias to interpret things negatively. In doing so, they could ascertain whether children learn interpretational biases through feedback (e.g. from parents) about how to disambiguate ambiguous situations.

The data from this study are in the file Muris et al (2008).sav. The main independent variable is Training (positive or negative) and the outcome variable was the child's interpretational bias score (Interpretational_Bias): a high score reflects a tendency to interpret situations negatively. In a study such as this, it is important to factor in the Age and Gender of the child and also their natural anxiety level (which the researchers measured with a standard questionnaire of child anxiety called the SCARED). Labcoat Leni wants you to carry out a one-way ANCOVA on these data to see whether Training significantly affected children's Interpretational_Bias using Age, Gender and SCARED as covariates. What can you conclude? Answers are in the additional material for this website (or look at pp. 475-476 in the original article).

To run this analysis we need to access the main dialog box for ANCOVA. Select Interpretational_Bias and drag this variable to the box labelled Dependent Variable or click on the arrow button. Select Training (i.e. the type of training that the child had) and drag it to the box labelled Fixed Factor(s) and then select Gender, Age and SCARED (by holding down Ctrl while you click on these variables) and drag these variables to the box labelled Covariate(s). The finished dialog box should look like this:

In the chapter we looked at how to select contrasts but because our main predictor variable (the type of training) has only two levels (positive or negative) we don't need contrasts: the main effect of this variable can only reflect differences between the two types of training. The main output is:

First, notice that Levene's test is non-significant, F(1, 68) = 1.09, p > .05, which tells us that the variance in bias scores was fairly similar in the two training groups. In other words, the assumption of homogeneity of variance has been met. In the main table, we can see that even after partialling out the effects of age, gender and natural anxiety, the training had a significant effect on the subsequent bias score, F(1, 65) = 13.43. The means in the table tell us that interpretational biases were stronger (higher) after negative training. This result is as expected. It seems then that giving children feedback that tells them to interpret ambiguous situations negatively does induce an interpretational bias that persists into everyday situations, which is an important step towards understanding how these biases develop.

In terms of the covariates, age did not influence the acquisition of interpretational biases. However, anxiety and gender did. If we look at the parameter estimates table, we can use the beta values to interpret these effects. For anxiety (SCARED), b = 2.01, which reflects a positive relationship. Therefore, as anxiety increases, the interpretational bias increases also (this is what you would expect because anxious children would be more likely to naturally interpret ambiguous situations in a negative way). If you draw a scatterplot of the relationship between SCARED and Interpretational_Bias you'll see a very nice positive relationship. For Gender, b = 26.12, which again is positive but to interpret this we need to know how the children were coded in the data editor. Boys were coded as 1 and girls as 2. Therefore, as a child 'changes' (not literally) from a boy to a girl, their interpretational biases increase. In other words, girls show a stronger natural tendency to interpret ambiguous situations negatively. This is consistent with the anxiety literature, which shows that females are more likely to have anxiety disorders.

One important thing to remember is that although anxiety and gender naturally affected whether children interpreted ambiguous situations negatively, the training (the experiences on the alien planet) had an effect above and beyond these natural tendencies (in other words, the effects of training cannot be explained by gender or natural anxiety levels in the sample). Have a look at the original article to see how Muris et al. reported the results of this analysis: this can help you to see how you can report your own data from an ANCOVA. (One bit of good practice that you should note is that they report effect sizes from their analysis; as you will see from the book chapter, this is an excellent thing to do.)

Chapter 12

Self-Test Answers

Use the Chart Builder to plot a line graph (with error bars) of the attractiveness of the date with alcohol consumption on the x-axis and different-coloured lines to represent males and females.

To do a multiple line chart for means that are independent (i.e. have come from different groups) we need to double-click on the multiple line chart icon in the Chart Builder (see the book chapter). All we need to do is to drag our variables into the appropriate drop zones. Select Attractiveness from the variable list and drag it into the y-axis drop zone; select Alcohol from the variable list and drag it into the x-axis drop zone; finally select the gender variable and drag it into the drop zone that sets the line colour. This will mean that lines representing males and females will be displayed in different colours. Select error bars in the properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.

The resulting graph can be found in the book chapter.

Plot error bar graphs of the main effects of alcohol and gender.

To do an error bar chart click on the bar chart icon in the Chart Builder (see the book chapter). All we need to do is to drag our variables into the appropriate drop zones. Select Attractiveness from the variable list and drag it into the y-axis drop zone, then select Alcohol from the variable list and drag it into the x-axis drop zone. Select error bars in the properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.

To do the graph of the gender main effect just drag gender into the x-axis drop zone to replace alcohol. The respective dialog boxes are shown below. The completed (edited) graphs are in the book.

The file GogglesRegression.sav contains the dummy variables used in this example, and just to prove that all of this works, use this file and run a multiple regression on the data.

To get to the main regression dialog box, use the linear regression menu. Select the outcome variable (Attractiveness) and then drag it to the box labelled Dependent or click on the arrow button. We want to specify both predictors and their interaction in the same block. To specify the predictor variables, select Gender, Alcohol and the Interaction and drag them to the box labelled Independent(s) or click on the arrow button. Underneath the Independent(s) box, there is a drop-down menu for specifying the Method of regression. The default option is forced entry, and this is the option we want. We just want to run a basic analysis, so we can leave all of the default options as they are and click on OK.

Additional Material

Oliver Twisted: Please Sir, Can I Customize My Model?

'My friend told me that there are different types of sums of squares,' complains Oliver with an air of impressive authority, 'why haven't you told us about them? Is it because you have a microbe for a brain?' No, it's not Oliver, it's because everyone but you will find this very tedious. If you want to find out more about what the Model button does, and the different types of sums of squares that can be used in ANOVA, then the additional material will tell you.

By default SPSS conducts a full factorial analysis (i.e. it includes all of the main effects and interactions of all independent variables specified in the main dialog box). However, there may be times when you want to customize the model that you use to test for certain things. To access the model dialog box, click on Model in the main dialog box. You will notice that, by default, the full factorial model is selected. Even with this selected, there is an option at the bottom to change the types of sums of squares that are used in the analysis. Although we have learnt about sums of squares and what they represent, I haven't talked about different ways of calculating sums of squares. It isn't necessary to understand the computation of the different forms of sums of squares, but it is important that you know the uses of some of the different types. By default, SPSS uses Type III sums of squares, which have the advantage that they are invariant to the cell frequencies. As such, they can be used with both balanced and unbalanced (i.e. different numbers of participants in different groups) designs, which is why they are the default option. Type IV sums of squares are like Type III except that they can be used with data in which there are missing values. So, if you have any missing data in your design, you should change the sums of squares to Type IV.

To customize a model, select the Custom option to activate the dialog box. The variables specified in the main dialog box will be listed on the left-hand side. You can select one, or several, variables from this list and transfer them to the box labelled Model as either main effects or interactions. By default, SPSS transfers variables as interaction terms, but there are several options that allow you to enter main effects, or all two-way, three-way or four-way interactions. These options save you the trouble of having to select lots of combinations of variables (because, for example, you can select three variables, transfer them as all two-way interactions and it will create all three combinations of variables for you). Hence, you could select Gender and Alcohol (you can select both of them at the same time by holding down Ctrl). Then, click on the drop-down menu and change it to Main effects. Having selected this, click on the arrow button to move the main effects of Gender and Alcohol to the box labelled Model. Next we could specify the interaction term. To do this, select Gender and Alcohol simultaneously (by holding down the Ctrl key while you click on the two variables), then select Interaction in the drop-down list and click on the arrow button. This action moves the interaction of Gender and Alcohol to the box labelled Model. The finished dialog box should look like that below. Having specified our two main effects and the interaction term, click on Continue to return to the main dialog box and then click on OK to run the analysis. Although model selection has important uses it is likely that you'd want to run the full factorial analysis on most occasions and so wouldn't customize your model.

Oliver Twisted: Please Sir, Can I Have Some More Contrasts?

'I don't want to use standard contrasts,' sulks Oliver as he stamps his feet on the floor, 'they smell of rotting cabbage.' I think actually, Oliver, the stench of rotting cabbage is probably because you stood your Dickensian self under a window when someone emptied their toilet bucket into the street. Nevertheless, I do get asked a fair bit about how to do contrasts with syntax and I'm a complete masochist so I've prepared a fairly detailed guide in the additional material for this chapter. If you want to know more then have a look at this additional material.

Defining Contrasts with Syntax

Why Do We Need To Use Syntax?


In Chapters 12, 13 and 14 of the book we used SPSS's built-in contrast functions to compare various groups after conducting ANOVA. These special contrasts (described in Chapter 10, Table 10.6) cover many situations, but in more complex designs there will be times when you want to do contrasts that simply can't be done using SPSS's built-in contrasts. Unlike one-way ANOVA, there is no way in factorial designs to define contrast codes through the Windows dialog boxes. However, SPSS can do these contrasts if you define them using syntax.

An Example
Imagine a clinical psychologist wanted to see the effects of a new antidepressant drug called Cheerup. He took 50 people suffering from clinical depression and randomly assigned them to one of five groups. The first group was a waiting list control group (i.e. they were people assigned to the waiting list who were not treated during the study), the second took a placebo tablet (i.e. they were told they were being given an antidepressant drug but actually the pills contained sugar and no active agents), the third group took a well-established SSRI antidepressant called Seroxat (Paxil to American readers), the fourth group was given a well-established SNRI antidepressant called Effexor, and the final group was given the new drug, Cheerup. Levels of depression were measured before and after two months on the various treatments, and ranged from 0 = 'as happy as a spring lamb' to 20 = 'pass me the noose'. The data are in the file Depression.sav.

The design of this study is a two-way mixed design. There are two independent variables: treatment (no treatment, placebo, Seroxat, Effexor or Cheerup) and time (before or after treatment). Treatment is measured with different participants (and so is between-group) and time is, obviously, measured using the same participants (and so is repeated-measures). Hence, the ANOVA we want to use is a 5 × 2 two-way ANOVA.

Now, we want to do some contrasts. Imagine we have the following hypotheses:

1. Any treatment will be better than no treatment.
2. Drug treatments will be better than the placebo.
3. Our new drug, Cheerup, will be better than old-style antidepressants.
4. The old-style antidepressants will not differ in their effectiveness.

We have to code these various hypotheses as we did in Chapter 10. The first contrast involves comparing the no-treatment condition to all other groups. Therefore, the first step is to chunk these variables, and then assign a positive weight to one chunk and a negative weight to the other chunk.

                   Chunk 1         Chunk 2
                   No Treatment    Placebo, Seroxat, Effexor, Cheerup

Sign of Weight     -               +

Having done that, we need to assign a numeric value to the groups in each chunk. As I mentioned in Chapter 8, the easiest way to do this is just to assign a value equal to the number of groups in the opposite chunk. Therefore, the value for any group in chunk 1 will be the same as the number of groups in chunk 2 (in this case 4). Likewise, the value for any groups in chunk 2 will be the same as the number of groups in chunk 1 (in this case 1). So, we get the following codes:

                   Chunk 1         Chunk 2
                   No Treatment    Placebo, Seroxat, Effexor, Cheerup

Sign of Weight     -               +
Value of Weight    -4              +1

The second contrast requires us to compare the placebo group to all of the drug groups. Again, we chunk our groups accordingly, assign one chunk a negative sign and the other a positive, and then assign a weight on the basis of the number of groups in the opposite chunk. We must also remember to give the no-treatment group a weight of 0 because they're not involved in the contrast.

                   Chunk 1     Chunk 2                      Not Included
                   Placebo     Seroxat, Effexor, Cheerup    No Treatment

Sign of Weight     -           +
Value of Weight    -3          +1                           0

The third contrast requires us to compare the new drug (Cheerup) to the old drugs (Seroxat and Effexor). Again, we chunk our groups accordingly, assign one chunk a negative sign and the other a positive, and then assign a weight on the basis of the number of groups in the opposite chunk. We must also remember to give the no-treatment and placebo groups a weight of 0 because they're not involved in the contrast.

                   Chunk 1     Chunk 2             Not Included
                   Cheerup     Seroxat, Effexor    No Treatment, Placebo

Sign of Weight     -           +
Value of Weight    -2          +1                  0

The final contrast requires us to compare the two old drugs. Again, we chunk our groups accordingly, assign one chunk a negative sign and the other a positive, and then assign a weight on the basis of the number of groups in the opposite chunk. We must also give the no-treatment, placebo and Cheerup groups a weight of 0 because they're not involved in the contrast.

                   Chunk 1     Chunk 2     Not Included
                   Effexor     Seroxat     No Treatment, Placebo, Cheerup

Sign of Weight     -           +
Value of Weight    -1          +1          0

We can summarize these codes in the following table:

              No Treatment   Placebo   Seroxat   Effexor   Cheerup
Contrast 1         -4           1         1         1         1
Contrast 2          0          -3         1         1         1
Contrast 3          0           0         1         1        -2
Contrast 4          0           0         1        -1         0

These are the codes that we need to enter into SPSS to do the contrasts that we'd like to do.
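
The chunking rule used above (give each group a weight equal to the number of groups in the opposite chunk, negative for one chunk and positive for the other, and 0 for groups left out) can be sketched in Python. This is only an illustration: the function names contrast_weights and is_orthogonal are made up for this example and are not part of SPSS.

```python
from itertools import combinations

# Groups in the order of the coding variable (0 to 4).
GROUPS = ["No Treatment", "Placebo", "Seroxat", "Effexor", "Cheerup"]


def contrast_weights(chunk1, chunk2, groups=GROUPS):
    """Weights for one contrast: chunk1 is negative, chunk2 is positive.

    The size of each weight is the number of groups in the opposite chunk;
    groups in neither chunk get 0.
    """
    weights = []
    for group in groups:
        if group in chunk1:
            weights.append(-len(chunk2))
        elif group in chunk2:
            weights.append(len(chunk1))
        else:
            weights.append(0)
    return weights


def is_orthogonal(a, b):
    """Two contrasts are orthogonal if their cross-products sum to zero."""
    return sum(x * y for x, y in zip(a, b)) == 0


contrasts = [
    contrast_weights(["No Treatment"], ["Placebo", "Seroxat", "Effexor", "Cheerup"]),
    contrast_weights(["Placebo"], ["Seroxat", "Effexor", "Cheerup"]),
    contrast_weights(["Cheerup"], ["Seroxat", "Effexor"]),
    contrast_weights(["Effexor"], ["Seroxat"]),
]

for number, weights in enumerate(contrasts, start=1):
    print(f"Contrast {number}: {weights} (sum = {sum(weights)})")

all_orthogonal = all(is_orthogonal(a, b) for a, b in combinations(contrasts, 2))
print("All pairs orthogonal:", all_orthogonal)
```

The weights reproduce the summary table, each contrast sums to zero, and every pair of contrasts is orthogonal.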

Entering the Contrasts Using Syntax


To enter these contrasts using syntax we have to first open a syntax window (see Chapter 2 of the book). Having done that we have to type the following commands:

MANOVA
before after BY treat(0 4)

This initializes the ANOVA command in SPSS. The second line specifies the variables in the data editor. The first two words before and after are the repeated-measures variables (and these words are the words used in the data editor). Anything after BY is a between-group measure and so needs to be followed by brackets within which the minimum and maximum values of the coding variable are specified. I called the between-group variable treat, and I coded the groups as 0 = no treatment, 1 = placebo, 2 = Seroxat, 3 = Effexor, 4 = Cheerup. Therefore, the minimum and maximum codes were 0 and 4. So these two lines tell SPSS to start the ANOVA procedure, that there are two repeated-measures variables called before and after, and that there is a between-group variable called treat that has a minimum code of 0 and a maximum of 4.

/WSFACTORS time (2)

The /WSFACTORS command allows us to specify any repeated-measures variables. SPSS already knows that there are two variables called before and after, but it doesn't know how to treat these variables. This command tells SPSS to create a repeated-measures variable called time that has two levels (the number in brackets). SPSS then looks to the variables specified before and assigns the first one (before in this case) to be the first level of time, and then assigns the second one (in this case after) to be the second level of time.

/CONTRAST (time)=special(1 1, 1 -1)

This is used to specify the contrasts for the first variable. The /CONTRAST command is used to specify any contrast. It's always followed by the name of the variable that you want to do a contrast on in brackets. We have two variables (time and treat) and in this first contrast we want to specify a contrast for time. Time only has two levels and so all we want to do is to tell SPSS to compare these two levels (which actually it will do by default but I want you to get some practice in!). What we write after the equals sign defines the contrast, so we could write the name of one of the standard contrasts such as Helmert, but because we want to specify our own contrast we use the word special. Special should always be followed by brackets, and inside those brackets are your contrast codes. Codes for different contrasts are separated using a comma, and within a contrast, codes for different groups are separated using a space.

The first contrast should always be one that defines a baseline for all other contrasts and that is one that codes all groups with a 1. Therefore, because we have two levels of time, we just write 1 1, which tells SPSS that the first contrast should be one in which both before and after are given a code of 1. The comma tells SPSS that a new contrast follows and this second contrast has been defined as 1 -1 and this tells SPSS that in this second contrast we want to give before a code of 1, and after a code of -1. Note that the codes you write in the brackets are assigned to variables in the order that those variables are entered into the SPSS syntax, so because we originally wrote before after BY treat(0 4) SPSS assigns the 1 to before and -1 to after; if we'd originally written after before BY treat(0 4) then SPSS would have assigned them the opposite way round: the 1 to after and -1 to before.

/CONTRAST (treat)=special (1 1 1 1 1, -4 1 1 1 1, 0 -3 1 1 1, 0 0 1 1 -2, 0 0 1 -1 0)

This is used to specify the contrasts for the second variable. This time the /CONTRAST command is followed by the name of the second variable (treat). Treat has five levels and we've already worked out four different contrasts that we want to do. Again we use the word special after the equals sign and specify our coding values within the brackets. As before, codes for different contrasts are separated using a comma and, within a contrast, codes for different groups are separated using a space. Also, as before, the first contrast should always be one that defines a baseline for all other contrasts and that is one that codes all groups with a 1. Therefore, because we have five levels of treat, we just write 1 1 1 1 1, which tells SPSS that the first contrast should be one in which all five groups are given a code of 1.

The comma tells SPSS that a new contrast follows and this second contrast has been defined as -4 1 1 1 1, and this tells SPSS that in this second contrast we want to give the first group a code of -4 and all subsequent groups codes of 1. How does SPSS decide what the first group is? It uses the coding variable in the data editor and orders the groups in the same order as the coding variable. Therefore, because I coded the groups as 0 = no treatment, 1 = placebo, 2 = Seroxat, 3 = Effexor, 4 = Cheerup, this first contrast gives the no-treatment group a code of -4 and all subsequent groups codes of 1.

The comma again tells SPSS that, having done this, there is another contrast to follow and this contrast has been defined as 0 -3 1 1 1, and this tells SPSS that in this contrast we want to give the first group a code of 0, the second group a code of -3 and all subsequent groups codes of 1. Given the same group coding, this contrast gives the no-treatment group a code of 0, the placebo group a code of -3 and all subsequent groups codes of 1.

The comma again tells SPSS that having done this, there is another contrast to follow and this contrast has been defined as 0 0 1 1 -2, and this tells SPSS that in this contrast we want to give the first two groups a code of 0, the third and fourth groups a code of 1 and the final group a code of -2. Given the same group coding, this contrast gives the no-treatment and placebo groups a code of 0, the Seroxat and Effexor groups a code of 1 and the Cheerup group a code of -2.

The comma again tells SPSS that there is yet another contrast to follow and this contrast has been defined as 0 0 1 -1 0, and this tells SPSS that in this contrast we want to give the first, second and last groups a code of 0, the third group a code of 1 and the fourth group a code of -1. This contrast gives the no-treatment, placebo and Cheerup groups a code of 0, the Seroxat group a code of 1 and the Effexor group a code of -1. As such, this one line of text has defined the four contrasts that we want to do.
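
As a rough illustration of the layout rules just described (a baseline row of 1s first, contrasts separated by commas, codes within a contrast separated by spaces), the following Python sketch assembles the /CONTRAST line from a list of contrasts. The helper name special_contrast is hypothetical, and the check that the matrix has as many rows as the variable has levels is an assumption based on the two examples in this guide.

```python
def special_contrast(variable, contrasts):
    """Build a /CONTRAST ... special(...) line from a list of contrasts.

    A baseline row of 1s is added first, as the text above requires.
    """
    n_levels = len(contrasts[0])
    rows = [[1] * n_levels] + [list(row) for row in contrasts]
    if any(len(row) != n_levels for row in rows):
        raise ValueError("every contrast needs one code per level")
    # Assumption: baseline row plus contrasts should equal the number of levels.
    if len(rows) != n_levels:
        raise ValueError("expected one fewer contrast than there are levels")
    body = ", ".join(" ".join(str(code) for code in row) for row in rows)
    return f"/CONTRAST ({variable})=special({body})"


treat_contrasts = [
    [-4, 1, 1, 1, 1],   # no treatment vs. the rest
    [0, -3, 1, 1, 1],   # placebo vs. the three drugs
    [0, 0, 1, 1, -2],   # Cheerup vs. Seroxat and Effexor
    [0, 0, 1, -1, 0],   # Seroxat vs. Effexor
]

line = special_contrast("treat", treat_contrasts)
print(line)
print(special_contrast("time", [[1, -1]]))
```

Generating the line this way makes it harder to drop a code or a comma when the contrasts are typed by hand.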

/CINTERVAL JOINT(.95) MULTIVARIATE(BONFER)

This line defines the type of confidence intervals that you want to do for your contrasts. I recommend the Bonferroni option, but if you delve into the SPSS syntax guide you can find others.

/METHOD UNIQUE
/ERROR WITHIN+RESIDUAL
/PRINT TRANSFORM HOMOGENEITY(BARTLETT COCHRAN BOXM)
  SIGNIF( UNIV MULT AVERF HF GG )
  PARAM( ESTIM EFSIZE).

These lines of syntax specify various things (that may or may not be useful) such as a transformation matrix (TRANSFORM), which isn't at all necessary here but is useful if you've used SPSS's built-in contrasts, homogeneity tests (HOMOGENEITY(BARTLETT COCHRAN BOXM)), the main ANOVA and Huynh-Feldt and Greenhouse-Geisser corrections, which we don't actually need in this example (SIGNIF( UNIV MULT AVERF HF GG )), and parameter estimates and effect size estimates for the contrasts we've specified (PARAM( ESTIM EFSIZE)). So, the whole syntax will look like this:

MANOVA
before after BY treat(0 4)
/WSFACTORS time (2)
/CONTRAST (time)=special(1 1, 1 -1)
/CONTRAST (treat)=special (1 1 1 1 1, -4 1 1 1 1, 0 -3 1 1 1, 0 0 1 1 -2, 0 0 1 -1 0)
/CINTERVAL JOINT(.95) MULTIVARIATE(BONFER)
/METHOD UNIQUE
/ERROR WITHIN+RESIDUAL
/PRINT TRANSFORM HOMOGENEITY(BARTLETT COCHRAN BOXM)
  SIGNIF( UNIV MULT AVERF HF GG )
  PARAM( ESTIM EFSIZE).

It's very important to remember the full stop at the end! This syntax is in the file DepressionSyntax.sps as well, in case your typing goes wrong!

Output From The Contrasts

[Figure: mean depression levels (y-axis) for each treatment (x-axis: No Treatment, Placebo, Seroxat (Paxil), Effexor, Cheerup). Error bars show 95% CI of the mean.]

The output you get is in the form of text (no nice pretty tables) and to interpret it you have to remember the contrasts you specified! I'll run you through the main highlights of this example. The first bit of the output will show the homogeneity tests (which should all be non-significant, but beware of Box's test because it tends to be inaccurate). The first important part is the main effect of the variable treat. First there's an ANOVA summary table like those you've come across before (if you've read Chapters 8-11). This tells us that there's no significant main effect of the type of treatment, F(4, 45) = 2.01, p > .05. This tells us that if you ignore the time at which depression was measured then the levels of depression were about the same across the treatment groups. Of course, levels of depression should be the same before treatment, and so this isn't a surprising result (because it averages across scores before and after treatment). The graph shows that in fact levels of depression are relatively similar across groups.

****** Analysis of Variance -- design 1 ******

Tests of Between-Subjects Effects.

Tests of Significance for T1 using UNIQUE sums of squares
Source of Variation          SS      DF        MS         F  Sig of F

WITHIN+RESIDUAL          359.95      45      8.00
TREAT                     64.30       4     16.08      2.01      .109

-------------------------------------
Estimates for T1
--- Joint univariate .9500 BONFERRONI confidence intervals

TREAT

Parameter        Coeff.   Std. Err.    t-Value    Sig. t   Lower -95%   CL- Upper

    2        -7.7781746     3.99972   -1.94468    .05808    -18.18578     2.62944
    3        3.53553391     3.09817    1.14117    .25984     -4.52617    11.59723
    4        3.74766594     2.19074    1.71069    .09403     -1.95282     9.44815
    5        -.21213203     1.26482    -.16772    .86756     -3.50331     3.07904

Parameter   ETA Sq.

    2        .07752
    3        .02813
    4        .06106
    5        .00062
-------------------------------------

[Figure: mean depression levels (y-axis) before and after treatment (x-axis: Time). Error bars show 95% CI of the mean.]
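
As a quick check on the summary table above, the mean squares and the F-ratio can be recomputed from the sums of squares and degrees of freedom. This is a minimal Python sketch; the function name f_ratio is made up for illustration and the numbers are simply those printed in the output.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (MS effect, MS error, F) from sums of squares and df."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error


# Main effect of treatment: SS = 64.30 on 4 df; error SS = 359.95 on 45 df.
ms_treat, ms_error, f_treat = f_ratio(64.30, 4, 359.95, 45)
print(f"MS treat = {ms_treat:.3f}, MS error = {ms_error:.3f}, F = {f_treat:.3f}")
```

To two decimal places the values agree with the MS and F columns of the table.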

This main effect is followed by some contrasts, but we don't need to look at these because the main effect was non-significant. However, just to tell you what they are, parameter 2 is our first contrast (no treatment vs. the rest) and as you can see this is almost significant (p is just above .05). Parameter 3 is our second contrast (placebo vs. the rest) and this is non-significant. Parameter 4 is our third contrast (Cheerup vs. Effexor and Seroxat), and again this is almost significant. Parameter 5 is our last contrast (Seroxat vs. Effexor) and this is very non-significant. However, these contrasts all ignore the effect of time and so aren't really what we're interested in.

The next part that we're interested in is the within-subject effects, and this involves the main effect of time and the interaction of time and treatment. First there's an ANOVA summary table as before. This tells us that there's a significant main effect of time, F(1, 45) = 43.02, p < .001. This tells us that if you ignore the type of treatment, there was a significant difference between depression levels before and after treatment. A quick look at the means reveals that depression levels were significantly lower after treatment. Below the ANOVA table is a parameter estimate for the effect of time. As there are only two levels of time, this represents the difference in depression levels before and after treatment. No other contrasts are possible.

****** Analysis of Variance -- design 1 ******

Tests involving 'TIME' Within-Subject Effect.

Tests of Significance for T2 using UNIQUE sums of squares
Source of Variation          SS      DF        MS         F  Sig of F

WITHIN+RESIDUAL          320.35      45      7.12
TIME                     306.25       1    306.25     43.02      .000
TREAT BY TIME            125.90       4     31.47      4.42      .004

-------------------------------------
Estimates for T2
--- Joint univariate .9500 BONFERRONI confidence intervals

TIME

Parameter        Coeff.   Std. Err.    t-Value    Sig. t   Lower -95%   CL- Upper

    1        2.47487373      .37733    6.55891    .00000      1.71489     3.23485

Parameter   ETA Sq.

    1        .48875

TREAT BY TIME

Parameter        Coeff.   Std. Err.    t-Value    Sig. t   Lower -95%   CL- Upper

    2        11.3137085     3.77330    2.99836    .00441      1.49527    21.13214
    3        -.56568542     2.92278    -.19354    .84740     -8.17101     7.03964
    4        -5.8689863     2.06672   -2.83976    .00675    -11.24676     -.49121
    5        .919238816     1.19322     .77038    .44510     -2.18562     4.02410

Parameter   ETA Sq.

    2        .16651
    3        .00083
    4        .15197
    5        .01302
-------------------------------------

[Figure: depression levels (y-axis) by type of treatment (x-axis: No Treatment, Placebo, Seroxat (Paxil), Effexor, Cheerup), with separate lines for Time (Before Treatment, After Treatment). Error bars show 95% CI of the mean.]

The interaction term is also significant, F(4, 45) = 4.42, p < .01. This indicates that the change in depression over time is different in some treatments to others. We can make sense of this through an interaction graph, but we can also look at our contrasts. The key contrasts for this whole analysis are the parameter estimates for the interaction term (the bit in the output underneath the heading TREAT BY TIME) because they take into account the effect of time and treatment:

Parameter 2 is our first contrast (no treatment vs. the rest) and as you can see this is significant (p is below 0.05). This tells us that the change in depression levels in the no-treatment group was significantly different to the average change in all other groups, t = 3.00, p < .01. As you can see in the graph, there is no change in depression in the no-treatment group, but in all other groups there is a fall in depression, and this is what the contrast reflects.

Parameter 3 is our second contrast (placebo vs. Seroxat, Effexor and Cheerup) and this is very non-significant, t = -0.19, p = .85. This shows that the decrease in depression levels seen in the placebo group is comparable to the average decrease in depression levels seen in the Seroxat, Effexor and Cheerup conditions. In other words, the combined effect of the drugs on depression is no better than a placebo.

Parameter 4 is our third contrast (Cheerup vs. Effexor and Seroxat) and again this is highly significant, t = -2.84, p < .01. This shows that the decrease in depression levels seen in the Cheerup group is significantly bigger than the decrease seen in the Effexor and Seroxat groups combined. Put another way, Cheerup has a significantly bigger effect than other established antidepressants.

Parameter 5 is our last contrast (Seroxat vs. Effexor) and this is very non-significant, t = 0.77, p = .45. This tells us that the decrease in depression levels seen in the Seroxat group is comparable to the decrease in depression levels seen in the Effexor group. Put another way, Effexor and Seroxat seem to have similar effects on depression.
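As a rough check on the contrast statistics quoted above, the t-values can be recomputed from the coefficients and standard errors in the TREAT BY TIME output. The sketch below also assumes that the ETA Sq. column is t²/(t² + error df); that relationship is not stated in the output itself, although it reproduces the printed values. The function names are illustrative only.

```python
# Contrast estimates for the TREAT BY TIME interaction, copied from the output.
# Each tuple is (parameter, coefficient, standard error).
contrasts = [
    (2, 11.3137085, 3.77330),   # no treatment vs. the rest
    (3, -0.56568542, 2.92278),  # placebo vs. the three drugs
    (4, -5.8689863, 2.06672),   # Cheerup vs. Effexor and Seroxat
    (5, 0.919238816, 1.19322),  # Seroxat vs. Effexor
]
DF_ERROR = 45  # WITHIN+RESIDUAL degrees of freedom from the ANOVA table


def t_value(coeff, std_err):
    """t is the contrast coefficient divided by its standard error."""
    return coeff / std_err


def eta_squared(t, df_error):
    """Assumed relationship between t and the ETA Sq. column."""
    return t ** 2 / (t ** 2 + df_error)


results = {}
for param, coeff, std_err in contrasts:
    t = t_value(coeff, std_err)
    results[param] = (t, eta_squared(t, DF_ERROR))
    print(f"Parameter {param}: t = {t:.2f}, eta squared = {results[param][1]:.3f}")
```

The signs of t for parameters 3 and 4 are negative, which matches the output.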


I hope to have shown in this example how to specify contrasts using syntax and how looking at these contrasts (especially for an interaction term) can be a very useful way to break down an interaction effect.

Oliver Twisted: Please Sir, can I have some more Simple Effects?

"I want to impress my friends by doing a simple effects analysis by hand," boasts Oliver. You don't really need to know how simple effects analyses are calculated to run them, Oliver, but seeing as you asked, it is explained in the additional material available from the companion website.
Another useful way to follow up interaction effects is to run contrasts for the interaction term. Like simple effects, this can be done only using syntax, and it's a fairly involved process. However, if this sounds like something you might want to do, then the additional material for this chapter contains an example that I've prepared that walks you through specifying contrasts across an interaction.

Calculating Simple Effects


A simple main effect (usually called a simple effect) is just the effect of one variable at levels of another variable. In Chapter 12 we had an example in which we'd measured the attractiveness of dates after no alcohol, 2 pints and 4 pints in both men and women. Therefore, we have two independent variables: alcohol (none, 2 pints, 4 pints) and gender (male and female). One simple effects analysis we could do would be to look at the effect of gender (i.e. compare male and female scores) at the three levels of alcohol. Let's look at how we'd do this. We're partitioning the model sum of squares and we saw in Chapter 10 that we calculate model sums of squares using this equation:

\[ SS_M = \sum n_k (\bar{x}_k - \bar{x}_{\text{grand}})^2 \]

For simple effects, we calculate the model sum of squares for the effect of gender at each level of alcohol. So, we'd begin with when there was no alcohol, and calculate the model sum of squares. Thus, the grand mean becomes the mean for when there was no alcohol, and the group means are the means for men (when there was no alcohol) and women (when there was no alcohol). So, we group the data by the amount of alcohol drunk. Within each of these three groups, we calculate the overall mean and also the means of the male and female scores separately. These mean scores are all we really need. You can think of the data as laid out in the table below. We can then apply the same equation for the model sum of squares that we used for the overall model sum of squares, but we use the grand mean of the no-alcohol data (63.75) and the means of males (66.875) and females (60.625) within this group:

            No Alcohol           2 Pints              4 Pints
            Female    Male       Female    Male       Female    Male
            65        50         70        45         55        30
            70        55         65        60         65        30
            60        80         60        85         70        30
            60        65         70        65         55        55
            60        70         65        70         55        35
            55        75         60        70         60        20
            60        75         60        80         50        45
            55        65         50        60         50        40
Mean        60.625    66.875     62.500    66.875     57.500    35.625

Mean None = 63.75        Mean 2 Pints = 64.6875        Mean 4 Pints = 46.5625

\[ SS_{\text{Gender(No Alcohol)}} = \sum n_k (\bar{x}_k - \bar{x}_{\text{grand}})^2 = 8(60.625 - 63.75)^2 + 8(66.875 - 63.75)^2 = 156.25 \]

The degrees of freedom for this effect are calculated the same way as for any model sum of squares; that is, they are one less than the number of conditions being compared (k - 1), which in this case, when we're comparing only two conditions, will be 1. The next step is to do the same but for the 2 pint data. Now we use the grand mean of the 2 pint data (64.6875) and the means of males (66.875) and females (62.50) within this group. The equation, however, stays the same:

\[ SS_{\text{Gender(2 Pints)}} = \sum n_k (\bar{x}_k - \bar{x}_{\text{grand}})^2 = 8(62.50 - 64.6875)^2 + 8(66.875 - 64.6875)^2 = 76.56 \]


The degrees of freedom are the same as in the previous simple effect, namely k - 1, which is 1 for these data. The next step is to do the same but for the 4 pint data. Now we use the grand mean of the 4 pint data (46.5625) and the means of females (57.500) and males (35.625) within this group. The equation, however, stays the same:

\[ SS_{\text{Gender(4 Pints)}} = \sum n_k (\bar{x}_k - \bar{x}_{\text{grand}})^2 = 8(57.50 - 46.5625)^2 + 8(35.625 - 46.5625)^2 = 1914.06 \]

Again, the degrees of freedom are 1 (because we've compared two groups). As with any ANOVA, we need to convert these sums of squares to mean squares by dividing by the degrees of freedom. However, because all of these sums of squares have 1 degree of freedom, the mean squares will be the same as the sums of squares because we're dividing by 1. So, the final stage is to calculate an F-ratio for each simple effect. As ever, the F-ratio is just the mean squares for the model divided by the residual mean squares. So, you might well ask, what do we use for the residual mean squares? When conducting simple effects we use the residual mean squares for the original ANOVA (the residual mean squares for the entire model). In doing so we are merely partitioning the model sums of squares and so keep control of the Type I error rate. For these data, the residual mean squares was 83.036 (see section 10.2.6). Therefore, we get:

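\[ F_{\text{Gender(No Alcohol)}} = \frac{MS_{\text{Gender(No Alcohol)}}}{MS_R} = \frac{156.25}{83.036} = 1.88 \]

\[ F_{\text{Gender(2 Pints)}} = \frac{MS_{\text{Gender(2 Pints)}}}{MS_R} = \frac{76.56}{83.036} = 0.92 \]

\[ F_{\text{Gender(4 Pints)}} = \frac{MS_{\text{Gender(4 Pints)}}}{MS_R} = \frac{1914.06}{83.036} = 23.05 \]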

We can evaluate these F-values in the usual way (they will have 1 and 42 degrees of freedom for these data). However, for the 2 pint data we can be sure there is not a significant effect of gender because the F-ratio is less than 1.
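The hand calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how SPSS computes simple effects. The scores are typed in from the data table, the residual mean squares (83.036) is taken from the text, and the helper names are made up for the example.

```python
# Attractiveness scores from the data table, grouped by alcohol level and gender.
data = {
    "No Alcohol": {
        "Female": [65, 70, 60, 60, 60, 55, 60, 55],
        "Male": [50, 55, 80, 65, 70, 75, 75, 65],
    },
    "2 Pints": {
        "Female": [70, 65, 60, 70, 65, 60, 60, 50],
        "Male": [45, 60, 85, 65, 70, 70, 80, 60],
    },
    "4 Pints": {
        "Female": [55, 65, 70, 55, 55, 60, 50, 50],
        "Male": [30, 30, 30, 55, 35, 20, 45, 40],
    },
}
MS_RESIDUAL = 83.036  # residual mean squares from the original two-way ANOVA


def mean(scores):
    return sum(scores) / len(scores)


def simple_effect_ss(groups):
    """Model sum of squares for gender within one level of alcohol."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)  # mean of this alcohol level only
    return sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())


simple_effects = {}
for level, groups in data.items():
    ss = simple_effect_ss(groups)
    df = len(groups) - 1            # k - 1 = 1 for two groups
    f_ratio = (ss / df) / MS_RESIDUAL
    simple_effects[level] = (ss, f_ratio)
    print(f"{level}: SS = {ss:.2f}, F(1, 42) = {f_ratio:.2f}")
```

Using the overall residual mean squares as the error term for every level is what makes this a partition of the model sum of squares rather than three separate t-tests.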


Labcoat Leni's Real Research 12. Don't Forget Your Toothbrush?

Davey, G. C. L. et al. (2003). Journal of Behavior Therapy & Experimental Psychiatry, 34, 141–160.

We have all experienced that feeling after we have left the house of wondering whether we locked the door, or if we remembered to close the window, or if we remembered to remove the bodies from the fridge in case the police turn up. This behaviour is normal; however, people with obsessive compulsive disorder (OCD) tend to check things excessively. They might, for example, check whether they have locked the door so often that it takes them an hour to leave their house. It is a very debilitating problem. One theory of this checking behaviour in OCD suggests that it is caused by a combination of the mood you are in (positive or negative) interacting with the rules you use to decide when to stop a task (do you continue until you feel like stopping, or until you have done the task as best as you can?). Davey, Startup, Zara, MacDonald, and Field (2003) tested this hypothesis by inducing a negative, positive or no mood in different people and then asking them to imagine that they were going on holiday and to generate as many things as they could that they should check before they went away. Within each mood group, half of the participants were instructed to generate as many items as they could (known as an "as many as can" stop rule), whereas the remainder were asked to generate items for as long as they felt like continuing the task (known as a "feel like continuing" stop rule). The data are in the file Davey2003.sav. Davey et al. hypothesized that people in negative moods using an "as many as can" stop rule would generate more items than those using a "feel like continuing" stop rule. Conversely, people in a positive mood would generate more items when using a "feel like continuing" stop rule compared to an "as many as can" stop rule. Finally, in neutral moods, the stop rule used shouldn't affect the number of items generated. Draw an error bar chart of the data and then conduct the appropriate analysis to test Davey et al.'s hypotheses. Answers are in the additional material for this website (or look at pages 148–149 in the original article).

To do an error bar chart for means that are independent (i.e. have come from different groups) we need to double-click on the clustered bar chart icon in the Chart Builder (see the book chapter). All we need to do is to drag our variables into the appropriate drop zones. Select Checks from the variable list and drag it into the y-axis drop zone; select Mood from the variable list and drag it into the x-axis drop zone; finally select the Stop_Rule variable and drag it into the cluster drop zone. This will mean that bars representing the two stop rules will be displayed in different colours. Select error bars in the properties dialog box and click on Apply to apply them to the Chart Builder. Click on OK to produce the graph.

The resulting graph should look like this:

Next, open the main dialog box for a general factorial ANOVA (the menu path is given in the book chapter). First, select the dependent variable Checks from the variables list on the left-hand side of the dialog box and drag it to the space labelled Dependent Variable. In the space labelled Fixed Factor(s) we need to place any independent variables relevant to the analysis. Select Mood and Stop_Rule in the variables list (these variables can be selected simultaneously by holding down Ctrl while clicking on the variables) and drag them to the Fixed Factor(s) box.


The resulting output can be interpreted as follows. First, Levene's test is significant, indicating a problem with homogeneity of variance. If we compare the largest and smallest variances (smallest = 2.35² = 5.52; largest = 7.86² = 61.78) we find a ratio of 61.78/5.52 ≈ 11. We have six variances, and N - 1 = 9, and so the critical value from Hartley's table (which you can find in this document) is 7.80. Our observed value of 11 is bigger than this so we definitely have a problem.
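The variance ratio check described above amounts to the following arithmetic. This is a minimal sketch: the two standard deviations and the critical value of 7.80 are taken from the text, and the variable names are illustrative.

```python
# Standard deviations of the groups with the smallest and largest spread.
sd_smallest = 2.35
sd_largest = 7.86

# Critical value quoted in the text from Hartley's table
# (six variances, n - 1 = 9 per group).
critical_value = 7.80

# Hartley's variance ratio: largest variance divided by smallest variance.
variance_ratio = sd_largest ** 2 / sd_smallest ** 2
heterogeneous = variance_ratio > critical_value

print(f"Variance ratio = {variance_ratio:.2f}")
print("Homogeneity of variance is doubtful" if heterogeneous else "No evidence of a problem")
```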


The main effect of mood was not significant, F(2, 54) = 0.68, p > .05, indicating that the number of checks (when we ignore the stop rule adopted) was roughly the same regardless of whether the person was in a positive, negative or neutral mood. Similarly, the main effect of stop rule was not significant, F(1, 54) = 2.09, p > .05, indicating that the number of checks (when we ignore the mood induced) was roughly the same regardless of whether the person used an "as many as can" or a "feel like continuing" stop rule. The mood × stop rule interaction was significant, F(2, 54) = 6.35, p < .01, indicating that the mood combined with the stop rule to affect checking behaviour. Looking at the graph, a negative mood in combination with an "as many as can" stop rule increased checking, as did the combination of a "feel like continuing" stop rule and a positive mood, just as Davey et al. predicted.

Chapter 13

Self-Test Answers

What is a repeated-measures design?

Repeated-measures is a term used when the same participants participate in all conditions of an experiment.

What does contrast 3 (Level 3 vs. Level 4) compare?

Contrast 3 compares the fish eyeball with the witchetty grub.

Try rerunning these post hoc tests but select the uncorrected values (LSD) in the options dialog box (see section 13.8.5). You should find that the difference between beer and water is now significant (p = .02).
The dialog boxes should look like this:


The output from the post hoc tests for drink looks like this:

The difference between beer and water is now significant (p = .02).

Why do you think that this contradiction has occurred?

It's because the contrasts have more power to detect differences than post hoc tests.

Additional Material


Oliver Twisted: Please Sir, Can I Have Some More Sphericity?

"Balls," says Oliver, "are spherical, and I like balls. Maybe I'll like sphericity too if only you could explain it to me in more detail." Be careful what you wish for, Oliver. In my youth I wrote an article called "A bluffer's guide to sphericity", which I used to cite in this book, roughly on this page. A few people ask me for it, so I thought I might as well reproduce it in the additional material for this chapter.

Below is a reproduction of: Field, A. P. (1998). A bluffer's guide to sphericity. Newsletter of the Mathematical, Statistical and Computing Section of the British Psychological Society, 6(1), 13–22.
A bluffer's guide to sphericity

The use of repeated measures, where the same subjects are tested under a number of conditions, has numerous practical and statistical benefits. For one thing, it reduces the error variance caused by between-group individual differences. However, this reduction of error comes at a price because repeated measures designs potentially introduce covariation between experimental conditions (this is because the same people are used in each condition and so there is likely to be some consistency in their behaviour across conditions). In between-group ANOVA we have to assume that the groups we test are independent for the test to be accurate (Scariano & Davenport, 1987, have documented some of the consequences of violating this assumption). As such, the relationship between treatments in a repeated measures design creates problems with the accuracy of the test statistic. The purpose of this article is to explain, as simply as possible, the issues that arise in analysing repeated measures data with ANOVA: specifically, what is sphericity and why is it important?

What is Sphericity?

Most of us are taught during our degrees that it is crucial to have homogeneity of variance between conditions when analysing data from different subjects, but often we are left to assume that this problem goes away in repeated measures designs. This is not so, and the assumption of sphericity can be likened to the assumption of homogeneity of variance in between-group ANOVA. Sphericity (denoted by $\varepsilon$ and sometimes referred to as circularity) is a more general condition of compound symmetry. Imagine you had a population covariance matrix $\Sigma$, where:

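\[
\Sigma = \begin{pmatrix}
s_{11}^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\
\sigma_{21} & s_{22}^2 & \sigma_{23} & \cdots & \sigma_{2n} \\
\sigma_{31} & \sigma_{32} & s_{33}^2 & \cdots & \sigma_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \cdots & s_{nn}^2
\end{pmatrix}
\]

Equation 1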

This matrix represents two things: (1) the off-diagonal elements represent the covariances between the treatments 1 ... n (you can think of this as the unstandardised correlation between each of the repeated measures conditions); and (2) the diagonal elements signify the variances within each treatment. As such, the assumption of homogeneity of variance between treatments will hold when:
\[ s_{11}^2 \approx s_{22}^2 \approx s_{33}^2 \approx \cdots \approx s_{nn}^2 \]

Equation 2

(i.e. when the diagonal components of the matrix are approximately equal). This is comparable to the situation we would expect in a between-group design. However, in repeated measures designs there is the added complication that the experimental conditions covary with each other. The end result is that we have to consider the effect of these covariances when we analyse the data, and specifically we need to assume that all of the covariances are approximately equal (i.e. all of the conditions are related to each other to the same degree and so the effect of participating in one treatment level after another is also equal). Compound symmetry holds when there is a pattern of constant variances along the diagonal (i.e. homogeneity of variance; see Equation 2) and constant covariances off the diagonal (i.e. the covariances between treatments are equal; see Equation 3). While compound symmetry has been shown to be a sufficient condition for conducting ANOVA on repeated measures data, it is not a necessary condition.

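\[ \sigma_{12} \approx \sigma_{13} \approx \sigma_{23} \approx \cdots \approx \sigma_{1n} \approx \sigma_{2n} \approx \sigma_{3n} \approx \cdots \]

Equation 3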

Sphericity is a less restrictive form of compound symmetry (in fact much of the early research into repeated measures ANOVA confused compound symmetry with sphericity). Sphericity refers to the equality of variances of the differences between treatment levels. Whereas compound symmetry concerns the covariation between treatments, sphericity is related to the variance of the differences between treatments. So, if you were to take each pair of treatment levels, and calculate the differences between each pair of scores, then it is necessary that these differences have equal variances. Imagine a situation where there are 4 levels of a repeated measures treatment (A, B, C, D). For sphericity to hold, one condition must be satisfied:
\[ s^2_{A-B} \approx s^2_{A-C} \approx s^2_{A-D} \approx s^2_{B-C} \approx s^2_{B-D} \approx s^2_{C-D} \]

Equation 4

Sphericity is violated when the condition in Equation 4 is not met (i.e. the differences between pairs of conditions have unequal variances).
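A short derivation may help to show why compound symmetry is sufficient for sphericity. It relies on the standard identity for the variance of a difference, which is not spelled out in the article, and writes the common variance as $s^2$ and the common covariance as $c$ purely for illustration.

```latex
% Variance of the difference between two treatments A and B
s^2_{A-B} = s^2_A + s^2_B - 2\sigma_{AB}

% Compound symmetry: equal variances (s^2) and equal covariances (c)
s^2_{A-B} = s^2 + s^2 - 2c = 2(s^2 - c) \quad \text{for every pair of treatments}
```

Every pair of treatments then has the same variance of differences, so Equation 4 is satisfied. The reverse need not hold, because unequal variances and covariances can still combine to give equal variances of differences.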

How is Sphericity Measured?

The simplest way to see whether or not the assumption of sphericity has been met is to calculate the differences between pairs of scores in all combinations of the treatment levels. Once this has been done, you can simply calculate the variance of these differences. For example, Table 1 shows data from an experiment with 3 conditions (for simplicity there are only 5 scores per condition). The differences between pairs of conditions can then be calculated for each subject, and the variance of each set of differences can then be calculated. We saw above that sphericity is met when these variances are roughly equal. For these data, sphericity will hold when:

\[ s^2_{A-B} \approx s^2_{A-C} \approx s^2_{B-C} \]

where:

\[ s^2_{A-B} = 15.7, \quad s^2_{A-C} = 10.3, \quad s^2_{B-C} = 10.3 \]

As such,

\[ s^2_{A-B} \neq s^2_{A-C} = s^2_{B-C} \]

            Condition A   Condition B   Condition C     A-B     A-C     B-C
            10            12            8               -2      2       5
            15            15            12              0       3       3
            25            30            20              -5      5       10
            35            30            28              5       7       2
            30            27            20              3       10      7
Variance:                                               15.7    10.3    10.3

Table 1: Hypothetical data to illustrate the calculation of the variance of the differences between conditions.

So there is at least some deviation from sphericity because the variance of the differences between conditions A and B is greater than the variance of the differences between conditions A and C, and between B and C. However, we can say that these data have local circularity (or local sphericity) because two of the variances are identical. This means that for any multiple comparisons involving these differences, the sphericity assumption has been met (for a discussion of local circularity see Rouanet and Lépine, 1970). The deviation from sphericity in the data in Table 1 does not seem too severe (all variances are roughly equal). This raises the issue of how we assess whether violations of sphericity are severe enough to warrant action.
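The variances of the differences in Table 1 can be reproduced with Python's standard library. In this sketch the difference scores are entered as printed in the table rather than recomputed from the raw scores, so that the results match the values quoted in the text (note that the first B-C entry is printed as 5, whereas 12 - 8 = 4). The dictionary layout is just one convenient choice.

```python
from statistics import variance  # sample variance (divides by n - 1)

# Difference scores for each pair of conditions, as printed in Table 1.
differences = {
    "A-B": [-2, 0, -5, 5, 3],
    "A-C": [2, 3, 5, 7, 10],
    "B-C": [5, 3, 10, 2, 7],
}

# Sphericity requires these variances to be roughly equal.
variances = {pair: variance(scores) for pair, scores in differences.items()}

for pair, var in variances.items():
    print(f"Variance of {pair} differences = {var:.1f}")
```

Comparing the three numbers by eye is the informal version of what Mauchly's test does formally.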

Assessing the Severity of Departures from Sphericity

Luckily the advancement of computer packages makes it needless to ponder the details of how to assess departures from sphericity. SPSS produces a test known as Mauchly's test, which tests the hypothesis that the variances of the differences between conditions are equal. Therefore, if Mauchly's test statistic is significant (i.e. has a probability value less than 0.05) we must conclude that there are significant differences between the variances of the differences; ergo, the condition of sphericity has not been met. If, however, Mauchly's test statistic is non-significant (i.e. p > .05) then it is reasonable to conclude that the variances of the differences are not significantly different (i.e. they are roughly equal). So, in short, if Mauchly's test is significant then we must be wary of the F-ratios produced by the computer.

Figure 1: Output of Mauchly's test from SPSS version 7.0

Figure 1 shows the result of Mauchly's test on some fictitious data with three conditions (A, B and C). The result of the test is highly significant, indicating that the variances of the differences were significantly different. The table also displays the degrees of freedom (the df are simply N - 1, where N is the number of variances compared) and three estimates of sphericity (see the section on correcting for sphericity).

What is the Effect of Violating the Assumption of Sphericity?

Rouanet and Lépine (1970) provided a detailed account of the validity of the F-ratio when the sphericity assumption does not hold. They argued that there are two different F-ratios that can be used to assess treatment comparisons. The two types of F-ratio were labelled F′ and F″ respectively. F′ refers to an F-ratio derived from the mean squares of the comparison in question and the interaction of the subjects with that comparison (i.e. the specific error term for each comparison is used; this is the F-ratio normally used). F″ is derived not from the specific error mean square but from the total error mean squares for all of the repeated measures comparisons. Rouanet and Lépine (1970) argued that F′ is less powerful than F″ and so it may be the case that this test statistic misses genuine effects. In addition, they showed that for F′ to be valid the covariation matrix, $\Sigma$, must obey local circularity (i.e. sphericity must hold for the specific comparison in question), and Mendoza, Toothaker & Crain (1976) have supported this by demonstrating that the F-ratios of an L × J × K factorial design with two repeated measures are valid only if local circularity holds. F″ requires only overall circularity (i.e. the whole data set must be circular) but because of the non-reciprocal nature of circularity and compound symmetry, F″ does not require compound symmetry whilst F′ does. So, given that F′ is the statistic generally used, the effect of violating sphericity is a loss of power (compared to when F″ is used) and a test statistic (F-ratio) which simply cannot be validly compared to tabulated values of the F-distribution.

Correcting for Violations of Sphericity

If data violate the sphericity assumption there are a number of corrections that can be applied to produce a valid F-ratio. SPSS produces three corrections based upon the estimates of sphericity advocated by Greenhouse and Geisser (1959) and Huynh and Feldt (1976). Both of these estimates give rise to a correction factor that is applied to the degrees of freedom used to assess the observed value of F. How each estimate is calculated is beyond the scope of this article; for our purposes all we need know is that each estimate differs slightly from the others.

The Greenhouse-Geisser estimate (usually denoted as $\hat{\varepsilon}$) varies between $1/(k-1)$ (where k is the number of repeated measures conditions) and 1. The closer that $\hat{\varepsilon}$ is to 1.00, the more homogeneous are the variances of differences, and hence the closer the data are to being spherical. Figure 1 shows a situation with three conditions and hence the lower limit of $\hat{\varepsilon}$ is 0.5. The calculated value of $\hat{\varepsilon}$ is 0.503, which is very close to 0.5 and so represents a substantial deviation from sphericity. Huynh and Feldt (1976) reported that when $\hat{\varepsilon}$ > 0.75 too many false null hypotheses fail to be rejected (i.e. the test is too conservative) and Collier, Baker, Mandeville & Hayes (1967) showed that this was also true with $\hat{\varepsilon}$ as high as 0.90. Huynh and Feldt, therefore, proposed a correction to $\hat{\varepsilon}$ to make it less conservative (usually denoted as $\tilde{\varepsilon}$). However, Maxwell and Delaney (1990) report that $\tilde{\varepsilon}$ actually overestimates sphericity. Stevens (1992) therefore recommends taking an average of the two and adjusting the df by this averaged value. Girden (1992) recommends that when $\hat{\varepsilon}$ > 0.75 then the df should be corrected using $\tilde{\varepsilon}$. If $\hat{\varepsilon}$ < 0.75, or nothing is known about sphericity at all, then the conservative $\hat{\varepsilon}$ should be used to adjust the df.
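The recommendations in this section can be summarised as a small decision sketch. It assumes the lower limit of the Greenhouse-Geisser estimate is 1/(k - 1), which agrees with the lower limit of 0.5 quoted for three conditions. The function names are hypothetical and the 0.75 threshold is the one attributed to Girden (1992) in the text.

```python
def lower_bound_epsilon(k):
    """Smallest possible sphericity estimate for k repeated measures conditions."""
    return 1 / (k - 1)


def girden_epsilon(gg, hf, threshold=0.75):
    """Girden's rule as summarised in the text: use the Huynh-Feldt estimate
    when the Greenhouse-Geisser estimate exceeds 0.75, otherwise use the
    more conservative Greenhouse-Geisser estimate."""
    return hf if gg > threshold else gg


def stevens_epsilon(gg, hf):
    """Stevens' suggestion as summarised in the text: average the two estimates."""
    return (gg + hf) / 2


# Estimates reported for the fictitious three-condition data in the article.
gg_estimate, hf_estimate, k = 0.503, 0.506, 3

print(f"Lower bound for k = {k}: {lower_bound_epsilon(k)}")
print(f"Girden's choice: {girden_epsilon(gg_estimate, hf_estimate)}")
print(f"Stevens' average: {stevens_epsilon(gg_estimate, hf_estimate):.4f}")
```

For these data the Greenhouse-Geisser estimate is far below 0.75, so Girden's rule keeps the conservative estimate.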

Figure 2: Output of epsilon corrected F values from SPSS version 7.0.

Figure 2 shows a typical ANOVA table for a set of data that violated sphericity (the same data used to generate Figure 1). The table in Figure 2 shows the F-ratio and associated degrees of freedom when sphericity is assumed. As can be seen, this results in a significant F statistic, indicating some difference(s) between the means of the three conditions. Underneath are the corrected values (for each of the three estimates of sphericity). Notice that in all cases the F-ratios remain the same; it is the degrees of freedom that change (and hence the critical value of F). The degrees of freedom are corrected by the estimate of sphericity. How this is done can be seen in Table 2. The new degrees of freedom are then used to ascertain the critical value of F. For these data this results in the observed F being non-significant at p < 0.05. This particular data set illustrates how important it is to use a valid critical value of F: it can mean the difference between a statistically significant result and a non-significant result. More importantly, it can mean the difference between making a Type I error and not.


Estimate of Sphericity Used    Value of Estimate    Term      df    Correction    New df
None                                                Effect    2
                                                    Error     8
Greenhouse-Geisser             0.503                Effect    2     0.503 × 2     1.006
                                                    Error     8     0.503 × 8     4.024
Huynh-Feldt                    0.506                Effect    2     0.506 × 2     1.012
                                                    Error     8     0.506 × 8     4.048

Table 2: How the sphericity corrections are applied to the degrees of freedom.
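The correction in Table 2 is simply a multiplication of both degrees of freedom by the sphericity estimate. A minimal sketch, with an illustrative function name:

```python
def corrected_df(epsilon, df_effect, df_error):
    """Multiply the effect and error degrees of freedom by the sphericity estimate."""
    return epsilon * df_effect, epsilon * df_error


# Uncorrected degrees of freedom from Table 2.
DF_EFFECT, DF_ERROR = 2, 8

corrected = {}
for estimate in (0.503, 0.506):
    corrected[estimate] = corrected_df(estimate, DF_EFFECT, DF_ERROR)
    effect, error = corrected[estimate]
    print(f"epsilon = {estimate}: df = ({effect:.3f}, {error:.3f})")
```

The F-ratio itself is unchanged; only the degrees of freedom used to look up its critical value shrink.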

Multivariate vs. Univariate Tests

A final option, when you have data that violate sphericity, is to use multivariate test statistics (MANOVA) because they are not dependent upon the assumption of sphericity (see O'Brien & Kaiser, 1985). There is a trade-off of test power between univariate and multivariate approaches, although some authors argue that this can be overcome with suitable mastery of the techniques (O'Brien & Kaiser, 1985). MANOVA avoids the assumption of sphericity (and all the corresponding considerations about appropriate F-ratios and corrections) by using a specific error term for contrasts with 1 df; hence, each contrast is only ever associated with its specific error term (rather than the pooled error terms used in ANOVA). Davidson (1972) compared the power of adjusted univariate techniques with those of Hotelling's T² (a MANOVA test statistic) and found that the univariate technique was relatively powerless to detect small reliable changes between highly correlated conditions when other less correlated conditions were also present. Mendoza, Toothaker and Nicewander (1974) conducted a Monte Carlo study comparing univariate and multivariate techniques under violations of compound symmetry and normality and found that as the degree of violation of compound symmetry increased, the empirical power for the multivariate tests also increased. In contrast, the power for the univariate tests generally decreased (p. 174). Maxwell and Delaney (1990) noted that the univariate test is relatively more powerful than the multivariate test as n decreases and proposed that the multivariate approach should probably not be used if n is less than a + 10 (a is the number of levels for repeated measures) (p. 602). As a general rule it seems that when you have a large violation of sphericity ($\varepsilon$ < 0.7) and your sample size is greater than (a + 10) then multivariate procedures are more powerful, whilst with small sample sizes or when sphericity holds ($\varepsilon$ > 0.7) the univariate approach is preferred (Stevens, 1992). It is also worth noting that the power of MANOVA increases and decreases as a function of the correlations between dependent variables (Cole et al., 1994) and so the relationship between treatment conditions must be considered also.
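The general rule at the end of this section can be written as a simple function. This is only a sketch of the rule of thumb as stated in the text (a sphericity estimate below 0.7 and a sample size above a + 10); the function name and return labels are invented for illustration, and it ignores the other considerations mentioned, such as the correlations between conditions.

```python
def preferred_approach(epsilon, n, a, epsilon_cutoff=0.7):
    """Rule of thumb from the text.

    epsilon: estimate of sphericity
    n: sample size
    a: number of levels of the repeated measures factor
    """
    if epsilon < epsilon_cutoff and n > a + 10:
        return "multivariate"
    return "univariate"


print(preferred_approach(epsilon=0.503, n=20, a=3))  # large violation, n > 13
print(preferred_approach(epsilon=0.503, n=5, a=3))   # large violation, small n
print(preferred_approach(epsilon=0.90, n=50, a=3))   # sphericity roughly holds
```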

Multiple Comparisons

So far, I have discussed the effects of sphericity on the omnibus ANOVA. As a final flurry some discussion of the effects on multiple comparison procedures is warranted. Boik (1981) provided an estimable account of the effects of nonsphericity on a priori tests in repeated measures designs, and concluded that even very small departures from sphericity produce large biases in the F-test; he recommended against using these tests for repeated measures contrasts. When experimental error terms are small, the power to detect relatively strong effects can be as low as .05 (when sphericity = .80). He argues that the situation for a priori comparisons cannot be improved and concludes by recommending a multivariate analogue. Mitzel and Games (1981) found that when sphericity does not hold ($\varepsilon$ < 1) the pooled error term conventionally employed in pairwise comparisons resulted in nonsignificant differences between two means declared significant (i.e. a lenient Type I error rate) or undetected differences (a conservative Type I error rate). They therefore recommended the use of separate error terms for each comparison.

Maxwell (1980) systematically tested the power and alpha levels for 5 a priori tests under repeated measures conditions. The tests assessed were Tukey's Wholly Significant Difference (WSD) test, which uses a pooled error term; Tukey's procedure but with a separate error term with either (n - 1) df [labelled SEP1] or (n - 1)(k - 1) df [labelled SEP2]; Bonferroni's procedure (BON); and a multivariate approach, the Roy-Bose Simultaneous Confidence Interval (SCI). Maxwell tested these a priori procedures varying the sample size, number of levels of the repeated factor and departure from sphericity. He found that the multivariate approach was always "too conservative for practical use" (p. 277) and this was most extreme when n (the number of subjects) is small relative to k (the number of conditions). Tukey's test inflated the alpha rate as the covariance matrix departed from sphericity, and even when a separate error term was used (SEP1) alpha was slightly inflated as k increased, whilst SEP2 also led to unacceptably high alpha levels. The Bonferroni method, however, was extremely robust (although slightly conservative) and controlled alpha levels regardless of the manipulation. Therefore, in terms of Type I error rates the Bonferroni method was best. In terms of test power (the Type II error rate) for a small sample (n = 8) WSD was the most powerful under conditions of nonsphericity. This advantage was severely reduced when n = 15.

Keselman and Keselman (1988) extended Maxwell's work and also investigated unbalanced designs. They too used Tukey's WSD, a modified WSD (with non-pooled error variance), Bonferroni t-statistics, and a multivariate approach, and looked at the same factors as Maxwell (with the addition of unequal samples). They found that when unweighted means were used (with unbalanced designs) none of the four tests could control the Type I error rate. When weighted means were used only the multivariate tests could limit alpha rates, although Bonferroni t-statistics were considerably better than the two Tukey methods. In terms of power they concluded that as the number of repeated treatment levels increases, BON is substantially more powerful than SCI (p. 223). So, in terms of these studies, the Bonferroni method seems to be generally the most robust of the univariate techniques, especially in terms of power and control of the Type I error rate.

Conclusion

It is more often the rule than the exception that sphericity is violated in repeated measures designs. For this reason, all repeated measures designs should be exposed to tests of violations of sphericity. If sphericity is violated then the researcher must decide whether a multivariate or univariate analysis is preferred (with due consideration to the trade-off between test validity on one hand and power on the other). If univariate methods are chosen then the omnibus ANOVA must be corrected appropriately depending on the level of departure from sphericity. If pairwise comparisons are required the Bonferroni method should probably be used to control the Type I error rate. Finally, try to ensure that the group sizes are equal, otherwise even the Bonferroni technique is subject to inflations of alpha levels.
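As a rough illustration of what the Bonferroni method does, here is a minimal sketch in Python (not a language used in the original material). The function names and the three p-values are made up for the example. It divides the overall alpha by the number of comparisons and, equivalently, multiplies each p-value by that number.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical example: three pairwise comparisons among three conditions.
per_test_alpha = bonferroni_alpha(0.05, 3)
adjusted = bonferroni_adjust([0.004, 0.020, 0.300])
print(per_test_alpha)
print(adjusted)
```

With three comparisons each test is evaluated at roughly .0167 rather than .05, which is why the method is slightly conservative.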


References

Boik, R. J. (1981). A priori tests in repeated measures designs: Effects of nonsphericity. Psychometrika, 46(3), 241-255.
Cole, D. A., Maxwell, S. E., Arvey, R., & Salas, E. (1994). How the power of MANOVA can both increase and decrease as a function of the intercorrelations among the dependent variables. Psychological Bulletin, 115(3), 465-474.
Davidson, M. L. (1972). Univariate versus multivariate tests in repeated-measures experiments. Psychological Bulletin, 77, 446-452.
Girden, E. R. (1992). ANOVA: Repeated measures (Sage university paper series on quantitative applications in the social sciences, 84). Newbury Park, CA: Sage.
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95-112.
Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in randomised block and split-plot designs. Journal of Educational Statistics, 1(1), 69-82.
Keselman, H. J., & Keselman, J. C. (1988). Repeated measures multiple comparison procedures: Effects of violating multisample sphericity in unbalanced designs. Journal of Educational Statistics, 13(3), 215-226.
Maxwell, S. E. (1980). Pairwise multiple comparisons in repeated measures designs. Journal of Educational Statistics, 5(3), 269-287.
Maxwell, S. E., & Delaney (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth.


Mendoza, J. L., Toothaker, L. E., & Nicewander, W. A. (1974). A Monte Carlo comparison of the univariate and multivariate methods for the groups by trials repeated measures design. Multivariate Behavioural Research, 9, 165-177.
Mendoza, J. L., Toothaker, L. E., & Crain, B. R. (1976). Necessary and sufficient conditions for F ratios in the L x J x K factorial design with two repeated factors. Journal of the American Statistical Association, 71, 992-993.
Mitzel, H. C., & Games, P. A. (1981). Circularity and multiple comparisons in repeated measures designs. British Journal of Mathematical and Statistical Psychology, 34, 253-259.
O'Brien, M. G., & Kaiser, M. K. (1985). MANOVA method for analyzing repeated measures designs: An extensive primer. Psychological Bulletin, 97(2), 316-333.
Rouanet, H., & Lépine, D. (1970). Comparison between treatments in a repeated-measurement design: ANOVA and multivariate methods. The British Journal of Mathematical and Statistical Psychology, 23, 147-163.
Scariano, S. M., & Davenport, J. M. (1987). The effects of violations of independence in the one-way ANOVA. The American Statistician, 41(2), 123-129.
Stevens, J. (1992). Applied multivariate statistics for the social sciences (2nd edition). Hillsdale, NJ: LEA.

Labcoat Leni's Real Research: Who's Afraid of the Big Bad Wolf?

Field, A. P. (2006). Journal of Abnormal Psychology, 115(4), 742-752.

I'm going to let my ego get the better of me and talk about some of my own research. When I'm not scaring my students with statistics, I scare small children with Australian marsupials. There is a good reason for doing this, which is to try to discover how children develop fears (which will help us to prevent them). Most of my research looks at the effect of giving children information about animals or situations that are novel to them (rather like a parent, teacher or TV show would do). In one particular study (Field, 2006), we used three novel animals (the quoll, quokka and cuscus) and the children were told negative things about one of the animals, positive things about another, and given no information about the third (our control). I then asked the children to place their hands in three wooden boxes, each of which they believed contained one of the aforementioned animals. My hypothesis was that they would take longer to place their hand in the box containing the animal about which they had heard negative information. The data from this part of the study are in the file Field(2006).sav. Labcoat Leni wants you to carry out a one-way repeated-measures ANOVA on the times taken to place their hand in the three boxes (negative information, positive information, no information). First, draw an error bar graph of the means, then do some normality tests on the data, then do a log transformation on the scores, and do the ANOVA on these log-transformed scores (if you read the paper you'll notice that I found that the data were not normal so I log transformed them before doing the ANOVA). Do children take longer to put their hand in a box that they believe contains an animal about which they have been told nasty things?

You really ought to know how to do an error bar graph by now, so all I will say is that it should look something like this:


To get the normality tests use the Explore procedure:


The resulting K–S tests show that the data are very heavily non-normal. If you look at the Q–Q and P–P plots (not reproduced here but they'll be in your output) you will see that the data are very heavily skewed. This will be, in part, because if a child didn't put their hand in the box after 15 seconds we gave them a score of 15 and asked them to move on to the next box (this was for ethical reasons: if a child hadn't put their hand in the box after 15 s we assumed that they did not want to do the task). To log-transform the scores we need to use the compute function:

We need to do this three times (once for each variable). Alternatively we could use the following syntax:

COMPUTE LogNegative=ln(bhvneg).
COMPUTE LogPositive=ln(bhvpos).
COMPUTE LogNoInformation=ln(bhvnone).
EXECUTE.

To do the ANOVA we have to define a variable called Information_Type and then specify the three logged variables:

You can specify some simple contrasts (comparing everything to the last category, no information) or post hoc tests. I actually did something slightly different because I wanted to get precise Bonferroni-corrected confidence intervals for my post hoc comparisons, but if you ask for some post hoc tests you will get the same profile of results that I did.
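If you are working outside SPSS, the natural-log transformation done by the COMPUTE commands above can be sketched in Python. The variable names bhvneg, bhvpos and bhvnone come from the syntax; the latency values are invented for illustration (with 15 standing in for a child who never approached the box), and the dictionary layout is just an assumption about how you might hold the data.

```python
import math

# Hypothetical approach latencies in seconds; 15 is the ceiling score.
latencies = {
    "bhvneg": [15.0, 7.2, 3.1, 9.8],
    "bhvpos": [2.4, 1.9, 3.3, 2.0],
    "bhvnone": [3.0, 2.2, 4.1, 2.6],
}

# New variable names mirror those created by the COMPUTE commands.
new_names = {
    "bhvneg": "LogNegative",
    "bhvpos": "LogPositive",
    "bhvnone": "LogNoInformation",
}

# Natural log of every score; this only works because all latencies are above zero.
logged = {
    new_names[name]: [math.log(score) for score in scores]
    for name, scores in latencies.items()
}

for name, scores in logged.items():
    print(name, [round(s, 3) for s in scores])
```

The transformation pulls in the long right tail, which is why it helps with positively skewed latencies.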


Note first of all that the sphericity test is significant. Therefore, in the paper I reported Greenhouse–Geisser corrected degrees of freedom and significance. The main ANOVA shows that the type of information significantly affected how long the children took to place their hands in the boxes. The post hoc tests and the graph tell us that a child took longer to place their hand in the box that they believed contained an animal about which they had heard bad things compared to the boxes that they believed contained animals that they had heard positive information about or no information. There was not a significant difference between the approach times for the positive information and no information boxes. You could report these results as follows:


The latencies to approach the boxes were positively skewed (Kolmogorov–Smirnov zs = 1.89, 2.82, and 3.09 for the threat, positive and no information boxes respectively) and so were transformed using the natural log of the score. The resulting distributions were not significantly different from normal (Kolmogorov–Smirnov zs = 0.77, 1.04 and 1.17 for the threat, positive and no information boxes respectively). A one-way repeated-measures ANOVA revealed a significant main effect of the type of box, F(1.90, 239.52) = 104.69, p < .001. Bonferroni-corrected post hoc tests revealed a significant difference between the threat information box and the positive information box, p < .001; the threat information box and the no information box, p < .001; but not the positive information box and the no information box, p > .05.

Chapter 15

Self-Test Answers

Carry out some analyses to test for normality and homogeneity of variance in these data.

To get the outputs in the book use the following dialog boxes:


Split the file by Drug.

To split the file by drug you need to select

and complete the dialog box as follows:


See whether you can enter the data in Table 15.3 into SPSS (you don't need to enter the ranks). Then conduct some exploratory analysis on the data.

To get the outputs in the book use the following dialog boxes:

Use the Chart Builder to draw a boxplot of these data

The completed Chart Builder window should look like this:


Carry out the three Mann–Whitney tests suggested above.

The simplest way to run these tests is to use the Mann–Whitney dialog box, selecting a different comparison each time. The three tests we want to do compare each group against the control, so they all include group 1 as the first group; all that changes is the value in Group 2, which reflects which group is being compared to the controls.


Using what you know about inputting data, try to enter these data into SPSS and run some exploratory analyses.

To get the outputs in the book use the following dialog boxes:

Carry out the three Wilcoxon tests suggested above.

You can do the Wilcoxon tests by selecting the pairs of variables for each comparison in turn and transferring them across to the box labelled Test Pair(s) List. First, open the Wilcoxon test dialog box. Once the dialog box is activated, select the first two variables from the list (click on Start with the mouse and then, while holding down the Ctrl key, click on Month1) and transfer this pair to the box labelled Test Pairs. Then select Start and Month2 and transfer them. Finally, select Month1 and Month2 and transfer them. To run the analysis, return to the main dialog box and run the test.

Additional Material

Oliver Twisted: Please Sir, can I have some more Jonck?

"I want to know how the Jonckheere–Terpstra test actually works," complains Oliver. "Of course you do, Oliver, sleep is hard to come by these days. I am only too happy to oblige my little syphilitic friend. The additional material for this chapter on the companion website has a complete explanation of the test and how it works. I bet you're glad you asked."


Jonckheere's test is based on the simple, but elegant, idea of taking a score in a particular condition and counting how many scores in subsequent conditions are bigger than that score. So, the first step is to order your groups in the way that you expect your medians to change. If we take the soya example from Chapter 13, then we expect sperm counts to be highest in the no soya meals condition, and then decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week. So, we start with the no meals per week condition, and we take the first score and ask "How many scores in the next condition are bigger than this score?" You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 1 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order and the groups have been ordered in the way that we expect our medians to decrease. (In fact, we can order the groups the opposite way around if we want to, so we can start with the group we predict to have the lowest median, and then order them in the order we expect the medians to increase. All this will do is reverse the sign of the resulting z-score, and if you're keen to know why, there's a section at the end of this document that shows what happens when we reverse the order of groups!)

So, starting with the first score in the no soya meals group (this score is 0.35), we look at the scores in the next condition (1 soya meal) and count how many are greater than 0.35. It turns out that 19 of the 20 scores are greater than this value, so we place the value of 19 in the appropriate column and move on to the next score (0.58) and do the same. When we've done this for all of the scores in the no meals group, we go back to the first score (0.35) again, but this time count how many scores are bigger in the next but one condition (the 4 soya meals condition). It turns out that all 20 scores are bigger, so we register this in the appropriate column and move on to the next score (0.58) and do the same until we've done all of the scores in the no meals group. We basically repeat this process until we've compared the first group to all subsequent groups. At this stage we move on to the next group (the 1 soya meal). Again, we start with the first score (0.33) and count how many scores are bigger than this value in the subsequent group (the 4 meals group). In this case all 20 scores are bigger than 0.33, so we register this in the table and move on to the next score (0.36). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.33) and repeat the process until this category has been compared to all subsequent categories. When all categories have been compared with all subsequent categories in this way, we simply add up the counts as I have done in Table 1. These sums of counts are denoted by Uij.
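The counting step described above can be sketched in a few lines of Python (not a language used in the original material; the function name count_bigger is made up for this illustration). The scores are the 1 soya meal and 4 soya meals sperm counts from Table 1.

```python
def count_bigger(score, later_group):
    """How many scores in a later condition are bigger than this score."""
    return sum(1 for other in later_group if other > score)


# Sperm counts for two of the conditions, in ascending order.
one_meal = [0.33, 0.36, 0.63, 0.64, 0.77, 1.53, 1.62, 1.71, 1.94, 2.48,
            2.71, 4.12, 5.65, 6.76, 7.08, 7.26, 7.92, 8.04, 12.10, 18.47]
four_meals = [0.40, 0.60, 0.96, 1.20, 1.31, 1.35, 1.68, 1.83, 2.10, 2.93,
              2.96, 3.00, 3.09, 3.36, 4.34, 5.81, 5.94, 10.16, 10.98, 18.21]

# First two no soya scores (0.35 and 0.58) against the 1 soya meal group.
print(count_bigger(0.35, one_meal))    # 19
print(count_bigger(0.58, one_meal))    # 18

# First 1 soya meal score (0.33) against the 4 soya meals group.
print(count_bigger(0.33, four_meals))  # 20
```

Repeating this for every score against every subsequent group, and adding up the counts for each pair of groups, gives the Uij values.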


Table 1: Data to show Jonckheere's test for the soya example
(Columns headed with a group name give how many scores in that group are bigger than the sperm count on the same row.)

No Soya Meals                          1 Soya Meal                  4 Soya Meals        7 Soya Meals
Sperm    1 Meal   4 Meals  7 Meals     Sperm    4 Meals  7 Meals    Sperm    7 Meals    Sperm
0.35     19       20       18          0.33     20       18         0.40     18         0.31
0.58     18       19       16          0.36     20       18         0.60     16         0.32
0.88     18       18       13          0.63     18       16         0.96     13         0.56
0.92     15       18       13          0.64     18       16         1.20     12         0.57
1.22     15       15       12          0.77     18       15         1.31     11         0.71
1.51     15       14       7           1.53     14       7          1.35     9          0.81
1.52     15       14       7           1.62     14       7          1.68     7          0.87
1.57     14       14       7           1.71     13       7          1.83     7          1.18
2.43     10       11       6           1.94     12       7          2.10     6          1.25
2.79     9        11       4           2.48     11       6          2.93     3          1.33
3.40     9        6        1           2.71     11       5          2.96     3          1.34
4.52     8        5        0           4.12     6        0          3.00     3          1.49
4.72     8        5        0           5.65     4        0          3.09     2          1.50
6.90     6        3        0           6.76     3        0          3.36     1          2.09
7.58     4        3        0           7.08     3        0          4.34     0          2.70
7.78     4        3        0           7.26     3        0          5.81     0          2.75
9.62     2        3        0           7.92     3        0          5.94     0          2.83
10.05    2        3        0           8.04     3        0          10.16    0          3.07
10.32    2        2        0           12.10    1        0          10.98    0          3.28
21.08    0        0        0           18.47    0        0          18.21    0          4.11
Total    193      187      104                  195      122                 111
(Uij)

The test statistic, J, is simply the sum of these counts:


$$J = \sum_{i<j} U_{ij}$$

which for these data is simply:

$$J = \sum_{i<j} U_{ij} = 193 + 187 + 104 + 195 + 122 + 111 = 912$$
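To check the arithmetic, the whole calculation of J can be sketched in Python (not a language used in the original material; the function name jonckheere_j is made up for this illustration). It assumes the groups are supplied in the predicted order and simply counts, for every pair of groups, how often a score in the later group is bigger than a score in the earlier one. The scores are those printed in Table 1.

```python
from itertools import combinations


def jonckheere_j(ordered_groups):
    """Sum, over every earlier/later pair of groups, of the number of times
    a score in the later group is bigger than a score in the earlier group."""
    total = 0
    for earlier, later in combinations(ordered_groups, 2):
        total += sum(1 for a in earlier for b in later if b > a)
    return total


no_soya = [0.35, 0.58, 0.88, 0.92, 1.22, 1.51, 1.52, 1.57, 2.43, 2.79,
           3.40, 4.52, 4.72, 6.90, 7.58, 7.78, 9.62, 10.05, 10.32, 21.08]
one_meal = [0.33, 0.36, 0.63, 0.64, 0.77, 1.53, 1.62, 1.71, 1.94, 2.48,
            2.71, 4.12, 5.65, 6.76, 7.08, 7.26, 7.92, 8.04, 12.10, 18.47]
four_meals = [0.40, 0.60, 0.96, 1.20, 1.31, 1.35, 1.68, 1.83, 2.10, 2.93,
              2.96, 3.00, 3.09, 3.36, 4.34, 5.81, 5.94, 10.16, 10.98, 18.21]
seven_meals = [0.31, 0.32, 0.56, 0.57, 0.71, 0.81, 0.87, 1.18, 1.25, 1.33,
               1.34, 1.49, 1.50, 2.09, 2.70, 2.75, 2.83, 3.07, 3.28, 4.11]

# Groups in the order in which we expect the medians to decrease.
j = jonckheere_j([no_soya, one_meal, four_meals, seven_meals])
print(j)
```

Run on these scores the sketch should give J = 912. Counting directly from the scores, one or two of the pairwise sums can differ by a count or two from the printed columns, although the overall total agrees.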

For small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:
$$\bar{J} = \frac{N^2 - \sum n_k^2}{4}$$

$$\sigma_J = \sqrt{\frac{1}{72}\left\{N^2(2N + 3) - \sum n_k^2(2n_k + 3)\right\}}$$

in which N is simply the total sample size (in this case 80) and n_k is simply the sample size of group k (in each case in this example this will be 20 because we have equal sample sizes). So, for the mean, we just square each group's sample size and add them up, then subtract this value from the total sample size squared. We then divide the result by 4. Therefore, we can calculate the mean for these data as:

$$\bar{J} = \frac{80^2 - (20^2 + 20^2 + 20^2 + 20^2)}{4} = \frac{4800}{4} = 1200$$

The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size:

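$$\begin{aligned}
\sigma_J &= \sqrt{\frac{1}{72}\left\{80^2((2 \times 80) + 3) - \left[20^2((2 \times 20) + 3) + 20^2((2 \times 20) + 3) + 20^2((2 \times 20) + 3) + 20^2((2 \times 20) + 3)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{6400(163) - \left[400(43) + 400(43) + 400(43) + 400(43)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{1043200 - 68800\right\}} \\
&= \sqrt{13533.33} \\
&= 116.33
\end{aligned}$$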

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formulae:
$$z = \frac{x - \bar{x}}{s} = \frac{J - \bar{J}}{\sigma_J} = \frac{912 - 1200}{116.33} = -2.476$$
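The mean, standard deviation and z-score can be checked with a short Python sketch (not a language used in the original material; the function name jonckheere_z is made up for this illustration). It just plugs J and the group sizes into the large-sample formulae given above.

```python
import math


def jonckheere_z(j, group_sizes):
    """Mean, standard deviation and z for J using the large-sample formulae."""
    n_total = sum(group_sizes)
    mean_j = (n_total ** 2 - sum(n ** 2 for n in group_sizes)) / 4
    variance_j = (n_total ** 2 * (2 * n_total + 3)
                  - sum(n ** 2 * (2 * n + 3) for n in group_sizes)) / 72
    sd_j = math.sqrt(variance_j)
    return mean_j, sd_j, (j - mean_j) / sd_j


# J = 912 with four groups of 20, as in the soya example.
mean_j, sd_j, z = jonckheere_z(912, [20, 20, 20, 20])
print(round(mean_j, 2), round(sd_j, 2), round(z, 3))
```

The negative z reflects the fact that J falls below its expected value of 1200, which is what we would see if scores tend to get smaller across the ordered groups.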

This z can then be evaluated using the critical values in the Appendix of the book. This test is always one-tailed because we have predicted a trend to use the test. So we're looking at z being above 1.65 (when ignoring the sign) to be significant. In fact, the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z), as they do in this example!

Does it Matter how I order My Groups?

I have just shown how to use the test when the groups are ordered by descending medians (i.e. we expect sperm counts to be highest in the no soya meals condition, and then decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week; so we ordered the groups: no soya, 1 meal, 4 meals and 7 meals). Certain books will tell you to order the groups in ascending order (i.e. start with the group that you expect to have the lowest median). For the soya data this would mean arranging the groups in the opposite order to how I did in the Appendix; that is, 7 meals, 4 meals, 1 meal and no meals. The purpose of this section is to show you what happens if we order the groups the opposite way around!


The process is similar to that used in the Appendix, only now we start with the 7 meals per week condition, and we take the first score and ask "How many scores in the next condition are bigger than this score?" You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 2 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order and the groups have been ordered in the way that we expect our medians to increase. So, starting with the first score in the 7 soya meals group (this score is 0.31), we look at the scores in the next condition (4 soya meals) and count how many are greater than 0.31. It turns out that all 20 scores are greater than this value, so we place the value of 20 in the appropriate column and move on to the next score (0.32) and do the same. When we've done this for all of the scores in the 7 meals group, we go back to the first score (0.31) again, but this time count how many scores are bigger in the next but one condition (the 1 soya meal condition). It turns out that all 20 scores are bigger, so we register this in the appropriate column and move on to the next score (0.32) and do the same until we've done all of the scores in the 7 meals group. We basically repeat this process until we've compared the first group to all subsequent groups.
Table 2: Data to show Jonckheere's test for the soya example in Chapter 13
(Columns headed with a group name give how many scores in that group are bigger than the sperm count on the same row.)

7 Soya Meals                           4 Soya Meals                 1 Soya Meal         No Soya Meals
Sperm    4 Meals  1 Meal   No Meals    Sperm    1 Meal   No Meals   Sperm    No Meals   Sperm
0.31     20       20       20          0.40     18       19         0.33     20         0.35
0.32     20       20       20          0.60     18       18         0.36     19         0.58
0.56     19       18       19          0.96     15       16         0.63     18         0.88
0.57     19       18       19          1.20     15       16         0.64     18         0.92
0.71     18       16       18          1.31     15       15         0.77     18         1.22
0.81     18       15       18          1.35     15       15         1.53     13         1.51
0.87     18       15       18          1.68     13       12         1.62     12         1.52
1.18     17       15       16          1.83     12       12         1.71     12         1.57
1.25     16       15       15          2.10     11       12         1.94     12         2.43
1.33     15       15       15          2.93     9        10         2.48     11         2.79
1.34     15       15       15          2.96     9        10         2.71     11         3.40
1.49     14       15       15          3.00     9        10         4.12     9          4.52
1.50     14       15       15          3.09     9        10         5.65     7          4.72
2.09     12       11       12          3.36     9        10         6.76     7          6.90
2.70     11       10       11          4.34     8        9          7.08     6          7.58
2.75     11       9        11          5.81     7        7          7.26     6          7.78
2.83     11       9        10          5.94     7        7          7.92     4          9.62
3.07     8        9        10          10.16    2        2          8.04     4          10.05
3.28     7        9        10          10.98    2        1          12.10    1          10.32
4.11     6        9        9           18.21    1        1          18.47    1          21.08
Total    289      278      296                  204      212                 209
(Uij)

At this stage we move on to the next group (the 4 soya meals). Again, we start with the first score (0.40) and count how many scores are bigger than this value in the subsequent group (the 1 meal group). In this case there are 18 scores bigger than 0.40, so we register this in the table and move on to the next score (0.60). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.40) and repeat the process until this category has been compared to all subsequent categories. When all categories have been compared to all subsequent categories in this way, we simply add up the counts as I have done in the table. These sums of counts are denoted by Uij. As before, test statistic J is simply the sum of these counts:

$$J = \sum_{i<j} U_{ij}$$

which for these data is simply:


$$J = \sum_{i<j} U_{ij} = 289 + 278 + 296 + 204 + 212 + 209 = 1488$$

As I said earlier, for small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:

$$\bar{J} = \frac{N^2 - \sum n_k^2}{4}$$

$$\sigma_J = \sqrt{\frac{1}{72}\left\{N^2(2N + 3) - \sum n_k^2(2n_k + 3)\right\}}$$

in which N is simply the total sample size (in this case 80) and n_k is simply the sample size of group k (in each case in this example this will be 20 because we have equal sample sizes). So, for the mean, we just square each group's sample size and add them up, then subtract this value from the total sample size squared. We then divide the result by 4. Therefore, we can calculate the mean for these data as we did earlier:

$$\bar{J} = \frac{80^2 - (20^2 + 20^2 + 20^2 + 20^2)}{4} = \frac{4800}{4} = 1200$$

The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size (again this is the same as earlier):
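$$\begin{aligned}
\sigma_J &= \sqrt{\frac{1}{72}\left\{80^2((2 \times 80) + 3) - \left[20^2((2 \times 20) + 3) + 20^2((2 \times 20) + 3) + 20^2((2 \times 20) + 3) + 20^2((2 \times 20) + 3)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{6400(163) - \left[400(43) + 400(43) + 400(43) + 400(43)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{1043200 - 68800\right\}} \\
&= \sqrt{13533.33} \\
&= 116.33
\end{aligned}$$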

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formulae. The mean and standard deviation are the same as above, but we now have a different test statistic (it is 1488 rather than 912). So, let's see what happens when we plug this new test statistic into the equation:

$$z = \frac{x - \bar{x}}{s} = \frac{J - \bar{J}}{\sigma_J} = \frac{1488 - 1200}{116.33} = 2.476$$
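One way to see why only the sign changes is the following short derivation. It assumes there are no tied scores between groups (which is the case for the soya data); the labels J_asc and J_desc for the two orderings are introduced here purely for convenience. Every pair of scores from two different groups is counted in exactly one of the two orderings, so the two J values must add up to the total number of such pairs:

$$J_{\text{asc}} + J_{\text{desc}} = \sum_{i<j} n_i n_j$$

Because $N^2 = \sum n_k^2 + 2\sum_{i<j} n_i n_j$, the mean of J is exactly half of that total:

$$\bar{J} = \frac{N^2 - \sum n_k^2}{4} = \frac{1}{2}\sum_{i<j} n_i n_j$$

Therefore

$$J_{\text{asc}} - \bar{J} = -\left(J_{\text{desc}} - \bar{J}\right)$$

For these data there are 6 pairs of groups with 20 × 20 = 400 comparisons each, giving 2400 = 912 + 1488, and both test statistics sit 288 away from the mean of 1200, on opposite sides.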

Note that the z-score is the same value as when we ordered the groups in descending order, except that it now has a positive value rather than a negative one! This goes to prove what I wrote earlier: the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z)! Earlier we ordered the groups in descending order and so got a negative z, and now we ordered them in ascending order and so got a positive z.

Labcoat Leni's Real Research: Having a Quail of a Time?

Matthews, N. et al. (2007). Psychological Science, 18(9), 758-762.

We encountered some research in Chapter 2 in which we discovered that you can influence aspects of male quail's sperm production through conditioning. The basic idea is that the male is granted access to a female for copulation in a certain chamber (e.g. one that is coloured green) but gains no access to a female in a different context (e.g. a chamber with a tilted floor). The male, therefore, learns that when he is in the green chamber his luck is in, but if the floor is tilted then frustration awaits. For other males the chambers will be reversed (i.e. they get sex only when in the chamber with the tilted floor). The human equivalent (well, sort of) would be if you always managed to pull in the Zap Club but never in the Honey Club. During the test phase, males get to mate in both chambers. The question is: after the males have learnt that they will get a mating opportunity in a certain context, do they produce more sperm or better-quality sperm when mating in that context compared to the control context? (Are you more of a stud in the Zap Club? OK, I'm going to stop this analogy now.)

Mike Domjan and his colleagues predicted that if conditioning evolved because it increases reproductive fitness then males that mated in the context that had previously signalled a mating opportunity would fertilize a significantly greater number of eggs than quails that mated in their control context (Matthews, Domjan, Ramsey, & Crews, 2007). They put this hypothesis to the test in an experiment that is utter genius. After training, they allowed 14 females to copulate with two males (counterbalanced): one male copulated with the female in the chamber that had previously signalled a reproductive opportunity (Signalled), whereas the second male copulated with the same female but in the chamber that had not previously signalled a mating opportunity (Control). Eggs were collected from the females for 10 days after the mating and a genetic analysis was used to determine the father of any fertilized eggs. The data from this study are in the file Mathews et al. (2007).sav. Labcoat Leni wants you to carry out a Wilcoxon signed-rank test to see whether more eggs were fertilized by males mating in their signalled context compared to males in their control context.

To run the analysis, open the Wilcoxon test dialog box. Once the dialog box is activated, select two variables from the list (click on the first variable with the mouse and then, while holding down the Ctrl key, the second). You can also select the variables one at a time and transfer them: for example, you could select Signalled and drag it to the column labelled Variable 1 in the box labelled Test Pairs, and then select Control and drag it to the column labelled Variable 2. Each pair appears as a new row in the box labelled Test Pairs. To run the analysis, return to the main dialog box and run the test.


The first table provides information about the ranked scores. It tells us the number of negative ranks (these are females that produced more eggs fertilized by the male in his signalled chamber than the male in his control chamber) and the number of positive ranks (these are females that produced fewer eggs fertilized by the male in his signalled chamber than the male in his control chamber). The table shows that for 10 of the 14 quails, the number of eggs fertilized by the male in his signalled chamber was greater than for the male in his control chamber, indicating an adaptive benefit to learning that a chamber signalled reproductive opportunity. There was one tied rank (i.e. one female that produced an equal number of fertilized eggs for both males). The table also shows the average number of negative and positive ranks and the sum of positive and negative ranks. Below the table are footnotes, which tell us to what the positive and negative ranks relate (so provide the same kind of explanation as I've just made; see, I'm not clever, I just read the footnotes!). The test statistic, T, is the lowest value of the two types of ranks, so our test value here is the sum of positive ranks (i.e. 13.50). This value can be converted to a z-score and this is what SPSS does. The second table tells us that the test statistic is based on the positive ranks, that the z-score is 2.30 and that this value is significant at p = .022. Therefore, we should conclude that there were a greater number of fertilized eggs from males mating in their signalled context, z = 2.30, p < .05. In other words, conditioning (as a learning mechanism) provides some adaptive benefit in that it makes it more likely that you will pass on your genes.

The authors concluded as follows:

"Of the 78 eggs laid by the test females, 39 eggs were fertilized. Genetic analysis indicated that 28 of these (72%) were fertilized by the signalled males, and 11 were fertilized by the control males. Ten of the 14 females in the experiment produced more eggs fertilized by the signalled male than by the control male (see Fig. 1; Wilcoxon signed-ranks test, T = 13.5, p < .05). These effects were independent of the order in which the 2 males copulated with the female. Of the 39 fertilized eggs, 20 were sired by the 1st male and 19 were sired by the 2nd male. The present findings show that when 2 males copulated with the same female in succession, the male that received a Pavlovian CS signalling copulatory opportunity fertilized more of the female's eggs. Thus, Pavlovian conditioning increased reproductive fitness in the context of sperm competition." (p. 760)
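To make the idea of positive ranks, negative ranks and T concrete, here is a minimal Python sketch (not a language used in the original material). The egg counts are invented for six hypothetical females and are not the Matthews et al. data, and the function name wilcoxon_t is made up. As in the output described above, differences are taken as the second variable minus the first (Control minus Signalled), zero differences are dropped, and tied absolute differences share their average rank.

```python
def wilcoxon_t(first, second):
    """Return (sum of positive ranks, sum of negative ranks, T)."""
    # Differences are second minus first; ties between the two scores are dropped.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ordered = sorted(abs(d) for d in diffs)
    # Give each absolute difference the average of the rank positions it occupies.
    rank_of = {}
    for value in set(ordered):
        positions = [i + 1 for i, v in enumerate(ordered) if v == value]
        rank_of[value] = sum(positions) / len(positions)
    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return positive, negative, min(positive, negative)


# Hypothetical numbers of fertilized eggs for six females.
signalled = [5, 3, 6, 2, 4, 7]
control = [2, 3, 1, 4, 1, 3]

positive, negative, t = wilcoxon_t(signalled, control)
print(positive, negative, t)
```

In this made-up example only one female has a positive difference (more eggs from the control male), so the sum of positive ranks is the smaller of the two and becomes T.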


Labcoat Leni's Real Research: Eggs-traordinary!

Çetinkaya, H., & Domjan, M. (2006). Journal of Comparative Psychology, 120(4), 427-432.

There seems to be a lot of sperm in this book (not literally I hope); it's possible that I have a mild obsession. We saw that male quail fertilized more eggs if they had been able to predict when a mating opportunity would arise. However, some quail develop fetishes. Really. In the previous example the type of compartment acted as a predictor of an opportunity to mate, but in studies where a terrycloth object acts as a sign that a mate will shortly become available, some quails start to direct their sexual behaviour towards the terrycloth object. (I may regret this analogy but in human terms if you imagine that every time you were going to have sex with your boyfriend you gave him a green towel a few moments before seducing him, then after enough seductions he would start rubbing his crotch against any green towel he saw. If you've ever wondered why your boyfriend does this, then hopefully this is an enlightening explanation.) In evolutionary terms, this fetishistic behaviour seems counterproductive because sexual behaviour becomes directed towards something that cannot provide reproductive success. However, perhaps this behaviour serves to prepare the organism for the real mating behaviour.

Hakan Çetinkaya and Mike Domjan conducted a brilliant study in which they sexually conditioned male quail (Çetinkaya & Domjan, 2006). All quail experienced the terrycloth stimulus and an opportunity to mate, but for some the terrycloth stimulus immediately preceded the mating opportunity (paired group) whereas for others they experienced it 2 hours after the mating opportunity (this was the control group because the terrycloth stimulus did not predict a mating opportunity). In the paired group, quail were classified as fetishistic or not depending on whether they engaged in sexual behaviour with the terrycloth object. During a test trial the quails mated with a female and the researcher measured the percentage of eggs fertilized, the time spent near the terrycloth object, the latency to initiate copulation, and copulatory efficiency. If this fetishistic behaviour provides an evolutionary advantage then we would expect the fetishistic quails to fertilize more eggs, initiate copulation faster, and be more efficient in their copulations. The data from this study are in the file Çetinkaya & Domjan (2006).sav. Labcoat Leni wants you to carry out a Kruskal–Wallis test to see whether fetishist quails produced a higher percentage of fertilized eggs and initiated sex more quickly.

Let's begin by using the Chart Builder to do some boxplots:


First, access the main dialog box. Once the dialog box is activated, select the two dependent variables from the list (click on Egg_Percent and, while holding down the Ctrl key, Latency) and drag them to the box labelled Test Variable List. Next, select the independent variable (the grouping variable), in this case Group, and drag it to the box labelled Grouping Variable. When the grouping variable has been selected the define range button becomes active and you should click on it to activate the define range dialog box. SPSS needs to know the range of numeric codes you assigned to your groups, and there is a space for you to type the minimum and maximum codes. The minimum code we used was 1, and the maximum was 3, so type these numbers into the appropriate spaces. When you have defined the groups, return to the main dialog box and run the analyses from there.

271

The output should look like this:

For both variables there is a significant effect. So there are differences between the groups but we don't know where these differences lie. To find out we can conduct several Mann–Whitney tests, using the Mann–Whitney dialog box.

The output you should get is:

Fetishistic vs. Nonfetishistic:

Fetishistic vs. Control:

Nonfetishistic vs. Control:

The authors reported as follows:

"Kruskal–Wallis analysis of variance (ANOVA) confirmed that female quail partnered with the different types of male quail produced different percentages of fertilized eggs, χ²(2, N = 59) = 11.95, p < .05, η² = 0.20. Subsequent pairwise comparisons with the Mann–Whitney U test (with the Bonferroni correction) indicated that fetishistic male quail yielded higher rates of fertilization than both the nonfetishistic male quail (U = 56.00, N1 = 17, N2 = 15, effect size = 8.98, p < .05) and the control male quail (U = 100.00, N1 = 17, N2 = 27, effect size = 12.42, p < .05). However, the nonfetishistic group was not significantly different from the control group (U = 176.50, N1 = 15, N2 = 27, effect size = 2.69, p > .05)." (page 429)

For the latency data they reported as follows:

"A Kruskal–Wallis analysis indicated significant group differences, χ²(2, N = 59) = 32.24, p < .05, η² = 0.56. Pairwise comparisons with the Mann–Whitney U test (with the Bonferroni correction) showed that the nonfetishistic males had significantly shorter copulatory latencies than both the fetishistic male quail (U = 0.00, N1 = 17, N2 = 15, effect size = 16.00, p < .05) and the control male quail (U = 12.00, N1 = 15, N2 = 27, effect size = 19.76, p < .05). However, the fetishistic group was not significantly different from the control group (U = 161.00, N1 = 17, N2 = 27, effect size = 6.57, p > .05)." (page 430)

These results support the authors' theory that fetishistic behaviour may have evolved because it offers some adaptive function (such as preparing for the real thing).
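For readers who want to see what the two tests compute, here is a minimal sketch in plain Python rather than SPSS. The scores are hypothetical (they are not the quail data), the function names are invented for illustration, and the H statistic is computed without the correction for ties that SPSS applies, so values will differ slightly from SPSS output when there are tied scores.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction) for a list of lists of scores."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def mann_whitney_u(a, b):
    """Smaller of the two U statistics for independent samples a and b."""
    ranks = average_ranks(list(a) + list(b))
    u_a = sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2
    return min(u_a, len(a) * len(b) - u_a)


# Hypothetical percentages for three groups (NOT the data in the .sav file).
toy = {
    "fetishistic": [80, 75, 90, 85],
    "nonfetishistic": [60, 55, 70, 65],
    "control": [50, 58, 62, 45],
}
h = kruskal_wallis_h(list(toy.values()))
pairwise_u = {
    (g1, g2): mann_whitney_u(toy[g1], toy[g2])
    for g1, g2 in combinations(toy, 2)
}
# Bonferroni correction: divide alpha by the number of follow-up tests.
bonferroni_alpha = 0.05 / len(pairwise_u)
```

With three follow-up comparisons, each Mann–Whitney test would be judged against .05/3 (about .0167) rather than .05.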

Chapter 14

Self-Test Answers

What is the difference between a main effect and an interaction?

A main effect is the unique effect of a predictor variable (or independent variable) on an outcome variable. In this context it can be the effect of gender, charisma or looks on their own. So, in the case of gender, the main effect is the difference between the average score from all men (irrespective of the type of date they were rating) and that of all women (irrespective of the type of date that they were rating). The main effect of looks would be the mean rating given to all attractive dates (irrespective of their charisma, or whether they were rated by a man or a woman), compared to the average rating given to all average-looking dates (irrespective of their charisma, or whether they were rated by a man or a woman) and the average rating of all ugly dates (irrespective of their charisma, or whether they were rated by a man or a woman). An interaction, on the other hand, looks at the combined effect of two or more variables: for example, were the average ratings of attractive, ugly and average-looking dates different in men and women?
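The distinction can be made concrete with a small table of cell means. The sketch below uses made-up ratings (not the book's data) and assumes equal numbers of raters in every cell, so that marginal means are simple averages of cell means. The main effects compare marginal means; the interaction asks whether a difference between levels of one variable changes across levels of the other.

```python
# Hypothetical mean ratings: rows are rater gender, columns are the date's looks.
cell_means = {
    "men":   {"attractive": 80.0, "average": 60.0, "ugly": 30.0},
    "women": {"attractive": 85.0, "average": 70.0, "ugly": 60.0},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Main effect of gender: each gender's mean rating, ignoring looks.
gender_means = {g: mean(row.values()) for g, row in cell_means.items()}

# Main effect of looks: each looks level's mean rating, ignoring gender.
looks_levels = list(cell_means["men"])
looks_means = {
    level: mean(cell_means[g][level] for g in cell_means)
    for level in looks_levels
}

# Interaction: is the attractive-minus-ugly difference the same for both genders?
drop = {g: row["attractive"] - row["ugly"] for g, row in cell_means.items()}
interaction_contrast = drop["men"] - drop["women"]  # 0 would mean no interaction here
```

In these made-up numbers the drop from attractive to ugly dates is 50 points for men but only 25 for women, which is the kind of pattern an interaction term picks up.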

Additional Material

Labcoat Leni's Real Research: Keep the Faith(ful)?

Schützwohl, A. (2008). Personality and Individual Differences, 44, 633–644.

People can be jealous. People can be especially jealous when they think that their partner is being unfaithful. An evolutionary view of jealousy suggests that men and women have evolved distinctive types of jealousy because male and female reproductive success is threatened by different types of infidelity. Specifically, a woman's sexual infidelity deprives her mate of a reproductive opportunity and in some cases burdens him with years investing in a child that is not his. Conversely, a man's sexual infidelity does not burden his mate with unrelated children, but may divert his resources from his mate's progeny. This diversion of resources is signalled by emotional attachment to another female. Consequently, men's jealousy mechanism should have evolved to prevent a mate's sexual infidelity, whereas in women it has evolved to prevent emotional infidelity. If this is the case then men and women should divert their attentional resources towards different cues to infidelity: women should be on the lookout for emotional infidelity, whereas men should be watching out for sexual infidelity. Achim Schützwohl put this theory to the test in a unique study in which men and women saw sentences presented on a computer screen (Schützwohl, 2008). On each trial, participants saw a target sentence that was always affectively neutral (e.g. "The gas station is at the other side of the street"). However, the trick was that before each of these targets, a distractor sentence was presented that could also be affectively neutral, or could indicate sexual infidelity (e.g. "Your partner suddenly has difficulty becoming sexually aroused when he and you want to have sex") or emotional infidelity (e.g. "Your partner doesn't say 'I love you' to you anymore"). The idea was that if these distractor sentences grabbed a person's attention then (1) they would remember them, and (2) they would not remember the target sentence that came afterwards (because their attentional resources were still focused on the distractor). These effects should only show up in people currently in a relationship. The outcome was the number of sentences that a participant could remember (out of 6), and the predictors were whether the person had a partner or not (Relationship), whether the trial used a neutral distractor, an emotional infidelity distractor or a sexual infidelity distractor, and whether the sentence was a distractor, or the target following the distractor. They analysed men's and women's data separately. The predictions are that women should remember more emotional infidelity sentences (distractors) but fewer of the targets that followed those sentences (target). For men, the same effect should be found but for sexual infidelity sentences.
The data from this study are in the file Schützwohl(2008).sav. Labcoat Leni wants you to carry out two three-way mixed ANOVAs (one for men and the other for women) to test these hypotheses. Answers are in the additional material on the companion website (or look at pages 638–642 in the original article).

We want to run these analyses on men and women separately; therefore, we could (to be efficient) split the file by the variable Gender (see Chapter 5):

To run the ANOVA select the repeated-measures ANOVA dialog box. We have two repeated-measures variables: whether the sentence was a distracter or a target (let's call this Sentence_Type) and whether the distracter used on a trial was neutral, indicated sexual infidelity or emotional infidelity (let's call this variable Distracter_Type). The resulting ANOVA will be a 2 (relationship: with partner or not) × 2 (sentence type: distracter or target) × 3 (distracter type: neutral, emotional infidelity or sexual infidelity) three-way mixed ANOVA with repeated measures on the last two variables. First we must define our two repeated-measures variables:

Next we need to define these variables by specifying the columns in the data editor that relate to the different combinations of the type of sentence and the type of trial. As you can see, we specified Sentence_Type first, therefore we have all of the variables relating to distracters specified before those for targets. For each type of sentence there are three different variants depending on whether the distracter used was neutral, emotional or sexual. Note that we have used the same order for both types of sentence (neutral, emotional, sexual) and that we have put neutral distracters as the first category so that we can look at some contrasts (neutral distracters are the control).

To do some contrasts, open the contrasts dialog box and select some simple contrasts comparing everything to the first category:

You could also ask for an interaction graph for the three-way interaction:

You can set other options as in the book chapter. Let's look at the men's output first. Sphericity tests are fine (all non-significant) so I've simplified the main ANOVA table to show only the sphericity assumed tests:

We could report these effects as follows: A three-way ANOVA with current relationship status as the between-subject factor and men's recall of sentence type (targets vs. distracters) and distracter type (neutral, emotional infidelity and sexual infidelity) as the within-subject factors yielded a significant main effect of sentence type, F(1, 37) = 53.97, p < .001, and a significant interaction between current relationship status and distracter content, F(2, 74) = 3.92, p = .024. More important, the three-way interaction was also significant, F(2, 74) = 3.79, p = .027. The remaining main effects and interactions were not significant, Fs < 2, ps > .17.

To pick apart the three-way interaction we can look at the table of contrasts:

This table tells us that the effect of whether or not you are in a relationship and whether you were remembering a distracter or target was similar in trials in which an emotional-infidelity distracter was used compared to when a neutral distracter was used, F(1, 37) < 1, p = .95 (level 2 vs. level 1 in the table). However, as predicted, there is a difference in trials in which a sexual-infidelity distracter was used compared to those in which a neutral distracter was used, F(1, 37) = 5.39, p < .05 (level 3 vs. level 1).

To see what these contrasts tell us look at the graphs (I've edited these a bit so that they are clearer). First off, those without partners remember many more targets than they do distracters, and this is true for all types of trials. In other words, it doesn't matter whether the distracter is neutral, emotional or sexual; these people remember more targets than distracters. The same pattern is seen in those with partners except for distracters that indicate sexual infidelity (the red line). For these, the number of targets remembered is reduced. Put another way, the slope of the green and blue lines is more or less the same for those in and out of relationships (compare graphs) and also to each other (compare green with blue). The only difference is for the red line, which is comparable to the green and blue lines for those not in relationships, but is much shallower for those in relationships. They remember fewer targets that were preceded by a sexual-infidelity distracter. This supports the predictions of the author: men in relationships have an attentional bias such that their attention is consumed by cues indicative of sexual infidelity.

Let's now look at the women's output. Sphericity tests are fine (all non-significant) so I've simplified the main ANOVA table to show only the sphericity assumed tests:

We could report these effects as follows: A three-way ANOVA with current relationship status as the between-subject factor and women's recall of sentence type (targets vs. distracters) and distracter type (neutral, emotional infidelity and sexual infidelity) as the within-subject factors yielded a significant main effect of sentence type, F(1, 39) = 39.68, p < .001, and distracter type, F(2, 78) = 4.24, p = .018. Additionally, significant interactions were found between sentence type and distracter type, F(2, 78) = 4.63, p = .013, and most important sentence type × distracter type × relationship, F(2, 78) = 5.33, p = .007. The remaining main effect and interactions were not significant, Fs < 1.2, ps > .29. To pick apart the three-way interaction we can look at the table of contrasts:

This table tells us that the effect of whether or not you are in a relationship and whether you were remembering a distracter or target was significantly different in trials in which an emotional-infidelity distracter was used compared to when a neutral distracter was used, F(1, 39) = 7.56, p = .009 (level 2 vs. level 1 in the table). However, there was not a significant difference in trials in which a sexual-infidelity distracter was used compared to those in which a neutral distracter was used, F(1, 39) = 0.31, p = .58 (level 3 vs. level 1).

To see what these contrasts tell us look at the graphs (I've edited these a bit so that they are clearer). As for the men, women without partners remember many more targets than they do distracters, and this is true for all types of trials (although it's less true for the sexual-infidelity trials because this line has a shallower slope). The same pattern is seen in those with partners except for distracters that indicate emotional infidelity (the green line). For these, the number of targets remembered is reduced. Put another way, the slope of the red and blue lines is more or less the same for those in and out of relationships (compare graphs). The only difference is for the green line, which is much shallower for those in relationships. They remember fewer targets that were preceded by an emotional-infidelity distracter. This supports the predictions of the author: women in relationships have an attentional bias such that their attention is consumed by cues indicative of emotional infidelity.

Chapter 15

Self-Test Answers

Carry out some analyses to test for normality and homogeneity of variance in these data. To get the outputs in the book use the following dialog boxes:

Split the file by Drug.

To split the file by drug you need to open the Split File dialog box and complete it as follows:

See whether you can enter the data in Table 15.3 into SPSS (you don't need to enter the ranks). Then conduct some exploratory analysis on the data. To get the outputs in the book use the following dialog boxes:

Use the Chart Builder to draw a boxplot of these data.

The completed Chart Builder window should look like this:

Carry out the three Mann–Whitney Tests suggested above.

The simplest way to run these tests is to use the Mann–Whitney dialog box, but each time click on Define Groups and select a different comparison (the three tests we want to do compare each group against the control so they all include group 1 as the first group; all that changes is the value in Group 2, which reflects which group is being compared to the controls).

Using what you know about inputting data, try to enter these data into SPSS and run some exploratory analyses. To get the outputs in the book use the following dialog boxes:

Carry out the three Wilcoxon tests suggested above.

You can do the Wilcoxon tests by selecting the pairs of variables for each comparison in turn and transferring them across to the box labelled Test Pair(s) List. To run the analysis, first open the Wilcoxon test dialog box. Once the dialog box is activated, select the first two variables from the list (click on Start with the mouse and then, while holding down the Ctrl key, click on Month1). Transfer this pair to the box labelled Test Pairs by clicking on the transfer arrow. Then select Start and Month2 and transfer them in the same way. Finally, select Month1 and Month2 and transfer them too. To run the analysis, return to the main dialog box and click on OK.

Additional Material

Oliver Twisted: Please Sir, can I have some more Jonck?

"I want to know how the Jonckheere–Terpstra Test actually works," complains Oliver. Of course you do, Oliver, sleep is hard to come by these days. I am only too happy to oblige my little syphilitic friend. The additional material for this chapter on the companion website has a complete explanation of the test and how it works. I bet you're glad you asked.

Jonckheere's test is based on the simple, but elegant, idea of taking a score in a particular condition and counting how many scores in subsequent conditions are bigger than that score. So, the first step is to order your groups in the way that you expect your medians to change. If we take the soya example from Chapter 13, then we expect sperm counts to be highest in the no soya meals condition, and then decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week. So, we start with the no meals per week condition, and we take the first score and ask "How many scores in the next condition are bigger than this score?" You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 1 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order and the groups have been ordered in the way that we expect our medians to decrease (see footnote 3). So, starting with the first score in the no soya meals group (this score is 0.35), we look at the scores in the next condition (1 soya meal) and count how many are greater than 0.35. It turns out that 19 of the 20 scores are greater than this value, so we place the value of 19 in the appropriate column and move on to the next score (0.58) and do the same. When we've done this for all of the scores in the no meals group, we go back to the first score (0.35) again, but this time count how many scores are bigger in the next but one condition (the 4 soya meals condition). It turns out that all 20 scores are bigger so we register this in the appropriate column and move on to the next score (0.58) and do the same until we've done all of the scores in the no meals group. We basically repeat this process until we've compared the first group to all subsequent groups.

At this stage we move on to the next group (the 1 soya meal). Again, we start with the first score (0.33) and count how many scores are bigger than this value in the subsequent group (the 4 meals group). In this case all 20 scores are bigger than 0.33, so we register this in the table and move on to the next score (0.36). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.33) and repeat the process until this category has been compared to all subsequent categories. When all categories have been compared with all subsequent categories in this way, we simply add up the counts as I have done in Table 1. These sums of counts are denoted by U_ij.

Footnote 3: In fact, we can order the groups the opposite way around if we want to, so we can start with the group we predict to have the lowest median, and then order them in the order we expect the medians to increase. All this will do is reverse the sign of the resulting z-score, and if you're keen to know why, there's a section at the end of this document that shows what happens when we reverse the order of groups!
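The counting procedure described above can be sketched in a few lines of Python. The function names are invented for illustration, and the sketch assumes there are no tied scores across groups (ties would need a convention that the text does not discuss). The 1 soya meal scores are the ones listed in Table 1.

```python
def count_larger(group_a, group_b):
    """For each score in group_a, count how many scores in group_b are bigger."""
    return [sum(1 for y in group_b if y > x) for x in group_a]


def jonckheere_counts(groups):
    """Return {(i, j): U_ij} for every pair of groups with i before j.

    `groups` must already be in the order in which the medians are
    expected to change.
    """
    counts = {}
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            counts[(i, j)] = sum(count_larger(groups[i], groups[j]))
    return counts


# Scores for the 1 soya meal group, as listed in Table 1.
one_meal = [0.33, 0.36, 0.63, 0.64, 0.77, 1.53, 1.62, 1.71, 1.94, 2.48,
            2.71, 4.12, 5.65, 6.76, 7.08, 7.26, 7.92, 8.04, 12.10, 18.47]

# First two scores of the no soya meals group compared with the 1 meal group.
first_two = count_larger([0.35, 0.58], one_meal)

# A tiny made-up example with three ordered groups.
toy_counts = jonckheere_counts([[1, 4], [2, 5], [3, 6]])
```

Here `first_two` reproduces the first two entries of the relevant column in Table 1 (19 and 18), and summing the values of `toy_counts` gives J for the made-up groups.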

Table 1: Data to show Jonckheere's test for the soya example. Each count column shows how many scores in the named later group are bigger than the sperm count on that row.

No Soya Meals                    | 1 Soya Meal              | 4 Soya Meals    | 7 Soya Meals
 Sperm  1 Meal  4 Meals  7 Meals |  Sperm  4 Meals  7 Meals |  Sperm  7 Meals |  Sperm
  0.35      19       20       18 |   0.33       20       18 |   0.40       18 |   0.31
  0.58      18       19       16 |   0.36       20       18 |   0.60       16 |   0.32
  0.88      18       18       13 |   0.63       18       16 |   0.96       13 |   0.56
  0.92      15       18       13 |   0.64       18       16 |   1.20       12 |   0.57
  1.22      15       15       12 |   0.77       18       15 |   1.31       11 |   0.71
  1.51      15       14        7 |   1.53       14        7 |   1.35        9 |   0.81
  1.52      15       14        7 |   1.62       14        7 |   1.68        7 |   0.87
  1.57      14       14        7 |   1.71       13        7 |   1.83        7 |   1.18
  2.43      10       11        6 |   1.94       12        7 |   2.10        6 |   1.25
  2.79       9       11        4 |   2.48       11        6 |   2.93        3 |   1.33
  3.40       9        6        1 |   2.71       11        5 |   2.96        3 |   1.34
  4.52       8        5        0 |   4.12        6        0 |   3.00        3 |   1.49
  4.72       8        5        0 |   5.65        4        0 |   3.09        2 |   1.50
  6.90       6        3        0 |   6.76        3        0 |   3.36        1 |   2.09
  7.58       4        3        0 |   7.08        3        0 |   4.34        0 |   2.70
  7.78       4        3        0 |   7.26        3        0 |   5.81        0 |   2.75
  9.62       2        3        0 |   7.92        3        0 |   5.94        0 |   2.83
 10.05       2        3        0 |   8.04        3        0 |  10.16        0 |   3.07
 10.32       2        2        0 |  12.10        1        0 |  10.98        0 |   3.28
 21.08       0        0        0 |  18.47        0        0 |  18.21        0 |   4.11
 Total     193      187      104 |             195      122 |             111 |
(U_ij)

The test statistic, J, is simply the sum of these counts:

$$J = \sum_{i<j} U_{ij}$$

which for these data is simply:

$$J = \sum_{i<j} U_{ij} = 193 + 187 + 104 + 195 + 122 + 111 = 912$$

For small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:

$$\bar{J} = \frac{N^2 - \sum n_k^2}{4}$$

$$\sigma_j = \sqrt{\frac{1}{72}\left[N^2(2N + 3) - \sum n_k^2(2n_k + 3)\right]}$$

in which N is simply the total sample size (in this case 80) and n_k is simply the sample size of group k (in each case in this example this will be 20 because we have equal sample sizes). So, we just square each group's sample size and add them up, then subtract this value from the total sample size squared. We then divide the result by 4. Therefore, we can calculate the mean for these data as:

$$\bar{J} = \frac{80^2 - (20^2 + 20^2 + 20^2 + 20^2)}{4} = \frac{4800}{4} = 1200$$

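The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size:

$$\begin{aligned}
\sigma_j &= \sqrt{\frac{1}{72}\left\{80^2\left[(2 \times 80) + 3\right] - \left[20^2\left((2 \times 20) + 3\right) + 20^2\left((2 \times 20) + 3\right) + 20^2\left((2 \times 20) + 3\right) + 20^2\left((2 \times 20) + 3\right)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{6400(163) - \left[400(43) + 400(43) + 400(43) + 400(43)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{1043200 - 68800\right\}} \\
&= \sqrt{13533.33} \\
&= 116.33
\end{aligned}$$

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formulae:

$$z = \frac{x - \bar{x}}{s} = \frac{J - \bar{J}}{\sigma_j} = \frac{912 - 1200}{116.33} = -2.476$$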
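The arithmetic above can be checked with a short Python sketch. It takes the six column totals from Table 1 and the four group sizes as inputs; the function name is invented for illustration, and the one-tailed p-value is an addition (computed from the standard normal distribution) that the text itself obtains from a table of critical values.

```python
import math


def jonckheere_normal_approx(u_totals, group_sizes):
    """Return (J, mean of J, SD of J, z) using the large-sample formulae."""
    j = sum(u_totals)
    n_total = sum(group_sizes)
    mean_j = (n_total ** 2 - sum(n ** 2 for n in group_sizes)) / 4
    var_j = (n_total ** 2 * (2 * n_total + 3)
             - sum(n ** 2 * (2 * n + 3) for n in group_sizes)) / 72
    sd_j = math.sqrt(var_j)
    return j, mean_j, sd_j, (j - mean_j) / sd_j


# Column totals (the U_ij values) from Table 1 and four groups of 20.
j, mean_j, sd_j, z = jonckheere_normal_approx(
    [193, 187, 104, 195, 122, 111], [20, 20, 20, 20]
)

# One-tailed probability of a z at least this extreme in the predicted direction.
p_one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))
```

Running this gives J = 912, a mean of 1200, a standard deviation of about 116.33 and z of about -2.476, matching the hand calculation.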

This z can then be evaluated using the critical values in the Appendix of the book. This test is always one-tailed because we have predicted a trend to use the test. So we're looking at z being above 1.65 (when ignoring the sign) to be significant. In fact, the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z) as they do in this example!

Does it Matter how I order My Groups?

I have just shown how to use the test when the groups are ordered by descending medians (i.e. we expect sperm counts to be highest in the no soya meals condition, and then decrease in the following order: 1 meal per week, 4 meals per week, 7 meals per week; so we ordered the groups: no soya, 1 meal, 4 meals and 7 meals). Certain books will tell you to order the groups in ascending order (i.e. start with the group that you expect to have the lowest median). For the soya data this would mean arranging the groups in the opposite order to how I did in the Appendix; that is, 7 meals, 4 meals, 1 meal and no meals. The purpose of this section is to show you what happens if we order the groups the opposite way around!

The process is similar to that used in the Appendix, only now we start with the 7 meals per week condition, and we take the first score and ask "How many scores in the next condition are bigger than this score?" You'll find that this is easy to work out if you arrange your data in ascending order in each condition. Table 2 shows a convenient way to lay out the data. Note that the sperm counts have been ordered in ascending order and the groups have been ordered in the way that we expect our medians to increase. So, starting with the first score in the 7 soya meals group (this score is 0.31), we look at the scores in the next condition (4 soya meals) and count how many are greater than 0.31. It turns out that all 20 scores are greater than this value, so we place the value of 20 in the appropriate column and move on to the next score (0.32) and do the same. When we've done this for all of the scores in the 7 meals group, we go back to the first score (0.31) again, but this time count how many scores are bigger in the next but one condition (the 1 soya meal condition). It turns out that all 20 scores are bigger so we register this in the appropriate column and move on to the next score (0.32) and do the same until we've done all of the scores in the 7 meals group. We basically repeat this process until we've compared the first group to all subsequent groups.
Table 2: Data to show Jonckheere's test for the soya example in Chapter 13. Each count column shows how many scores in the named later group are bigger than the sperm count on that row.

7 Soya Meals                      | 4 Soya Meals              | 1 Soya Meal       | No Soya Meals
 Sperm  4 Meals  1 Meal  No Meals |  Sperm  1 Meal  No Meals |  Sperm  No Meals |  Sperm
  0.31       20      20        20 |   0.40      18        19 |   0.33        20 |   0.35
  0.32       20      20        20 |   0.60      18        18 |   0.36        19 |   0.58
  0.56       19      18        19 |   0.96      15        16 |   0.63        18 |   0.88
  0.57       19      18        19 |   1.20      15        16 |   0.64        18 |   0.92
  0.71       18      16        18 |   1.31      15        15 |   0.77        18 |   1.22
  0.81       18      15        18 |   1.35      15        15 |   1.53        13 |   1.51
  0.87       18      15        18 |   1.68      13        12 |   1.62        12 |   1.52
  1.18       17      15        16 |   1.83      12        12 |   1.71        12 |   1.57
  1.25       16      15        15 |   2.10      11        12 |   1.94        12 |   2.43
  1.33       15      15        15 |   2.93       9        10 |   2.48        11 |   2.79
  1.34       15      15        15 |   2.96       9        10 |   2.71        11 |   3.40
  1.49       14      15        15 |   3.00       9        10 |   4.12         9 |   4.52
  1.50       14      15        15 |   3.09       9        10 |   5.65         7 |   4.72
  2.09       12      11        12 |   3.36       9        10 |   6.76         7 |   6.90
  2.70       11      10        11 |   4.34       8         9 |   7.08         6 |   7.58
  2.75       11       9        11 |   5.81       7         7 |   7.26         6 |   7.78
  2.83       11       9        10 |   5.94       7         7 |   7.92         4 |   9.62
  3.07        8       9        10 |  10.16       2         2 |   8.04         4 |  10.05
  3.28        7       9        10 |  10.98       2         1 |  12.10         1 |  10.32
  4.11        6       9         9 |  18.21       1         1 |  18.47         1 |  21.08
 Total      289     278       296 |            204       212 |             209 |
(U_ij)

At this stage we move on to the next group (the 4 soya meals). Again, we start with the first score (0.40) and count how many scores are bigger than this value in the subsequent group (the 1 meal group). In this case there are 18 scores bigger than 0.40, so we register this in the table and move on to the next score (0.60). Again, we repeat this for all scores and then go back to the beginning of this group (i.e. back to the first score of 0.40) and repeat the process until this category has been compared to all subsequent categories. When all categories have been compared to all subsequent categories in this way, we simply add up the counts as I have done in the table. These sums of counts are denoted by U_ij. As before, the test statistic J is simply the sum of these counts:

$$J = \sum_{i<j} U_{ij}$$

which for these data is simply:

$$J = \sum_{i<j} U_{ij} = 289 + 278 + 296 + 204 + 212 + 209 = 1488$$

As I said earlier, for small samples there are specific tables to look up critical values of J; however, when samples are large (anything bigger than about 8 per group would be large in this case) the sampling distribution of J becomes normal with a mean and standard deviation of:

$$\bar{J} = \frac{N^2 - \sum n_k^2}{4}$$

$$\sigma_j = \sqrt{\frac{1}{72}\left[N^2(2N + 3) - \sum n_k^2(2n_k + 3)\right]}$$

in which N is simply the total sample size (in this case 80) and n_k is simply the sample size of group k (in each case in this example this will be 20 because we have equal sample sizes). So, we just square each group's sample size and add them up, then subtract this value from the total sample size squared. We then divide the result by 4. Therefore, we can calculate the mean for these data as we did earlier:

$$\bar{J} = \frac{80^2 - (20^2 + 20^2 + 20^2 + 20^2)}{4} = \frac{4800}{4} = 1200$$

The standard deviation can similarly be calculated using the sample sizes of each group and the total sample size (again this is the same as earlier):

$$\begin{aligned}
\sigma_j &= \sqrt{\frac{1}{72}\left\{80^2\left[(2 \times 80) + 3\right] - \left[20^2\left((2 \times 20) + 3\right) + 20^2\left((2 \times 20) + 3\right) + 20^2\left((2 \times 20) + 3\right) + 20^2\left((2 \times 20) + 3\right)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{6400(163) - \left[400(43) + 400(43) + 400(43) + 400(43)\right]\right\}} \\
&= \sqrt{\frac{1}{72}\left\{1043200 - 68800\right\}} \\
&= \sqrt{13533.33} \\
&= 116.33
\end{aligned}$$

We can use the mean and standard deviation to convert J to a z-score (see Chapter 1) using the standard formulae. The mean and standard deviation are the same as above, but we now have a different test statistic (it is 1488 rather than 912). So, let's see what happens when we plug this new test statistic into the equation:

$$z = \frac{x - \bar{x}}{s} = \frac{J - \bar{J}}{\sigma_j} = \frac{1488 - 1200}{116.33} = 2.476$$
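A short derivation, added here as a check rather than taken from the original text, shows why the two orderings must give z-scores of equal size and opposite sign. It assumes there are no tied scores between groups. For any pair of groups, every pairing of one score from each group is counted in exactly one of the two directions, so

$$U_{ij} + U_{ji} = n_i n_j$$

Summing over all pairs of groups, and writing J_desc for the statistic from the descending order and J_asc for the statistic from the ascending order:

$$J_{\text{desc}} + J_{\text{asc}} = \sum_{i<j} n_i n_j = \frac{N^2 - \sum n_k^2}{2} = 2\bar{J}$$

and therefore

$$J_{\text{asc}} - \bar{J} = -\left(J_{\text{desc}} - \bar{J}\right)$$

For the soya data this gives 2(1200) - 912 = 1488, which is the value obtained by counting, and because the standard deviation does not depend on the order of the groups, dividing both sides by it flips only the sign of z.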

Note that the z-score is the same value as when we ordered the groups in descending order, except that it now has a positive value rather than a negative one! This goes to prove what I wrote earlier: the sign of the test tells us whether the medians ascend across the groups (a positive z) or descend across the groups (a negative z)! Earlier we ordered the groups in descending order and so got a negative z, and now we ordered them in ascending order and so got a positive z.

Labcoat Leni's Real Research: Having a Quail of a Time?

Matthews, N. et al. (2007). Psychological Science, 18(9), 758–762.

We encountered some research in Chapter 2 in which we discovered that you can influence aspects of male quail's sperm production through conditioning. The basic idea is that the male is granted access to a female for copulation in a certain chamber (e.g. one that is coloured green) but gains no access to a female in a different context (e.g. a chamber with a tilted floor). The male, therefore, learns that when he is in the green chamber his luck is in, but if the floor is tilted then frustration awaits. For other males the chambers will be reversed (i.e. they get sex only when in the chamber with the tilted floor). The human equivalent (well, sort of) would be if you always managed to pull in the Zap Club but never in the Honey Club. During the test phase, males get to mate in both chambers. The question is: after the males have learnt that they will get a mating opportunity in a certain context, do they produce more sperm or better-quality sperm when mating in that context compared to the control context? (Are you more of a stud in the Zap Club? OK, I'm going to stop this analogy now.) Mike Domjan and his colleagues predicted that if conditioning evolved because it increases reproductive fitness then males that mated in the context that had previously signalled a mating opportunity would fertilize a significantly greater number of eggs than quails that mated in their control context (Matthews, Domjan, Ramsey, & Crews, 2007). They put this hypothesis to the test in an experiment that is utter genius. After training, they allowed 14 females to copulate with two males (counterbalanced): one male copulated with the female in the chamber that had previously signalled a reproductive opportunity (Signalled), whereas the second male copulated with the same female but in the chamber that had not previously signalled a mating opportunity (Control). Eggs were collected from the females for 10 days after the mating and a genetic analysis was used to determine the father of any fertilized eggs. The data from this study are in the file Mathews et al. (2007).sav. Labcoat Leni wants you to carry out a Wilcoxon signed-rank test to see whether more eggs were fertilized by males mating in their signalled context compared to males in their control context.

To run the analysis, first open the Wilcoxon test dialog box. Once the dialog box is activated, select two variables from the list (click on the first variable with the mouse and then, while holding down the Ctrl key, the second). You can also select the variables one at a time and transfer them: for example, you could select Signalled and drag it to the column labelled Variable 1 in the box labelled Test Pairs (or click on the transfer arrow), and then select Control and drag it to the column labelled Variable 2 (or click on the transfer arrow again). Each pair appears as a new row in the box labelled Test Pairs. To run the analysis, return to the main dialog box and click on OK.

The first table provides information about the ranked scores. It tells us the number of negative ranks (these are females that produced more eggs fertilized by the male in his signalled chamber than the male in his control chamber) and the number of positive ranks (these are females that produced less eggs fertilized by the male in his signalled chamber than the male in his control chamber). The table shows that for 10 of the 14 quails, the number of eggs fertilized by the male in his signalled chamber was greater than for the male in his control chamber, indicating an adaptive benefit to

305

learning that a chamber signalled reproductive opportunity. There was one tied rank (i.e. one female that produced an equal number of fertilized eggs for both males). The table also shows the mean rank for the negative and positive ranks and the sum of positive and negative ranks. Below the table are footnotes, which tell us to what the positive and negative ranks relate (so they provide the same kind of explanation as I've just made; see, I'm not clever, I just read the footnotes!). The test statistic, T, is the lower of the two sums of ranks, so our test value here is the sum of positive ranks (i.e. 13.50). This value can be converted to a z-score, and this is what SPSS does. The second table tells us that the test statistic is based on the positive ranks, that the z-score is −2.30 and that this value is significant at p = .022. Therefore, we should conclude that there were a greater number of fertilized eggs from males mating in their signalled context, z = −2.30, p < .05. In other words, conditioning (as a learning mechanism) provides some adaptive benefit in that it makes it more likely that you will pass on your genes. The authors concluded as follows:

"Of the 78 eggs laid by the test females, 39 eggs were fertilized. Genetic analysis indicated that 28 of these (72%) were fertilized by the signalled males, and 11 were fertilized by the control males. Ten of the 14 females in the experiment produced more eggs fertilized by the signalled male than by the control male (see Fig. 1; Wilcoxon signed-ranks test, T = 13.5, p < .05). These effects were independent of the order in which the 2 males copulated with the female. Of the 39 fertilized eggs, 20 were sired by the 1st male and 19 were sired by the 2nd male. The present findings show that when 2 males copulated with the same female in succession, the male that received a Pavlovian CS signalling copulatory opportunity fertilized more of the female's eggs. Thus, Pavlovian conditioning increased reproductive fitness in the context of sperm competition." (p. 760)
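
The conversion of T to a z-score can be sketched with the usual normal approximation. This is only an illustration under stated assumptions, not SPSS's exact procedure: it takes the 13 non-tied females as the sample size and leaves out the correction for tied ranks, so it gives a value of about −2.24 rather than the −2.30 in the output. The function name is hypothetical.

```python
import math


def wilcoxon_z(t_statistic, n_pairs):
    """Normal approximation for the Wilcoxon signed-rank statistic T.

    n_pairs is the number of non-tied pairs. No correction for tied
    ranks is applied (an assumption of this sketch).
    """
    mean_t = n_pairs * (n_pairs + 1) / 4
    se_t = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)
    return (t_statistic - mean_t) / se_t


# T = 13.5 from the output; 14 females minus 1 tie leaves 13 pairs.
z = wilcoxon_z(13.5, 13)
print(round(z, 2))
```

The sign is negative because T (the sum of positive ranks) is smaller than the value expected if positive and negative ranks were equally likely.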

Labcoat Leni's Real Research: Eggs-traordinary!

Çetinkaya, H., & Domjan, M. (2006). Journal of Comparative Psychology, 120(4), 427–432.

There seems to be a lot of sperm in this book (not literally I hope); it's possible that I have a mild obsession. We saw that male quail fertilized more eggs if they had been able to predict when a mating opportunity would arise. However, some quail develop fetishes. Really. In the previous example the type of compartment acted as a predictor of an opportunity to mate, but in studies where a terrycloth object acts as a sign that a mate will shortly become available, some quails start to direct their sexual behaviour towards the terrycloth object. (I may regret this analogy, but in human terms if you imagine that every time you were going to have sex with your boyfriend you gave him a green towel a few moments before seducing him, then after enough seductions he would start rubbing his crotch against any green towel he saw. If you've ever wondered why your boyfriend does this, then hopefully this is an enlightening explanation.) In evolutionary terms, this fetishistic behaviour seems counterproductive because sexual behaviour becomes directed towards something that cannot provide reproductive success. However, perhaps this behaviour serves to prepare the organism for the 'real' mating behaviour. Hakan Çetinkaya and Mike Domjan conducted a brilliant study in which they sexually conditioned male quail (Çetinkaya & Domjan, 2006). All quail experienced the terrycloth stimulus and an opportunity to mate, but for some the terrycloth stimulus immediately preceded the mating opportunity (paired group) whereas others experienced it 2 hours after the mating opportunity (this was the control group because the terrycloth stimulus did not predict a mating

opportunity). In the paired group, quail were classified as fetishistic or not depending on whether they engaged in sexual behaviour with the terrycloth object. During a test trial the quails mated with a female and the researcher measured the percentage of eggs fertilized, the time spent near the terrycloth object, the latency to initiate copulation, and copulatory efficiency. If this fetishistic behaviour provides an evolutionary advantage then we would expect the fetishistic quails to fertilize more eggs, initiate copulation faster, and be more efficient in their copulations. The data from this study are in the file Çetinkaya & Domjan (2006).sav. Labcoat Leni wants you to carry out a Kruskal–Wallis test to see whether fetishistic quails produced a higher percentage of fertilized eggs and initiated sex more quickly.

Let's begin by using the Chart Builder to do some boxplots:

First, access the main dialog box by selecting the K Independent Samples option from the Nonparametric Tests menu. Once the dialog box is activated, select the two dependent variables from the list (click on Egg_Percent and, while holding down the Ctrl key, Latency) and drag them to the box labelled Test Variable List (or click on the arrow button). Next, select the independent variable (the grouping variable), in this case Group, and drag it to the box labelled Grouping Variable. When the grouping variable has been selected the Define Range button becomes active and you should click on it to activate the define range dialog box. SPSS needs to know the range of numeric codes you assigned to your groups, and there is a space for you to type the minimum and maximum codes. The minimum code we used was 1, and the maximum was 3, so type these numbers into the appropriate spaces. When you have defined the groups, click on Continue to return to the main dialog box. To run the analyses, return to the main dialog box and click on OK.

The output should look like this:

For both variables there is a significant effect. So there are differences between the groups but we don't know where these differences lie. To find out we can conduct several Mann–Whitney tests. To access these select the 2 Independent Samples option from the Nonparametric Tests menu.
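
As a rough illustration of what the Kruskal–Wallis test computes, the sketch below ranks all scores together and builds the H statistic from the rank sums of each group. It is a minimal sketch, not SPSS's implementation: the scores are made up (they are not the quail data), the function names are hypothetical, and no correction for ties is applied to H. The last line shows the Bonferroni idea for the three follow-up Mann–Whitney tests, which is to divide the overall .05 criterion by the number of tests.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without the tie correction (an assumption of this sketch)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    position = 0
    for group in groups:
        rank_sum = sum(ranks[position:position + len(group)])
        total += rank_sum ** 2 / len(group)
        position += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical scores for three groups (not the quail data).
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(example_groups)

# Bonferroni: three follow-up tests share an overall criterion of .05.
alpha_per_test = 0.05 / 3
```

H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom, which is why the authors report the statistic as a chi-square with 2 degrees of freedom.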

The output you should get is:

Fetishistic vs. Nonfetishistic:

Fetishistic vs. Control:

Nonfetishistic vs. Control:

The authors reported as follows:

"Kruskal–Wallis analysis of variance (ANOVA) confirmed that female quail partnered with the different types of male quail produced different percentages of fertilized eggs, χ²(2, N = 59) = 11.95, p < .05, η² = 0.20. Subsequent pairwise comparisons with the Mann–Whitney U test (with the Bonferroni correction) indicated that fetishistic male quail yielded higher rates of fertilization than both the nonfetishistic male quail (U = 56.00, N1 = 17, N2 = 15, effect size = 8.98, p < .05) and the control male quail (U = 100.00, N1 = 17, N2 = 27, effect size = 12.42, p < .05). However, the nonfetishistic group was not significantly different from the control group (U = 176.50, N1 = 15, N2 = 27, effect size = 2.69, p > .05)." (page 429)

For the latency data they reported as follows:

"A Kruskal–Wallis analysis indicated significant group differences, χ²(2, N = 59) = 32.24, p < .05, η² = 0.56. Pairwise comparisons with the Mann–Whitney U test (with the Bonferroni correction) showed that the nonfetishistic males had significantly shorter copulatory latencies than both the fetishistic male quail (U = 0.00, N1 = 17, N2 = 15, effect size = 16.00, p < .05) and the control male quail (U = 12.00, N1 = 15, N2 = 27, effect size = 19.76, p < .05). However, the fetishistic group was not significantly different from the control group (U = 161.00, N1 = 17, N2 = 27, effect size = 6.57, p > .05)." (page 430)
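
One way to see where effect sizes of this kind can come from is the convention of dividing the Kruskal–Wallis statistic by N − 1. Whether the authors used exactly this formula is an assumption; the sketch simply shows that it gives values of about 0.21 and 0.56, close to the 0.20 and 0.56 they report. The function name is hypothetical.

```python
def eta_squared_from_h(h_statistic, n_total):
    """Effect size for a Kruskal-Wallis test as H / (N - 1).

    This is one common convention, assumed here for illustration.
    """
    return h_statistic / (n_total - 1)


# Test statistics and sample size as reported in the quoted passages.
eggs_effect = eta_squared_from_h(11.95, 59)
latency_effect = eta_squared_from_h(32.24, 59)
print(round(eggs_effect, 3), round(latency_effect, 3))
```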

These results support the authors' theory that fetishistic behaviour may have evolved because it offers some adaptive function (such as preparing for the real thing).

Chapter 16

Additional Material

Oliver Twisted: Please Sir, can I have some more ... Maths?

'You are a bit stupid. I think it would be fun to check your maths so that we can see exactly how much of a village idiot you are', mocks Oliver. Luckily you can. Never one to shy from public humiliation on a mass scale, I have provided the matrix calculations for this example on the companion website. Find a mistake, go on, you know that you can ...

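Calculation of E⁻¹

\[
E = \begin{pmatrix} 51 & 13 \\ 13 & 122 \end{pmatrix}
\]

determinant of E:

\[
|E| = (51 \times 122) - (13 \times 13) = 6053
\]

matrix of minors for E:

\[
\begin{pmatrix} 122 & 13 \\ 13 & 51 \end{pmatrix}
\]

pattern of signs for a 2 × 2 matrix:

\[
\begin{pmatrix} + & - \\ - & + \end{pmatrix}
\]

matrix of cofactors:

\[
\begin{pmatrix} 122 & -13 \\ -13 & 51 \end{pmatrix}
\]

The inverse of a matrix is obtained by dividing the matrix of cofactors for E by |E|, the determinant of E:

\[
E^{-1} = \begin{pmatrix} \frac{122}{6053} & \frac{-13}{6053} \\ \frac{-13}{6053} & \frac{51}{6053} \end{pmatrix}
= \begin{pmatrix} 0.0202 & -0.0021 \\ -0.0021 & 0.0084 \end{pmatrix}
\]

Calculation of HE⁻¹

\[
\begin{aligned}
HE^{-1} &= \begin{pmatrix} 10.47 & -7.53 \\ -7.53 & 19.47 \end{pmatrix}
\begin{pmatrix} 0.0202 & -0.0021 \\ -0.0021 & 0.0084 \end{pmatrix} \\
&= \begin{pmatrix}
(10.47 \times 0.0202) + (-7.53 \times -0.0021) & (10.47 \times -0.0021) + (-7.53 \times 0.0084) \\
(-7.53 \times 0.0202) + (19.47 \times -0.0021) & (-7.53 \times -0.0021) + (19.47 \times 0.0084)
\end{pmatrix} \\
&= \begin{pmatrix} 0.2273 & -0.0852 \\ -0.1930 & 0.1794 \end{pmatrix}
\end{aligned}
\]

Calculation of Eigenvalues

The eigenvalues or roots of any square matrix are the solutions to the determinantal equation |A − λI| = 0, in which A is the square matrix in question and I is an identity matrix of the same size as A. The number of eigenvalues will equal the number of rows (or columns) of the square matrix. In this case the square matrix of interest is HE⁻¹.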

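\[
\begin{aligned}
|HE^{-1} - \lambda I| &= \left| \begin{pmatrix} 0.2273 & -0.0852 \\ -0.1930 & 0.1794 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} \right| \\
&= \begin{vmatrix} 0.2273 - \lambda & -0.0852 \\ -0.1930 & 0.1794 - \lambda \end{vmatrix} \\
&= (0.2273 - \lambda)(0.1794 - \lambda) - (-0.1930 \times -0.0852) \\
&= \lambda^2 - 0.2273\lambda - 0.1794\lambda + 0.0407 - 0.0164 \\
&= \lambda^2 - 0.4067\lambda + 0.0243
\end{aligned}
\]

Therefore the equation |HE⁻¹ − λI| = 0 can be expressed as:

\[
\lambda^2 - 0.4067\lambda + 0.0243 = 0
\]

To solve for the roots of any quadratic equation of the general form aλ² + bλ + c = 0 we can apply the following formula:

\[
\lambda_i = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]

For the quadratic equation obtained, a = 1, b = −0.4067, c = 0.0243. If we substitute these values into the formula for discovering roots, we get:

\[
\begin{aligned}
\lambda_i &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \\
&= \frac{0.4067 \pm \sqrt{(-0.4067)^2 - 0.0972}}{2} \\
&= \frac{0.4067 \pm 0.2612}{2} \\
&= \frac{0.6679}{2} \text{ or } \frac{0.1455}{2} \\
&= 0.334 \text{ or } 0.073
\end{aligned}
\]

Hence, the eigenvalues are 0.334 and 0.073.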
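
If you would like to check the arithmetic rather than trust it, the calculation can be sketched in a few lines of code. This is a minimal sketch with hypothetical function names; it assumes that the off-diagonal elements of H are −7.53, which is what the products in the worked calculation imply. Because it does not round at each step, it gives roughly 0.335 and 0.073 rather than exactly 0.334 and 0.073.

```python
import math

# Error and hypothesis SSCP matrices from the worked example.
E = [[51.0, 13.0], [13.0, 122.0]]
H = [[10.47, -7.53], [-7.53, 19.47]]  # negative off-diagonals are assumed


def inverse_2x2(m):
    """Invert a 2 x 2 matrix: cofactors divided by the determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def multiply_2x2(a, b):
    """Ordinary matrix product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eigenvalues_2x2(m):
    """Roots of lambda^2 - trace * lambda + determinant = 0."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(trace ** 2 - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


HE_inv = multiply_2x2(H, inverse_2x2(E))
largest, smallest = eigenvalues_2x2(HE_inv)
print(round(largest, 3), round(smallest, 3))
```

The quadratic solved inside eigenvalues_2x2 is the same one derived by hand: the coefficient of λ is minus the sum of the diagonal elements and the constant term is the determinant.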

Labcoat Leni's Real Research: A Lot of Hot Air!

Marzillier, S. L., & Davey, G. C. L. (2005). Cognition and Emotion, 19, 729–750.

Have you ever wondered what researchers do in their spare time? Well, some of them spend it tracking down the sounds of people burping and farting! It has long been established that anxiety and disgust are linked. Anxious people are, typically, easily disgusted. Throughout this book I have talked about how you cannot infer causality from relationships between variables. This has been a bit of a conundrum for anxiety researchers: does anxiety cause feelings of disgust or does a low threshold for being disgusted cause anxiety? Two colleagues of mine at Sussex addressed this in an unusual study in which they induced feelings of anxiety, feelings of disgust, or a neutral mood, and they looked at the effect that these induced moods had on feelings of anxiety, sadness, happiness, anger, disgust and contempt. To induce these moods, they used three different types of manipulation: vignettes (e.g. 'you're swimming in a dark lake and something brushes your leg' for anxiety, and 'you go into a public toilet and find it has not been flushed. The bowl of the toilet is full of diarrhoea' for disgust), music (e.g. some scary music for anxiety, and a tape of burps, farts and vomiting for disgust), videos (e.g. a clip from Silence of the Lambs for anxiety and a scene from Pink Flamingos in which Divine eats dog faeces for disgust), and memory (remembering events from the past that had made the person anxious, disgusted or neutral). Different people underwent anxious, disgust and neutral mood inductions. Within these groups, the induction was done using vignettes and music, videos, or memory recall and music for different people. The outcome variables were the change (from before to after the induction) in six moods: anxiety, sadness, happiness, anger, disgust and contempt.

The data are in the file Marzillier and Davey (2005).sav. Draw an error bar graph of the changes in moods in the different conditions, then conduct a 3 (Mood: anxiety, disgust, neutral) × 3 (Induction: vignettes + music, videos, memory recall + music) MANOVA on these data. Whatever you do, don't imagine what their fart tape sounded like while you do the analysis! Answers are in the additional material on the companion website (or look at page 738 of the original article).

To do the graph we have to access the Chart Builder and select a clustered bar chart. First, let's set Mood induction as the x-axis by selecting it and dragging it to the drop zone:

Next, select all of the DVs (click on Change in Anxiety, then hold Shift down and click on Change in Contempt and all six should become highlighted). Then drag these into the y-axis drop zone. This will have the effect that different moods will be displayed by different-coloured bars.

We have another variable, the type of induction, and we can display this too. First, click on the Groups/Point ID tab and then select Row Panel. When this is selected a new drop zone appears (called panel), and you can drag the Type of Induction into that zone. Remember to select and the finished dialog box will look as follows. Click on to produce the graph.

The completed graph will look like that below. This shows that the neutral mood induction (regardless of the way in which it was induced) didn't really affect mood too much (the changes are all quite small). For the disgust mood induction, disgust always increased quite a lot (the yellow bars) regardless of how disgust was induced. Similarly, the anxiety induction raised anxiety (predominantly). Happiness decreased for both anxiety and disgust mood inductions.

To run the MANOVA, the main dialog box should look like this:

You can set whatever options you like based on the chapter. The main multivariate statistics are shown below. A main effect of mood was found, F(12, 334) = 21.91, p < .001, showing that the changes for some mood inductions were bigger than for others overall (looking at the graph, this finding probably reflects that the disgust mood induction had the greatest effect overall, mainly because it produced such huge changes in disgust). There was no significant main effect of the type of mood induction, F(24, 334) = 1.12, p > .05, showing that whether videos, memory, tapes, etc., were used did not affect the changes in mood. Also, the type of mood × type of induction interaction, F(24, 676) = 1.22, p > .05, showed that the type of induction did not influence the main effect of mood. In other words, the fact that the disgust induction seemed to have the biggest effect on mood (overall) was not influenced by how disgust was induced.

The univariate effects for type of mood (which was the only significant multivariate effect) show that the effect of the type of mood induction was significant for all six moods (in other words, for all six moods there were significant differences across the anxiety, disgust and neutral conditions). Below is a graph that collapses across the way that mood was induced (video, music, etc.) because

this effect was not significant (you can create this by going back to the Chart Builder and deselecting Rows Panel). We should do more tests, but just looking at the graph shows that changes in anxiety (blue bars) differ across the three mood conditions (they go up after the anxiety induction, stay the same for the disgust induction, and go down for the neutral induction). Similarly, for disgust, the change is biggest after the disgust induction, it increases a little after the anxiety induction and doesn't really change after the neutral (yellow bars). Finally, for happiness, this goes down after both anxiety and disgust inductions, but doesn't change for neutral.

Chapter 17

Self-Test Answers

Use the Case Summaries command to list the factor scores for these data (given that there are over 2500 cases, you might like to restrict the output to the first 10 or 20).

To list the factor scores you need to use the Case Summaries command, which can be found by selecting Analyze > Reports > Case Summaries. Simply select the variables that you want to list (in this case the four columns of factor scores) and transfer them to the box labelled Variables by dragging them or clicking on the arrow button. By default, SPSS will limit the output to the first 100 cases, but let's set this to 10 so we just look at the first few cases (as in the book chapter).

Self-Test Answers

Using what you learnt in Chapter 5, use the compute command to reverse score item 3. (Clue: remember that you are simply changing the variable to 6 minus its original value.)

To access the compute dialog box, select Transform > Compute Variable. We came across this command in Chapter 5, and what we do is enter the name of the variable we want to change in the space labelled Target Variable (in this case the variable is called Question_03). You can use a different name if you like, but if you do SPSS will create a new variable and you must remember that it's this new variable that you need to use in the reliability analysis. Then, where it says Numeric Expression you need to tell SPSS how to compute the new variable. In this case, we want to take each person's original score on item 3, and subtract that value from 6. Therefore, we simply type 6 - Question_03 (which means 6 minus the value found in the column labelled Question_03). If you've used the same name then when you click on OK you'll get a dialog box asking if you want to change the existing variable; just click on OK if you're happy for the new values to replace the old ones.
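
The logic of reverse scoring can be sketched outside SPSS too. The example below is an illustration only: the function name and the responses are hypothetical. It shows why the constant is 6 for a scale running from 1 to 5, because the reversed score is the scale minimum plus the scale maximum, minus the original score.

```python
def reverse_score(score, scale_min=1, scale_max=5):
    """Reverse a response on a scale running from scale_min to scale_max."""
    return scale_min + scale_max - score


# Hypothetical responses to item 3 on a 5-point scale.
question_03 = [1, 2, 3, 4, 5]
question_03_reversed = [reverse_score(score) for score in question_03]
print(question_03_reversed)
```

For a 7-point scale coded 1 to 7 the same function subtracts from 8, so the constant always depends on how the scale was coded.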

Additional Material

Oliver Twisted: Please Sir, can I have some more ... Matrix Algebra?

'The matrix ...', enthuses Oliver, '... that was a good film. I want to dress in black and glide through the air as though time has stood still. Maybe the matrix of factor scores is as cool as the film.' I think you might be disappointed Oliver, but we'll give it a shot. The matrix calculations of factor scores are detailed in the additional material for this chapter on the companion website. Be afraid, be very afraid ...

Calculation of Factor Score Coefficients


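\[
B = R^{-1}A
\]

\[
B = \begin{pmatrix}
4.76 & -7.46 & 3.91 & -2.35 & 2.42 & -0.49 \\
-7.46 & 18.49 & -12.42 & 5.45 & -5.54 & 1.22 \\
3.91 & -12.42 & 10.07 & -3.65 & 3.79 & -0.96 \\
-2.35 & 5.45 & -3.65 & 2.97 & -2.16 & 0.02 \\
2.42 & -5.54 & 3.79 & -2.16 & 2.98 & -0.56 \\
-0.49 & 1.22 & -0.96 & 0.02 & -0.56 & 1.27
\end{pmatrix}
\begin{pmatrix}
0.87 & 0.01 \\
0.96 & -0.03 \\
0.92 & 0.04 \\
0.00 & 0.82 \\
-0.10 & 0.75 \\
0.09 & 0.70
\end{pmatrix}
\]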

Column 1 of matrix B

To get the first element of the first column of matrix B, you need to multiply each element in the first column of matrix A with the correspondingly placed element in the first row of matrix R⁻¹. Add these six products together to get the final value of the first element. To get the second element of the first column of matrix B, you need to multiply each element in the first column of matrix A with the correspondingly placed element in the second row of matrix R⁻¹. Add these six products together to get the final value, and so on:

\[
\begin{aligned}
B_{11} &= (4.75924 \times 0.87407) + (-7.46190 \times 0.95768) + (3.90949 \times 0.92138) \\
&\quad + (-2.35093 \times -0.00237) + (2.42104 \times -0.09575) + (-0.48607 \times 0.096) = 0.343 \\
B_{21} &= (-7.4619 \times 0.87407) + (18.48556 \times 0.95768) + (-12.41679 \times 0.92138) \\
&\quad + (5.445 \times -0.00237) + (-5.54427 \times -0.09575) + (1.22155 \times 0.096) = 0.376 \\
B_{31} &= (3.90949 \times 0.87407) + (-12.41679 \times 0.95768) + (10.07382 \times 0.92138) \\
&\quad + (-3.64853 \times -0.00237) + (3.78869 \times -0.09575) + (-0.95731 \times 0.096) = 0.362 \\
B_{41} &= (-2.35093 \times 0.87407) + (5.445 \times 0.95768) + (-3.64853 \times 0.92138) \\
&\quad + (2.96922 \times -0.00237) + (-2.16094 \times -0.09575) + (0.02255 \times 0.096) = 0.000 \\
B_{51} &= (2.42104 \times 0.87407) + (-5.54427 \times 0.95768) + (3.78869 \times 0.92138) \\
&\quad + (-2.16094 \times -0.00237) + (2.97983 \times -0.09575) + (-0.56017 \times 0.096) = -0.037 \\
B_{61} &= (-0.48607 \times 0.87407) + (1.22155 \times 0.95768) + (-0.95731 \times 0.92138) \\
&\quad + (0.02255 \times -0.00237) + (-0.56017 \times -0.09575) + (1.27072 \times 0.096) = 0.039
\end{aligned}
\]

Column 2 of matrix B

To get the first element of the second column of matrix B, you need to multiply each element in the second column of matrix A with the correspondingly placed element in the first row of matrix R⁻¹. Add these six products together to get the final value. To get the second element of the second column of matrix B, you need to multiply each element in the second column of matrix A with the correspondingly placed element in the second row of matrix R⁻¹. Add these six products together to get the final value, and so on:

\[
\begin{aligned}
B_{12} &= (4.75924 \times 0.00842) + (-7.46190 \times -0.03653) + (3.90949 \times 0.03178) \\
&\quad + (-2.35093 \times 0.81556) + (2.42104 \times 0.75435) + (-0.48607 \times 0.69936) = 0.006 \\
B_{22} &= (-7.4619 \times 0.00842) + (18.48556 \times -0.03653) + (-12.41679 \times 0.03178) \\
&\quad + (5.445 \times 0.81556) + (-5.54427 \times 0.75435) + (1.22155 \times 0.69936) = -0.020 \\
B_{32} &= (3.90949 \times 0.00842) + (-12.41679 \times -0.03653) + (10.07382 \times 0.03178) \\
&\quad + (-3.64853 \times 0.81556) + (3.78869 \times 0.75435) + (-0.95731 \times 0.69936) = 0.020 \\
B_{42} &= (-2.35093 \times 0.00842) + (5.445 \times -0.03653) + (-3.64853 \times 0.03178) \\
&\quad + (2.96922 \times 0.81556) + (-2.16094 \times 0.75435) + (0.02255 \times 0.69936) = 0.473 \\
B_{52} &= (2.42104 \times 0.00842) + (-5.54427 \times -0.03653) + (3.78869 \times 0.03178) \\
&\quad + (-2.16094 \times 0.81556) + (2.97983 \times 0.75435) + (-0.56017 \times 0.69936) = 0.437 \\
B_{62} &= (-0.48607 \times 0.00842) + (1.22155 \times -0.03653) + (-0.95731 \times 0.03178) \\
&\quad + (0.02255 \times 0.81556) + (-0.56017 \times 0.75435) + (1.27072 \times 0.69936) = 0.405
\end{aligned}
\]
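
The twelve sums above are just one matrix multiplication, which can be sketched as follows. This is an illustration under stated assumptions: the helper name is hypothetical, and the minus signs on the elements of R⁻¹ and A are inferred from the symmetry of R⁻¹ and from the results the products have to produce, because signs are easily lost when matrices like these are copied.

```python
# Inverse of the correlation matrix (signs are an assumption, see above).
R_INV = [
    [4.75924, -7.46190, 3.90949, -2.35093, 2.42104, -0.48607],
    [-7.46190, 18.48556, -12.41679, 5.44500, -5.54427, 1.22155],
    [3.90949, -12.41679, 10.07382, -3.64853, 3.78869, -0.95731],
    [-2.35093, 5.44500, -3.64853, 2.96922, -2.16094, 0.02255],
    [2.42104, -5.54427, 3.78869, -2.16094, 2.97983, -0.56017],
    [-0.48607, 1.22155, -0.95731, 0.02255, -0.56017, 1.27072],
]

# Factor loadings: one row per variable, one column per factor.
A = [
    [0.87407, 0.00842],
    [0.95768, -0.03653],
    [0.92138, 0.03178],
    [-0.00237, 0.81556],
    [-0.09575, 0.75435],
    [0.09600, 0.69936],
]


def matrix_product(left, right):
    """Multiply two matrices stored as lists of rows."""
    n_inner = len(right)
    n_cols = len(right[0])
    return [[sum(row[k] * right[k][j] for k in range(n_inner))
             for j in range(n_cols)]
            for row in left]


# Factor score coefficients: each row of R_INV combined with each column of A.
B = matrix_product(R_INV, A)
for row in B:
    print([round(value, 3) for value in row])
```

Each printed row holds the coefficients for one variable on the two factors, so the first column should match the column 1 results and the second column the column 2 results.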

Oliver Twisted: Please Sir, can I have some more ... Questionnaires?

'I'm going to design a questionnaire to measure one's propensity to pick a pocket or two', says Oliver, 'but how would I go about doing it?' You'd read the useful information about the dos and don'ts of questionnaire design in the additional material for this chapter on the companion website, that's how. Rate how useful it is on a Likert scale from 1 = not useful at all, to 5 = very useful.

What Makes a Good Questionnaire?

As a rule of thumb, never attempt to design a questionnaire! A questionnaire is very easy to design, but a good questionnaire is virtually impossible to design. The point is that it takes a long time to construct a questionnaire, with no guarantees that the end result will be of any use to anyone. A good questionnaire must have three things: discrimination, validity and reliability.

Discrimination

Before talking about validity and reliability, we should talk about discrimination, which is really an issue of item selection. Discrimination simply means that people with different scores on a questionnaire should differ in the construct of interest to you. For example, a questionnaire measuring social phobia should discriminate between people with social phobia and people without it (i.e. people in the different groups should score differently). There are three corollaries to consider:

1. People with the same score should be equal to each other along the measured construct.
2. People with different scores should be different to each other along the measured construct.
3. The degree of difference between people is proportional to the difference in scores.

This is all pretty self-evident really, so what's the fuss about? Well, let's take a really simple example of a three-item questionnaire measuring sociability. Imagine we administered this questionnaire to two people: Jane and Katie. Their responses are shown in Figure 1.

Item                                 Jane    Katie
1. I like going to parties           Yes     Yes
2. I often go to the pub             No      Yes
3. I really enjoy meeting people     Yes     No

Figure 1

Jane responded yes to items 1 and 3 but no to item 2. If we score a yes with the value 1 and a no with a 0, then we can calculate a total score of 2. Katie on the other hand answers yes to items 1 and 2 but no to item 3. Using the same scoring system her score is also 2. Therefore, numerically you have identical answers (i.e. both Jane and Katie score 2 on this questionnaire); therefore, these two people should be comparable in their sociability. Are they? The answer is: not necessarily. It seems that Katie likes to go to parties and the pub but doesn't enjoy meeting people in general, whereas Jane enjoys parties and meeting people but doesn't enjoy the pub. It seems that Katie likes social situations involving alcohol (e.g. the pub and parties) but Jane likes socializing in general, but can't tolerate cigarette smoke. In many ways, therefore, these people are very different because our questions are contaminated by other factors (i.e. attitudes to alcohol or smoky environments). A good questionnaire should be designed such that people with

identical numerical scores are identical in the construct being measured, and that's not as easy to achieve as you might think! A second related point is score differences. Imagine you take scores on the Spider Phobia Questionnaire. Imagine you have three participants who do the questionnaire and get the following scores:

Andy: 30
(difference between Andy and Graham = 15)
Graham: 15
(difference between Graham and Dan = 5)
Dan: 10

Andy scores 30 on the SPQ (very spider phobic), Graham scores 15 (moderately phobic) and Dan scores 10 (not very phobic at all). Does this mean that Dan and Graham are more similar in their spider phobia than Graham and Andy? In theory this should be the case because Graham's score is more similar to Dan's (difference = 5) than it is to Andy's (difference = 15). In addition, is it the case that Andy is three times more phobic of spiders than Dan is? Is he twice as phobic as Graham? Again, his scores suggest that he should be. The point is that you can't guarantee in advance that differences in score are going to be comparable, yet a questionnaire needs to be constructed such that the difference in score is proportional to the difference between people.

Validity

Items on your questionnaire must measure something and a good questionnaire measures what you designed it to measure (this is called validity). Validity basically means 'measuring what you think you're measuring'. So, an anxiety measure that actually measures assertiveness is not valid;

however, a materialism scale that does actually measure materialism is valid. Validity is a difficult thing to assess and it can take several forms:

1. Content validity: Items on a questionnaire must relate to the construct being measured. For example, a questionnaire measuring intrusive thoughts is pretty useless if it contains items relating to statistical ability. Content validity is really how representative your questions are: the sampling adequacy of items. This is achieved when items are first selected: don't include items that are blatantly very similar to other items, and ensure that questions cover the full range of the construct.

2. Criterion validity: This is basically whether the questionnaire is measuring what it claims to measure. In an ideal world, you could assess this by relating scores on each item to real world observations (e.g. comparing scores on sociability items with the number of times a person actually goes out to socialize). This is often impractical and so there are other techniques such as (a) using the questionnaire in a variety of situations and seeing how predictive it is; (b) seeing how well it correlates with other known measures of your construct (i.e. sociable people might be expected to score highly on extroversion scales); and (c) using statistical techniques such as the Item Validity Index (IVI).

3. Factorial validity: This validity basically refers to whether the factor structure of the questionnaire makes intuitive sense. As such, factorial validity is assessed through factor analysis. When you have your final set of items you can conduct a factor analysis on the data (see the book). Factor analysis takes your correlated questions and recodes them into uncorrelated, underlying variables called factors (an example might be recoding the variables height, chest size, shoulder width and weight into an underlying variable called 'build'). As another example, to assess success in a course we might measure attentiveness in seminars, the amount of notes taken in seminars and the number of questions asked

during seminars: all of these variables may relate to an underlying trait such as 'motivation to succeed'. Factor analysis produces a table of items and their correlation, or loading, with each factor. A factor is composed of items that correlate highly with it. Factorial validity can be seen from whether the items tied onto factors make intuitive sense or not. Basically, if your items cluster into meaningful groups then you can infer factorial validity. Validity is a necessary but not sufficient condition of a questionnaire.

Reliability

A questionnaire must be not only valid, but also reliable. Reliability is basically the ability of the questionnaire to produce the same results under the same conditions. To be reliable the questionnaire must first be valid. Clearly the easiest way to assess reliability is to test the same group of people twice: if the questionnaire is reliable you'd expect each person's scores to be the same at both points in time. So, scores on the questionnaire should correlate perfectly (or very nearly!). However, in reality, if we did test the same people twice then we'd expect some practice effects and confounding effects (people might remember their responses from last time). Also this method is not very useful for questionnaires purporting to measure something that we would expect to change (such as depressed mood or anxiety). These problems can be overcome using the alternate form method in which two comparable questionnaires are devised and compared. Needless to say, this is a rather time-consuming way to ensure reliability and fortunately there are statistical methods to make life much easier. The simplest statistical technique is the split-half method. This method randomly splits the questionnaire items into two groups. A score for each subject is then calculated based on each half of the scale. If a scale is very reliable we'd expect a person's score to be the same on one half of the scale as the other, and so the two halves should correlate perfectly. The correlation between the two halves is the statistic computed in the split-half method, large correlations being a sign of

reliability (see footnote 4). The problem with this method is that there are a number of ways in which a set of data can be split into two and so the results might be a product of the way in which the data were split. To overcome this problem, Cronbach suggested splitting the data in two in every conceivable way and computing the correlation coefficient for each split. The average of these values is known as Cronbach's alpha, which is the most common measure of scale reliability. As a rough guide, a value of 0.8 is seen as an acceptable value for Cronbach's alpha; values substantially lower indicate an unreliable scale (see the book for more detail).
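
To make Cronbach's alpha a little more concrete, here is a minimal sketch of how it can be computed. Note the swap: rather than averaging over every possible split, the sketch uses the standard variance-based formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha from a list of respondents' item scores.

    Each inner list holds one respondent's scores on every item.
    Uses the variance-based formula (sample variances throughout).
    """
    n_items = len(responses[0])
    item_variances = [variance([row[i] for row in responses])
                      for i in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents answering three items.
hypothetical_responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(hypothetical_responses)
print(round(alpha, 2))
```

These made-up items rise and fall together almost perfectly, so alpha comes out well above the 0.8 guideline; items that did not hang together would push the sum of item variances closer to the total variance and alpha towards zero.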

How to Design your Questionnaire

Step 1: Choose a Construct

First you need to decide on what you would like to measure. Once you have done this use PsychLit and the Web of Knowledge to do a basic search for some information on this topic. I don't expect you to search through reams of material, but just get some basic background on the construct you're testing and how it might relate to psychologically important things. For example, if you looked at Empathy, this is seen as an important component of Carl Rogers' client-centred therapy; therefore, having the personality trait of empathy might be useful if you were to become a Rogerian therapist. It follows then that having a questionnaire to measure this trait might be useful for selection purposes on Rogerian therapy training courses. So, basically you need to set some kind of context to why the construct is important: this information will form the basis of your introduction.

Step 2: Decide on a Response Scale

A fundamental issue is how you want respondents to answer questions. You could choose to have:

4 In fact the correlation coefficient is adjusted to account for the smaller sample on which scores from the scale are based (remember that these scores are based on half of the items on the scale).

Yes/No or Yes/No/Don't Know scales: This forces people to give one answer or another even though they might feel that they are neither a yes nor no. Also, imagine you were measuring intrusive thoughts and you had an item 'I think about killing children'. Chances are everyone would respond no to that statement (even if they did have those thoughts) because it is a very undesirable thing to admit. Therefore, all this item is doing is subtracting a value from everybody's score: it tells you nothing meaningful, it is just noise in the data. This scenario can also occur when you have a rating scale with a don't know response (because people just cannot make up their minds and opt for the neutral response). This is why it is sometimes nice to have questionnaires with a neutral point to help you identify which things people really have no feeling about. Without this midpoint you are simply making people go one way or the other, which is comparable to balancing a coin on its edge and seeing which side up it lands when it falls. Basically, when forced, 50% will choose one option while 50% will choose the opposite: this is just noise in your data.

Likert scale: This is the standard agree-disagree ordinal categories response. It comes in many forms:
o 3-point: Agree - Neither Agree nor Disagree - Disagree
o 5-point: Agree - Midpoint - Neither Agree nor Disagree - Midpoint - Disagree
o 7-point: Agree - 2 Points - Neither Agree nor Disagree - 2 Points - Disagree

Questions should encourage respondents to use all points of the scale. So, ideally the statistical distribution of responses to a single item should be normal with a mean that lies at the centre of the scale (so on a 5-point Likert scale the mean on a given question should be 3). The range of scores should also cover all possible responses.

Step 3: Generate Your Items

Once you've found a construct to measure and decided on the type of response scale you're going to use, the next task is to generate items. I want you to restrict your questionnaire to around 30 items (20 minimum). The best way to generate items is to brainstorm with a small sample of people. This involves getting people to list as many facets of your construct as possible. For example, if you devised a questionnaire on exam anxiety, you might ask a number of students (20 or so) from a variety of courses (arts and science), years (first, second and final) and even institutions (friends at other universities) to list (on a piece of paper) as many things about exams as possible that make them anxious. It is good if you can include people within this sample that you think might be at the extremes of your construct (e.g. select a few people who get very anxious about exams and some who are very calm). This enables you to get items that span the entire spectrum of the construct that you want to measure. This will give you a pool of items to inspire questions. Rephrase your sample's suggestions in a way that fits the rating scale you've chosen and then eliminate any questions that are basically the same. You should hopefully begin with a pool of say 50-60 questions that you can reduce to about 30 by eliminating obviously similar questions.

Things to Consider:

1. Wording of questions: The way in which questions are phrased can bias the answers that people give. For example, Gaskell, Wright, and O'Muircheartaigh (1993) report several studies in which subtle changes in the wording of survey questions can radically affect people's responses. Gaskell et al.'s article is a very readable and useful summary of this work and their conclusions might be useful to you when thinking about how to phrase your questions.

2. Response bias: This is the tendency of respondents to give the same answer to every question. Try to reverse-phrase a few items to avoid response bias (and remember to score these items in reverse when you enter the data into SPSS).

Step 4: Collect the Data

Once you've written your questions, randomize their order and produce your questionnaire. This is the questionnaire that you're going to test. Photocopy the questionnaire and administer it to as many people as possible (one benefit of making these questionnaires short is that it minimizes the time taken to complete them!). You should aim for 50-100 respondents, but the more you get, the better your analysis (which is why I suggest working in slightly bigger groups to make data collection easier).

Step 5: Analysis

Enter the data into SPSS by having each question represented by a column in SPSS. Translate your response scale into numbers (i.e. a 5-point Likert scale might be 1 = completely disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = completely agree). Reverse-phrased items should be scored in reverse too! What we're trying to do with this analysis is to first eliminate any items on the questionnaire that aren't useful. So, we're trying to reduce our 30 items further before we run our factor analysis. We can do this by looking at descriptive statistics and also correlations between questions.

Descriptive statistics: The first thing to look at is the statistical distribution of item scores. This alone will enable you to throw out many redundant items. Therefore, the first thing to do when piloting a questionnaire is look at descriptive statistics on the questionnaire items. This is easily done in SPSS (see the book chapter). We're on the lookout for:

1. Range: Any item that has a limited range (all the points of the scale have not been used).

2. Skew: I mentioned above that ideally each question should elicit a normally distributed set of responses across subjects (each item's mean should be at the centre of the scale and there should be no skew). To check for items that produce skewed data, look for the skewness and SE skew in your SPSS output. We have also discovered in this book that you can divide the skewness by its standard error (SE skew) to form a z-score (see Chapter 5).
3. Standard deviation: Related to the range and skew of the distribution, items with high or low standard deviations may cause problems, so be wary of high and low values for the SD.

These are your first steps. Basically, if any of these rules are violated then your items become non-comparable (in terms of the factor analysis), which makes the questionnaire pretty meaningless!

Correlations: All of your items should intercorrelate at a significant level if they are measuring aspects of the same thing. If any items do not correlate at a 5% or 1% level of significance then exclude them. You can get a table of intercorrelations from SPSS. The book gives more detail on screening correlation coefficients for items that correlate with few others or correlate too highly with other items (multicollinearity and singularity).

Factor analysis: When you've eliminated any items that have distributional problems or do not correlate with each other, run your factor analysis on the remaining items and try to interpret the resulting factor structure. The book chapter details the process of factor analysis. What you should do is examine the factor structure and decide:

1. Which factors to retain.
2. Which items load onto those factors.
3. What your factors represent.

4. If there are any items that don't load highly onto any factors, they should be eliminated from future versions of the questionnaire (for our purposes you need only state that they are not useful items as you won't have time to revise and retest your questionnaires!).

Step 6: Assess the Questionnaire

Having looked at the factor structure, you need to check the reliability of your items and the questionnaire as a whole. We should run a reliability analysis on the questionnaire. This is explained in Chapter 17 of the book. There are two things to look at: (1) the Item Reliability Index (IRI), which is the correlation between the score on the item and the score on the test as a whole multiplied by the standard deviation of that item (called the corrected item-total correlation in SPSS). SPSS will do this corrected item-total correlation and we'd hope that these values would be significant for all items. Although we don't get significance values as such, we can look for correlations greater than about 0.3 (although the exact value depends on the sample size, this is a good cut-off for the size of sample you'll probably have). Any items having correlations less than 0.3 should be excluded from the questionnaire. (2) Cronbach's alpha, as we've seen, should be 0.8 or more and the deletion of an item should not affect this value too much (see the reliability analysis handout for more detail).
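
The two reliability checks in Step 6 can be sketched outside SPSS as follows. This is only an illustration under stated assumptions: the data are hypothetical (three items, five respondents), the function names are my own, and the corrected item-total correlation is taken to be the correlation between each item and the total of the remaining items.

```python
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(items):
    """items: one list of scores per item, all from the same respondents."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))

def corrected_item_total(items):
    """Correlate each item with the total of the other items."""
    correlations = []
    for i, col in enumerate(items):
        rest = [sum(row) - row[i] for row in zip(*items)]
        correlations.append(pearson(col, rest))
    return correlations

# Hypothetical data: 3 items answered by 5 respondents on a 5-point scale.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
r_it = corrected_item_total(items)
# Items below the 0.3 cut-off mentioned above would be candidates for removal.
flagged = [i for i, r in enumerate(r_it) if r < 0.3]
print(round(alpha, 3), [round(r, 2) for r in r_it], flagged)
```
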
The End?

You should conclude by describing your factor structure and the reliability of the scale. Also say whether there are items that you would drop in a future questionnaire. In an ideal world we'd then generate new items to add to the retained items and start the whole process again!

Labcoat Leni's Real Research: World Wide Addiction?

Nichols, L. A., & Nicki, R. (2004). Psychology of Addictive Behaviors, 18(4), 381-384.

The Internet is now a household tool. In 2007 it was estimated that around 179 million people worldwide used the Internet (over 100 million of those were in the USA and Canada). From the increasing popularity (and usefulness) of the Internet has emerged a new phenomenon: Internet addiction. This is now a serious and recognized problem, but until very recently it was very difficult to research this topic because there was not a psychometrically sound measure of Internet addiction. That is, until Laura Nichols and Richard Nicki developed the Internet Addiction Scale, IAS (Nichols & Nicki, 2004). (Incidentally, while doing some research on this topic I encountered an Internet addiction recovery website that I won't name but that offered a whole host of resources that would keep you online for ages, such as questionnaires, an online support group, videos, articles, a recovery blog and podcasts. It struck me that this was a bit like having a recovery centre for heroin addiction where the addict arrives to be greeted by a nice-looking counsellor who says "there's a huge pile of heroin in the corner over there, just help yourself".)

Anyway, Nichols and Nicki developed a 36-item questionnaire to measure Internet addiction. It contained items such as "I have stayed on the Internet longer than I intended to" and "My grades/work have suffered because of my Internet use", which could be responded to on a 5-point scale (Never, Rarely, Sometimes, Frequently, Always). They collected data from 207 people to validate this measure. The data from this study are in the file Nichols & Nicki (2004).sav. The authors dropped two items because they had low means and variances, and dropped three others because of relatively low correlations with other items. They performed a principal components analysis on the remaining 31 items. Labcoat Leni wants you to run some descriptive statistics to work out which two items were dropped for having low means/variances, and then inspect a correlation matrix to find the three items that were dropped for having low correlations. Finally, he wants you to run a principal component analysis on the data.

To get the descriptive statistics, I would select all of the questionnaire items in the relevant SPSS dialog box but just ask for means and standard deviations at this stage:

The table of means and standard deviations shows that the items with the lowest values are IAS-23 ("I see my friends less often because of the time that I spend on the Internet") and IAS-34 ("When I use the Internet, I experience a buzz or a high").

To get a table of correlations, select all of the variables in the SPSS correlation dialog box and leave the default options as they are:

[Table: correlations between the 36 IAS items (IAS01 to IAS36). The final row, labelled Mean, gives each item's average correlation with the other items.]

We know that the authors eliminated three items for having low correlations. My table of correlations includes the average correlation for each item. The lowest average correlations are for items IAS-13 ("I have felt a persistent desire to cut down or control my use of the Internet"), IAS-22 ("I have neglected things which are important and need doing") and IAS-32 ("I find myself thinking/longing about when I will go on the Internet again"). As such these variables will also be excluded from the factor analysis.

To do the principal component analysis, open the factor analysis dialog box and choose all of the variables except for the five that we have excluded:
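
The "average correlation" screening described above can be sketched in a few lines. This is a rough illustration only: the four items and their scores are hypothetical (they are not the IAS data), and the function names are my own.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_inter_item_correlations(items):
    """items: dict mapping item name -> list of scores.

    Returns each item's average correlation with all of the other items.
    """
    names = list(items)
    return {
        a: mean([pearson(items[a], items[b]) for b in names if b != a])
        for a in names
    }

# Hypothetical scores for four items from six respondents.
data = {
    "item_a": [1, 2, 3, 4, 5, 3],
    "item_b": [2, 2, 3, 5, 4, 3],
    "item_c": [1, 3, 2, 5, 5, 3],
    "item_d": [3, 1, 4, 2, 3, 3],
}
averages = mean_inter_item_correlations(data)
# Sort item names from lowest to highest average correlation.
ranked = sorted(averages, key=averages.get)
print(ranked[0], round(averages[ranked[0]], 2))
```

Items at the start of `ranked` are the ones that relate least to the rest of the scale, which is the same reasoning used above to pick out the three IAS items.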

The output should look like this:

Sample size: MacCallum et al. (1999) have demonstrated that when communalities after extraction are above .5, a sample size between 100 and 200 can be adequate, and even when communalities are below .5, a sample size of 500 should be sufficient. We have a sample size of 207 with only one communality below .5, and so the sample size should be adequate. In addition, the KMO measure of sampling adequacy is .942, which is above Kaiser's (1974) recommendation of .5. This value is also "marvellous" according to Hutcheson & Sofroniou (1999). As such, the evidence suggests that the sample size is adequate to yield distinct and reliable factors.

Bartlett's test: This tests whether the correlations between questions are sufficiently large for factor analysis to be appropriate (it actually tests whether the correlation matrix is sufficiently different from an identity matrix). In this case it is significant (χ²(465) = 4238.98, p < .001), indicating that the correlations within the R-matrix are sufficiently different from zero to warrant factor analysis.
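
To see where the 465 degrees of freedom come from, here is a minimal sketch of Bartlett's test of sphericity, assuming the usual formula χ² = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix, p the number of variables and n the sample size. The helper names and the small example matrix are hypothetical.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # swapping two rows flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n):
    """Return (chi_square, df) for a p x p correlation matrix and sample size n."""
    p = len(corr)
    chi_square = -((n - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df

# Hypothetical 2 x 2 correlation matrix from 100 respondents.
chi_square, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n=100)
print(round(chi_square, 2), df)

# With 31 items, as in the analysis above, df = 31 * 30 / 2 = 465.
identity_31 = [[1.0 if i == j else 0.0 for j in range(31)] for i in range(31)]
print(bartlett_sphericity(identity_31, n=207))
```

An identity matrix has a determinant of 1, so its test statistic is zero; the larger the correlations, the smaller the determinant and the larger the statistic.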

Extraction: SPSS has extracted five factors based on Kaiser's criterion of retaining factors with eigenvalues greater than 1. Is this warranted? Kaiser's criterion is accurate when there are fewer than 30 variables and the communalities after extraction are greater than .7, or when the sample size exceeds 250 and the average communality is greater than .6. For these data the sample size is 207, there are 31 variables and the mean communality is .64, so extracting five factors is probably not warranted. The scree plot shows a clear one-factor solution. This is the solution that the authors adopted.
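
The rule of thumb above can be written as a small check. This is just a sketch of the rule as stated in the text; the function name is my own, and the smallest communality (0.49) is a hypothetical stand-in, because the text says only that one communality is below .5.

```python
def kaiser_criterion_dependable(n_variables, sample_size,
                                min_communality, mean_communality):
    """Return True if either condition for trusting Kaiser's criterion holds."""
    few_variables = n_variables < 30 and min_communality > 0.7
    large_sample = sample_size > 250 and mean_communality > 0.6
    return few_variables or large_sample

# Values from the analysis above; min_communality is a hypothetical value.
ias_ok = kaiser_criterion_dependable(
    n_variables=31, sample_size=207,
    min_communality=0.49, mean_communality=0.64,
)
print(ias_ok)
```

Neither condition is met for these data, which is why the scree plot is given more weight than the five factors that SPSS extracted.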

Because we are retaining only one factor we can ignore the rotated factor solution and just look at the unrotated component matrix. This shows that all items have a high loading on factor 1.

The authors reported their analysis as follows: "We conducted principal-components analyses on the log transformed scores of the IAS (see above). On the basis of the scree test (Cattell, 1978) and the percentage of variance accounted for by each factor, we judged a one-factor solution to be most appropriate. This component accounted for a total of 46.50% of the variance. A value for loadings of .30 (Floyd & Widaman, 1995) was used as a cut-off for items that did not relate to a component. All 31 items loaded on this component, which was interpreted to represent aspects of a general factor relating to Internet addiction reflecting the negative consequences of excessive Internet use." (p. 382)

Chapter 18

Self-Test Answers

Run a multiple regression analysis using CatsRegression.sav with LnObserved as the outcome, and Training, Dance and Interaction as your three predictors.

The multiple regression dialog box will look like the following diagram. We can leave all of the default options as they are because we are interested only in the regression parameters.

The regression parameters are shown in the book.

To show that this all actually works, run another multiple regression analysis using CatsRegression.sav; this time the outcome is the log of expected frequencies (LnExpected) and Training and Dance are the predictors (the interaction is not included).

The multiple regression dialog box will look like the following diagram. We can leave all of the default options as they are because we are interested only in the regression parameters.

The resulting regression parameters are:

Note that b0 = 2.67, the beta coefficient for the type of training is 1.45 and the beta coefficient for whether they danced is 0.49. All of these values are consistent with those calculated in the book chapter.
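
Because the outcome is the natural log of the expected frequencies, exponentiating the model's predictions turns the three coefficients quoted above back into expected frequencies. The sketch below assumes that Training and Dance are each dummy coded 0/1 (the coding is an assumption, as is the function name).

```python
import math

# Coefficients quoted in the text above.
B0, B_TRAINING, B_DANCE = 2.67, 1.45, 0.49

def expected_frequency(training, dance):
    """Expected frequency for one cell; training and dance are assumed 0/1 codes."""
    return math.exp(B0 + B_TRAINING * training + B_DANCE * dance)

cells = {(t, d): expected_frequency(t, d) for t in (0, 1) for d in (0, 1)}
for (t, d), freq in cells.items():
    print(f"Training={t}, Dance={d}: expected frequency = {freq:.2f}")
```

With no interaction term, the ratio between the two Dance categories is the same in both Training categories, which is exactly what expected frequencies under independence look like.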

Create a contingency table of these data with dance as the columns, the type of training as rows and the type of animal as a layer.

To use the crosstabs command, open the crosstabs dialog box. We have three variables in our crosstabulation table: whether the animal danced or not (Dance), the type of reward given (Training), and whether the animal was a cat or dog (Animal). Select Training and drag it into the box labelled Row(s) (or click on the arrow button). Next, select Dance and drag it to the box labelled Column(s) (or click on the arrow button). We have a third variable too, and we need to define this variable as a layer. Select Animal and drag it to the box labelled Layer 1 of 1 (or click on the arrow button). Then select the options below.
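
As a rough picture of what a layered contingency table contains, the sketch below counts hypothetical raw records and builds one Training by Dance table per animal. The records and the function name are invented for illustration; they are not the data from the book.

```python
from collections import Counter

# Hypothetical raw records: (animal, training, dance).
records = [
    ("Cat", "Food", "Yes"), ("Cat", "Food", "No"), ("Cat", "Affection", "No"),
    ("Dog", "Food", "Yes"), ("Dog", "Affection", "Yes"), ("Dog", "Affection", "No"),
    ("Cat", "Affection", "No"), ("Dog", "Food", "Yes"),
]
counts = Counter(records)

def layer_table(animal):
    """Return {training: {dance: count}} for one layer (one type of animal)."""
    table = {t: {"Yes": 0, "No": 0} for t in ("Food", "Affection")}
    for (a, training, dance), n in counts.items():
        if a == animal:
            table[training][dance] += n
    return table

for animal in ("Cat", "Dog"):
    print(animal, layer_table(animal))
```

Each layer is an ordinary rows-by-columns table; the layer variable simply decides which cases go into which table.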

Can you use the Chart Builder to replicate the graph in Figure 18.7?

Actually this self-test is not as easy as it looks. The steps annotated on the diagrams guide you through the process:

1. Select a clustered bar chart.
2. Choose the option that creates panels in the graph, and select the setting that makes the panels appear in columns.
3. Drag Animal to the panel drop zone to create separate panels for dogs and cats.
4. Drag Training into the chart. Data will be clustered by the type of training used.
5. Drag Dance into the chart. Bars will be coloured by whether animals danced or not.
6. We want to display percentages rather than counts because there were more cats than dogs and this will allow us to compare animals directly. To do this, select Percentage() from the list of statistics.
7. By default SPSS will display the percentage of the total sample. However, we want the percentage to be calculated within each animal (i.e. the percentage of cats that danced for food). To display these percentages, select Total for Panel from the drop-down list. This will calculate the percentage within each panel (not all panels combined). This means that we will get the percentage of cats and dogs, not the percentage of all animals.
8. Don't forget to apply the changes to the graph.

Use the split file command to run a chi-square test on Dance and Training for dogs and cats.

First, to split the file we need to open the split file dialog box and then select the Organize output by groups option. Once this option is selected, the Groups Based on box will activate. Select the variable containing the group codes by which you wish to repeat the analysis (in this example select Animal), and drag it to the box (or click on the arrow button).

To run the chi-square tests, open the crosstabs dialog box. First, select one of the variables of interest in the variable list and drag it into the box labelled Row(s) (or click on the arrow button). For this example, I selected Training to be the rows of the table. Next, select the other variable of interest (Dance) and drag it to the box labelled Column(s) (or click on the arrow button). Select the same options as in the book (for the cat example).

Additional Material

Labcoat Leni's Real Research: Is the black American happy?

Beckham, A. S. (1929). Journal of Abnormal and Social Psychology, 24, 186-190.

When I was doing my psychology degree I spent a lot of time reading about the civil rights movement in the USA. Although I was supposed to be reading psychology, I became more interested in Malcolm X and Martin Luther King Jr. This is why I find Beckham's 1929 study of black Americans such an interesting piece of research. Beckham was a black American academic who founded the Psychology Laboratory at Howard University, Washington, D.C., and his wife Ruth was the first black woman ever to be awarded a Ph.D. (also in psychology) at the University of Minnesota.

The article needs to be placed within the era in which it was published. To put some context on the study, it was published 35 years before the Jim Crow laws were finally overthrown by the Civil Rights Act of 1964, and in a time when black Americans were segregated, openly discriminated against and were victims of the most abominable violations of civil liberties and human rights. For a richer context I suggest reading James Baldwin's superb book The Fire Next Time. Even the language of the study and the data from it are an uncomfortable reminder of the era in which it was conducted.

Beckham sought to measure the psychological state of black Americans with three questions asked to 3443 black Americans from different walks of life. He asked them whether they thought black Americans were happy, whether they personally were happy as a black American, and whether black Americans should be happy. They could answer only yes or no to each question. By today's standards the study is quite simple, and he did no formal statistical analysis on his data (Fisher's article containing the popularized version of the chi-square test was published only seven years earlier in a statistics journal that would not have been read by psychologists). I love this study, though, because it demonstrates that you do not need elaborate methods to answer important and far-reaching questions; with just three questions, Beckham told the world an enormous amount about very real and important psychological and sociological phenomena.

The frequency data (number of yes and no responses within each employment category) from this study are in the file Beckham(1929).sav. Labcoat Leni wants you to carry out three chi-square tests (one for each question that was asked). What conclusions can you draw?

Are black Americans happy?

Let's run the analysis on the first question. First we must remember to tell SPSS which variable contains the frequencies by using the weight cases command. Open the weight cases dialog box, select the option to weight cases by a variable, and then select the variable in which the number of cases is specified (in this case Happy) and drag it to the box labelled Frequency variable (or click on the arrow button). This process tells the computer that it should weight each category combination by the number in the column labelled Happy.

To conduct the chi-square test, use the crosstabs command. We have two variables in our crosstabulation table: the occupation of the participant (Profession) and whether they responded yes or no to the question (Response). Select one of these variables and drag it into the box labelled Row(s) (or click on the arrow button). For this example, I selected Profession to be the rows of the table. Next, select the other variable of interest (Response) and drag it to the box labelled Column(s) (or click on the arrow button). Use the book chapter to select other appropriate options (we do not need to use the exact test used in the chapter because our sample size is very large; however, you could choose a Monte Carlo test of significance if you like).

The chi-square test is highly significant, χ²(7) = 936.14, p < .001. This indicates that the profile of yes and no responses differed across the professions. Looking at the standardized residuals, the only profession for which these are non-significant is housewives, who showed a fairly even split of whether they thought black Americans were happy (40%) or not (60%). Within the other professions all of the standardized residuals are much higher than 1.96, so how can we make sense of the data? What's interesting is to look at the direction of these residuals (i.e. whether they are positive or negative). For the following professions the residual for "no" was positive but for "yes" was negative; these are therefore people who responded more than we would expect that black Americans were not happy and less than expected that black Americans were happy: college students, preachers and lawyers. The remaining professions (labourers, physicians, school teachers and musicians) show the opposite pattern: the residual for "no" was negative but for "yes" was positive; these are therefore people who responded less than we would expect that black Americans were not happy and more than expected that black Americans were happy.
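
The logic of the test and of the standardized residuals can be sketched directly from a table of frequencies. The counts below are hypothetical (three groups rather than the eight professions in the study) and the function name is my own; the standardized residual is taken to be (observed - expected) / sqrt(expected).

```python
def chi_square_test(table):
    """table: dict mapping row label -> tuple of observed counts per column.

    Returns (chi_square, df, standardized_residuals).
    """
    n_cols = len(next(iter(table.values())))
    row_totals = {r: sum(counts) for r, counts in table.items()}
    col_totals = [sum(counts[j] for counts in table.values()) for j in range(n_cols)]
    grand_total = sum(col_totals)

    chi_square = 0.0
    residuals = {}
    for r, counts in table.items():
        residuals[r] = []
        for j, observed in enumerate(counts):
            expected = row_totals[r] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
            residuals[r].append((observed - expected) / expected ** 0.5)
    df = (len(table) - 1) * (n_cols - 1)
    return chi_square, df, residuals

# Hypothetical frequencies: (yes, no) responses in three groups.
freq = {"Group A": (30, 10), "Group B": (20, 20), "Group C": (10, 30)}
chi_square, df, residuals = chi_square_test(freq)
print(round(chi_square, 2), df)
for group, (z_yes, z_no) in residuals.items():
    print(group, round(z_yes, 2), round(z_no, 2))
```

The sign of each residual shows whether a cell has more (positive) or fewer (negative) responses than expected, which is how the direction of the residuals is read above. With eight professions and two responses the same formula gives (8 - 1) × (2 - 1) = 7 degrees of freedom.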

Are they happy as black Americans?

We run this analysis in exactly the same way except that we now have to weight the cases by the variable You_Happy. Open the weight cases dialog box again; the option to weight cases by a variable should already be selected from the previous analysis. Select the variable in the box labelled Frequency variable and click on the arrow button to move it back to the variable list and clear the box. Then, we need to select the variable in which the number of cases is specified (in this case You_Happy) and drag it to the box labelled Frequency variable (or click on the arrow button). This process tells the computer that it should weight each category combination by the number in the column labelled You_Happy.

Then carry out the analysis through crosstabs exactly as before.

The chi-square test is highly significant, χ²(7) = 1390.74, p < .001. This indicates that the profile of yes and no responses differed across the professions. Looking at the standardized residuals, these are significant in most cells with a few exceptions: physicians, lawyers and school teachers saying "yes". Within the other cells all of the standardized residuals are much higher than 1.96. Again, we can look at the direction of these residuals (i.e. whether they are positive or negative). For labourers, housewives, school teachers and musicians the residual for "no" was positive but for "yes" was negative; these are therefore people who responded more than we would expect that they were not happy as black Americans and less than expected that they were happy as black Americans. The remaining professions (college students, physicians, preachers and lawyers) show the opposite pattern: the residual for "no" was negative but for "yes" was positive; these are therefore people who responded less than we would expect that they were not happy as black Americans and more than expected that they were happy as black Americans. Essentially, the former group are in low-paid jobs in which conditions would have been very hard (especially in the social context of the time). The latter group are in much more respected (and probably better-paid) professions. Therefore, the responses to this question could say more about the professions of the people asked than their views of being black Americans.

Should black Americans be happy?

We run this analysis in exactly the same way except that we now have to weight the cases by the variable Should_Be_Happy. Open the weight cases dialog box again; the option to weight cases by a variable should already be selected from the previous analysis. Select the variable in the box labelled Frequency variable and click on the arrow button to move it back to the variable list and clear the box. Then, we need to select the variable in which the number of cases is specified (in this case Should_Be_Happy) and drag it to the box labelled Frequency variable (or click on the arrow button). This process tells the computer that it should weight each category combination by the number in the column labelled Should_Be_Happy. Then carry out the analysis through crosstabs exactly as before.

The chi-square test is highly significant, χ²(7) = 1784.23, p < .001. This indicates that the profile of yes and no responses differed across the professions. Looking at the standardized residuals, these are nearly all significant. Again, we can look at the direction of these residuals (i.e. whether they are positive or negative). For college students and lawyers the residual for "no" was positive but for "yes" was negative; these are therefore people who responded more than we would expect that they thought that black Americans should not be happy and less than expected that they thought black Americans should be happy. The remaining professions show the opposite pattern: the residual for "no" was negative but for "yes" was positive; these are therefore people who responded less than we would expect that they did not think that black Americans should be happy and more than expected that they thought that black Americans should be happy.

What is interesting here and in question 1 is that college students and lawyers are in vocations in which they are expected to be critical about the world. Lawyers may well have defended black Americans who had been the subject of injustice and discrimination or racial abuse, and college students would likely be applying their critically trained minds to the immense social injustice that prevailed at the time. Therefore, these groups can see that their racial group should not be happy and should strive for the equitable and just society to which they are entitled. People in the other professions perhaps adopt a different social comparison. It's also possible for this final question that the groups interpreted the question differently: perhaps the lawyers and students interpreted the question as "should they be happy given the political and social conditions of the time?", whereas the others interpreted the question as "do they deserve happiness?".

It might seem strange to have picked a piece of research from so long ago to illustrate the chi-square test, but what I wanted to demonstrate is that simple research can sometimes be incredibly illuminating. This study asked three simple questions, yet the data are utterly fascinating. It raises further hypotheses that could be tested, it unearths very different views in different professions, and it illuminates a very important social and psychological issue. There are other studies that sometimes use the most elegant paradigms and highly complex methodologies, but the questions they address are utterly meaningless for the real world. They miss the big picture. Albert Beckham was a remarkable man, trying to understand important and big real-world issues that mattered to hundreds of thousands of people.

Chapter 19

Self-Test Answers

Using what you know about ANOVA, conduct a one-way ANOVA using Surgery as the predictor and Post_QoL as the outcome.

Select [icon] and complete the dialog box as follows:

Using what you know about ANCOVA, conduct a one-way ANCOVA using Surgery as the predictor, Post_QoL as the outcome and Base_QoL as the covariate.

Select [icon] and complete the dialog box as follows:

Split the file by Reason and then run a multilevel model predicting Post_QoL with a random intercept, and random slopes for Surgery, and including Base_QoL and Surgery as predictors.

First, split the file by Reason by selecting [icon]. The completed dialog box should look like this:

Next, we need to run the multilevel model. Select [icon] and specify the contextual variable by selecting Clinic from the list of variables and dragging it to the box labelled Subjects (or click on [icon]).

Click on [icon] to move to the main dialog box. First we must specify our outcome variable, which is quality of life (QoL) after surgery, so select Post_QoL and drag it to the space labelled Dependent variable (or click on [icon]). Next we need to specify our predictors. Therefore, select Surgery and Base_QoL (hold down Ctrl and you can select both of them simultaneously) and drag them to the space labelled Covariate(s) (or click on [icon]).

The main mixed models dialog box

We need to add the predictors as fixed effects to our model, so click on [icon], hold down Ctrl and select Base_QoL and Surgery in the list labelled Factors and Covariates. Then make sure that [icon] is set to [icon] and click on [icon] to transfer these predictors to the Model. Click on [icon] to return to the main dialog box.

We now need to ask for a random intercept and random slopes for the effect of Surgery. Click on [icon] in the main dialog box. Select Clinic and drag it to the area labelled Combinations (or click on [icon]). We want to specify that the intercept is random, and we do this by selecting [icon]. Next, select Surgery from the list of Factors and covariates and add it to the model by clicking on [icon]. The other change that we need to make is that we need to estimate the covariance between the random slope and random intercept. This estimation is achieved by clicking on [icon] to access the drop-down list and selecting [icon].

Click on [icon] and select [icon]. Click on [icon] to return to the main dialog box.

In the main dialog box click on [icon] and request Parameter estimates and Tests for covariance parameters. Click on [icon] to return to the main dialog box. To run the analysis, click on [icon].

Use the compute command to transform time into time minus 1.

Access the compute command by selecting [icon]. In the resulting window enter the name Time into the box labelled Target Variable. Select the variable Time and drag it across to the area labelled Numeric Expression, then click on the minus (−) button and then type 1. The completed dialog box is below:

Additional Material

Oliver Twisted: Please Sir, Can I Have Some More ICC?

'I have a dependency on gruel,' whines Oliver. 'Maybe I could measure this dependency if I knew more about the ICC.' Well, you're so high on gruel, Oliver, that you have rather missed the point. Still, I did write an article on the ICC once upon a time (Field, 2005) and it's reproduced in the additional web material for your delight and amusement.

The following article originally appeared in: Field, A. P. (2005). Intraclass correlation. In B. Everitt & D. C. Howell (eds.), Encyclopedia of Behavioral Statistics (Vol. 2, pp. 948–954). New York: Wiley.

It appears in adapted form below:

Commonly used correlations such as the Pearson product moment correlation measure the bivariate relation between variables of different measurement classes. These are known as interclass correlations. By 'different measurement classes' we really just mean variables measuring different things. For example, we might look at the relation between attractiveness and career success: clearly one of these variables represents a class of measures of how good looking a person is, whereas the other represents the class of measurements of something quite different: how much someone achieves in their career. However, there are often cases in which it is interesting to look at relations between variables within classes of measurement. In its simplest form, we might compare only two variables. For example, we might be interested in whether anxiety runs in families and we could look at this by measuring anxiety within pairs of twins (Eley & Stevenson, 1999). In this case the objects being measured are twins, and both twins are measured on some index of anxiety. As such, there is a pair of variables both measuring anxiety and, therefore, from the same class. In such cases, an intraclass correlation (ICC) is used and is commonly extended beyond just two variables to look at the consistency between judges. For example, in gymnastics, ice skating, diving and other Olympic sports, contestants' performance is often assessed by a panel of judges. There might be 10 judges, all of whom rate performance out of 10; therefore, the resulting measures are from the same class (they measure the same thing). The objects being rated are the competitors. This again is a perfect scenario for an intraclass correlation.

Models of Intraclass Correlations

There are a variety of different intraclass correlations (McGraw & Wong, 1996; Shrout & Fleiss, 1979) and the first step in calculating one is to determine a model for your sample data. All of the various forms of the intraclass correlation are based on estimates of mean variability from a one-way repeated measures analysis of variance. All situations in which an intraclass correlation is desirable will involve multiple measures on different entities (be they twins, Olympic competitors, pictures, sea slugs etc.). The objects measured constitute a random factor in the design (they are assumed to be random exemplars of the population of objects). The measures taken can be included as factors in the design if they have a meaningful order, or can be excluded if they are unordered, as we shall now see.

One-Way Random Effects Model

In the simplest case we might have only two measures (think back to our twin study on anxiety). The order of these variables may be irrelevant (for example, with our twins it is arbitrary whether we treat the data from the first twin as being anxiety measure 1 or anxiety measure 2). In this case, the only systematic source of variation is the random variable representing the different objects. As such, we can use a one-way ANOVA of the form (where $\mu$ is the overall mean):

$$x_{ij} = \mu + r_i + e_{ij}$$
In which $r_i$ is the effect of object i (known as the row effects), j is the measure being considered, and $e_{ij}$ is an error term (the residual effects). The row and residual effects are random, independent and normally distributed. Because the effect of the measure is ignored, the resulting intraclass correlation is based on the overall effect of the objects being measured (the mean between-object variability, $MS_{Rows}$) and the mean within-object variability ($MS_W$). Both of these will be formally defined later.

Two-Way Random Effects Model

When the order of measures is important then the effect of the measures becomes important. The most common case of this is when measures come from different judges or raters. Hodgins and Makarchuk (2003), for example, show two such uses; in their study they took multiple measures of the same class of behaviour (gambling) but also took measures from different sources. They measured gambling both in terms of days spent gambling and money spent gambling. Clearly these measures generate different data, so it is important to which measure a datum belongs (it is not arbitrary to which measure a datum is assigned). This is one scenario in which a two-way model is used. However, they also took measures of gambling both from the gambler and a collateral (e.g. spouse). Again, it is important that we attribute data to the correct source. So, this is a second illustration of where a two-way model is useful. In such situations the intraclass correlation can be used to check the consistency or agreement between measures or raters. In this situation a two-way model can be used as follows:

$$x_{ij} = \mu + r_i + c_j + rc_{ij} + e_{ij}$$

In which $c_j$ is the effect of the measure (i.e. the effect of different raters, or different measures), and $rc_{ij}$ is the interaction between the measures taken and the objects being measured. The effect of the measure ($c_j$) can be treated as either a fixed effect or a random effect. How it is treated doesn't affect the calculation of the intraclass correlation, but it does affect the interpretation (as we shall see). It is also possible to exclude the interaction term and use the model:

$$x_{ij} = \mu + r_i + c_j + e_{ij}$$

We shall now turn our attention to calculating the sources of variance needed to calculate the intraclass correlation.

Sources of Variance: An Example

In the chapter in the book on repeated measures ANOVA, there is an example relating to student concerns about the consistency of marking between lecturers. It is common that lecturers obtain reputations for being 'hard' or 'light' markers, which can lead students to believe that their marks are not based solely on the intrinsic merit of the work, but can be influenced by who marked the work. To test this we could calculate an intraclass correlation. First, we could submit the same 8 essays to four different lecturers and record the mark they gave each essay. Table 1 shows the data, and you should note that it looks the same as a one-way repeated measures ANOVA in which the four lecturers represent 4 levels of an independent variable and the outcome or dependent variable is the mark given (in fact I use these data as an example of a one-way repeated measures ANOVA).

Table 1

Essay    Dr. Field   Dr. Smith   Dr. Scrote   Dr. Death   Mean     S²        S²(k−1)
1        62          58          63           64          61.75    6.92      20.75
2        63          60          68           65          64.00    11.33     34.00
3        65          61          72           65          65.75    20.92     62.75
4        68          64          58           61          62.75    18.25     54.75
5        69          65          54           59          61.75    43.58     130.75
6        71          67          65           50          63.25    84.25     252.75
7        78          66          67           50          65.25    132.92    398.75
8        75          73          75           45          67.00    216.00    648.00
Mean:    68.88       64.25       65.25        57.38       63.94    Total:    1602.50

There are three different sources of variance that are needed to calculate an intraclass correlation, which we shall now calculate. These sources of variance are the same as those calculated in one-way repeated measures ANOVA. (If you don't believe me, consult Smart Alex's answers to chapter 13 to see an identical set of calculations!)

The Between-Object Variance ($MS_{Rows}$)

The first source of variance is the variance between the objects being rated (in this case the between-essay variance). Essays will naturally vary in their quality for all sorts of reasons (the natural ability of the author, the time spent writing the essay etc.). This variance is calculated by looking at the average mark for each essay and seeing how much it deviates from the average mark for all essays. These deviations are squared because some will be positive and others negative and so would cancel out when summed. The squared errors for each essay are weighted by the number of values that contribute to the mean (in this case the number of different markers, k). So, in general terms we write this as:

$$SS_{Rows} = \sum_{i=1}^{n} k_i\left(\bar{X}_{Row\,i} - \bar{X}_{all\ rows}\right)^2$$

Or, for our example we could write it as:

$$SS_{Essays} = \sum_{i=1}^{n} k_i\left(\bar{X}_{Essay\,i} - \bar{X}_{all\ essays}\right)^2$$

This would give us:

$$\begin{aligned}
SS_{Rows} &= 4(61.75 - 63.94)^2 + 4(64.00 - 63.94)^2 + 4(65.75 - 63.94)^2 + 4(62.75 - 63.94)^2 \\
&\quad + 4(61.75 - 63.94)^2 + 4(63.25 - 63.94)^2 + 4(65.25 - 63.94)^2 + 4(67.00 - 63.94)^2 \\
&= 19.18 + 0.014 + 13.10 + 5.66 + 19.18 + 1.90 + 6.86 + 37.45 \\
&= 103.34
\end{aligned}$$

This sum of squares is based on the total variability and so its size depends on how many objects (essays in this case) have been rated. Therefore, we convert this total to an average known as the mean squared error (MS) by dividing by the number of essays (or in general terms the number of rows) minus 1. This value is known as the degrees of freedom.

$$MS_{Rows} = \frac{SS_{Rows}}{df_{Rows}} = \frac{103.34}{n - 1} = \frac{103.34}{7} = 14.76$$
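The between-essay calculation can be checked with a short script. What follows is a minimal Python sketch and not part of the original article; the names `marks`, `ss_rows` and `ms_rows` are illustrative assumptions. Because it works with the unrounded grand mean (63.9375 rather than 63.94), its sum of squares differs slightly from the hand calculation.

```python
# Marks from Table 1: one row per essay, one column per lecturer
# (Field, Smith, Scrote, Death).
marks = [
    [62, 58, 63, 64],
    [63, 60, 68, 65],
    [65, 61, 72, 65],
    [68, 64, 58, 61],
    [69, 65, 54, 59],
    [71, 67, 65, 50],
    [78, 66, 67, 50],
    [75, 73, 75, 45],
]

n = len(marks)      # number of essays (rows)
k = len(marks[0])   # number of lecturers (columns)

grand_mean = sum(sum(row) for row in marks) / (n * k)
row_means = [sum(row) / k for row in marks]

# Squared deviation of each essay mean from the grand mean, weighted by k.
ss_rows = sum(k * (mean - grand_mean) ** 2 for mean in row_means)
df_rows = n - 1
ms_rows = ss_rows / df_rows

print(f"SS_Rows = {ss_rows:.3f}, df = {df_rows}, MS_Rows = {ms_rows:.3f}")
```

The small gap between this result and 103.34 comes only from rounding the grand mean before squaring in the hand calculation.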

The mean squared error for the rows in the table is our estimate of the natural variability between the objects being rated.

The Within-Judge Variability ($MS_W$)

The second variability in which we're interested is the variability within measures/judges. To calculate this we look at the deviation of each judge from the average of all judges on a particular essay. We use an equation with the same structure as before, but for each essay separately:

$$SS_{Essay} = \sum_{k=1}^{p}\left(X_{Column\,k} - \bar{X}_{all\ columns}\right)^2$$

For essay 1, for example, this would be:

$$SS_{Essay} = (62 - 61.75)^2 + (58 - 61.75)^2 + (63 - 61.75)^2 + (64 - 61.75)^2 = 20.75$$

The degrees of freedom for this calculation is again one less than the number of scores used in the calculation. In other words it is the number of judges, k, minus 1. We have to calculate this for each of the essays in turn and then add these values up to get the total variability within judges. An alternative way to do this is to use the variance within each essay. The equation mentioned above is equivalent to the variance for each essay multiplied by the number of values on which that variance is based (in this case the number of judges, k) minus 1. As such we get:

$$SS_W = s^2_{essay\,1}(k_1 - 1) + s^2_{essay\,2}(k_2 - 1) + s^2_{essay\,3}(k_3 - 1) + \ldots + s^2_{essay\,n}(k_n - 1)$$

Table 1 shows the values for each essay in the last column. When we sum these values we get 1602.50. As before, this value is a total and so depends on the number of essays (and the number of judges). Therefore, we convert it to an average by dividing by the degrees of freedom. For each essay we calculated a sum of squares that we saw was based on k − 1 degrees of freedom. Therefore, the degrees of freedom for the total within-judge variability are the sum of the degrees of freedom for each essay:

$$df_W = n(k - 1)$$

In which n is the number of essays and k is the number of judges. In this case it will be 8(4 − 1) = 24. The resulting mean squared error is, therefore:

$$MS_W = \frac{SS_W}{df_W} = \frac{1602.50}{n(k - 1)} = \frac{1602.50}{24} = 66.77$$
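The variance shortcut described above can be sketched in a few lines. This is an illustrative Python example rather than anything from the original article; `marks`, `ss_per_essay` and `ms_w` are assumed names, and `statistics.variance` is the standard-library sample variance (dividing by k − 1).

```python
from statistics import variance

# Marks from Table 1: one row per essay, one column per lecturer.
marks = [
    [62, 58, 63, 64],
    [63, 60, 68, 65],
    [65, 61, 72, 65],
    [68, 64, 58, 61],
    [69, 65, 54, 59],
    [71, 67, 65, 50],
    [78, 66, 67, 50],
    [75, 73, 75, 45],
]

n = len(marks)      # number of essays
k = len(marks[0])   # number of judges

# Sample variance of each essay's marks times (k - 1) recovers that
# essay's sum of squares (the last column of Table 1).
ss_per_essay = [variance(row) * (k - 1) for row in marks]

ss_w = sum(ss_per_essay)
df_w = n * (k - 1)
ms_w = ss_w / df_w

print(f"SS_W = {ss_w:.2f}, df_W = {df_w}, MS_W = {ms_w:.2f}")
```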

The Between-Judge Variability ($MS_{Columns}$)

The within-judge or within-measure variability is made up of two components. The first is the variability created by differences between judges. The second is unexplained variability (error, for want of a better word). The variability between judges is again calculated using a variant of the same equation that we've used all along, only this time we're interested in the deviation of each judge's mean from the mean of all judges:

$$SS_{Columns} = \sum_{k=1}^{p} n_k\left(\bar{X}_{Column\,k} - \bar{X}_{all\ columns}\right)^2$$

Or:

$$SS_{Judges} = \sum_{k=1}^{p} n_k\left(\bar{X}_{Judge\,k} - \bar{X}_{all\ judges}\right)^2$$

In which n is the number of things that each judge rated. For these data we'd get:

$$SS_{Columns} = 8(68.88 - 63.94)^2 + 8(64.25 - 63.94)^2 + 8(65.25 - 63.94)^2 + 8(57.38 - 63.94)^2 = 554$$

The degrees of freedom for this effect are the number of judges, k, minus 1. As before, the sum of squares is converted to a mean squared error by dividing by the degrees of freedom:

$$MS_{Columns} = \frac{SS_{Columns}}{df_{Columns}} = \frac{554}{k - 1} = \frac{554}{3} = 184.67$$
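As a check on the between-judge figures, here is a minimal Python sketch (not from the original article; `marks`, `column_means` and `ms_columns` are illustrative names). It uses unrounded means, so it gives a sum of squares a little above the 554 obtained by hand from rounded means, and a mean square correspondingly a little above 184.67.

```python
# Marks from Table 1: one row per essay, one column per lecturer.
marks = [
    [62, 58, 63, 64],
    [63, 60, 68, 65],
    [65, 61, 72, 65],
    [68, 64, 58, 61],
    [69, 65, 54, 59],
    [71, 67, 65, 50],
    [78, 66, 67, 50],
    [75, 73, 75, 45],
]

n = len(marks)      # number of essays each judge rated
k = len(marks[0])   # number of judges

grand_mean = sum(sum(row) for row in marks) / (n * k)
column_means = [sum(row[j] for row in marks) / n for j in range(k)]

# Squared deviation of each judge's mean from the grand mean, weighted by n.
ss_columns = sum(n * (mean - grand_mean) ** 2 for mean in column_means)
df_columns = k - 1
ms_columns = ss_columns / df_columns

print(f"SS_Columns = {ss_columns:.3f}, df = {df_columns}, MS_Columns = {ms_columns:.3f}")
```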

The Error Variability ($MS_E$)

The final variability is the variability that can't be explained by known factors such as variability between essays or judges/measures. This can be easily calculated using subtraction because we know that the within-judges variability is made up of the between-judges variability and this error:

$$SS_W = SS_{Columns} + SS_E \quad\Rightarrow\quad SS_E = SS_W - SS_{Columns}$$

The same is true of the degrees of freedom:

$$df_W = df_{Columns} + df_E \quad\Rightarrow\quad df_E = df_W - df_{Columns}$$

So, for these data we get:

$$SS_E = SS_W - SS_{Columns} = 1602.50 - 554 = 1048.50$$

and:

$$df_E = df_W - df_{Columns} = 24 - 3 = 21$$

We get the average error variance in the usual way:

$$MS_E = \frac{SS_E}{df_E} = \frac{1048.50}{21} = 49.93$$
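The subtraction step can be sketched as follows. This is an illustrative Python example, not part of the original article, and all variable names are assumptions. It recomputes the within-judge and between-judge sums of squares from the raw marks, so its results differ in the second decimal place from the rounded hand calculation.

```python
# Marks from Table 1: one row per essay, one column per lecturer.
marks = [
    [62, 58, 63, 64],
    [63, 60, 68, 65],
    [65, 61, 72, 65],
    [68, 64, 58, 61],
    [69, 65, 54, 59],
    [71, 67, 65, 50],
    [78, 66, 67, 50],
    [75, 73, 75, 45],
]

n = len(marks)
k = len(marks[0])
grand_mean = sum(sum(row) for row in marks) / (n * k)

# Within-judge SS: each mark's squared deviation from its own essay mean.
ss_w = sum((x - sum(row) / k) ** 2 for row in marks for x in row)

# Between-judge SS: each judge's mean against the grand mean, weighted by n.
column_means = [sum(row[j] for row in marks) / n for j in range(k)]
ss_columns = sum(n * (mean - grand_mean) ** 2 for mean in column_means)

# Error is what remains of the within-judge variability.
ss_e = ss_w - ss_columns
df_e = n * (k - 1) - (k - 1)
ms_e = ss_e / df_e

print(f"SS_E = {ss_e:.3f}, df_E = {df_e}, MS_E = {ms_e:.3f}")
```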

Calculating Intraclass Correlations

Having computed the necessary variance components, we shall now look at how the intraclass correlation is calculated. Before we do so, however, there are two important decisions to be made.

Single Measures or Average Measures

So far we have talked about situations in which the measures we've used produce single values. However, it is possible that we might have measures that produce an average score. For example, we might get judges to rate paintings in a competition based on style, content, originality, and technical skill. For each judge, their ratings are averaged. The end result is still ratings from a set of judges, but these ratings are an average of many ratings. Intraclass correlations can be computed for such data, but the computation is somewhat different.

Consistency or Agreement?

The next decision involves whether you want a measure of overall consistency between measures/judges or a measure of absolute agreement. The best way to explain this distinction is to return to our lecturers marking essays. It is possible that particular lecturers are harsh in their ratings (or lenient). A consistency definition views these differences as an irrelevant source of variance. As such the between-judge variability described above ($MS_{Columns}$) is ignored in the calculation (see Table 2). In ignoring this source of variance we are getting a measure of whether judges agree about the relative merits of the essays without worrying about whether the judges anchor their marks around the same point. So, if all the judges agree that essay 1 is the best and essay 5 is the worst (or their rank order of essays is roughly the same) then agreement will be high: it doesn't matter that Dr. Field's marks are all 10% higher than Dr. Death's. This is a consistency definition of agreement.

The alternative is to treat relative differences between judges as an important source of disagreement. That is, the between-judge variability described above ($MS_{Columns}$) is treated as an important source of variation and is included in the calculation (see Table 2). In this scenario disagreements between the relative magnitude of judges' ratings matter (so, the fact that Dr. Death's marks differ from Dr. Field's will matter even if their rank order of marks is in agreement). This is an absolute agreement definition. By definition the one-way model ignores the effect of the measures and so can have only this kind of interpretation.

Equations for ICCs

Table 2 shows the equations for calculating the ICC based on whether a one-way or two-way model is assumed and whether a consistency or absolute agreement definition is preferred. For illustrative purposes, the ICC is calculated in each case for the example used in this entry. This should enable the reader to identify how to calculate the various sources of variance. In this table $MS_{Columns}$ is abbreviated to $MS_C$ and $MS_{Rows}$ is abbreviated to $MS_R$.
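Table 2:

ICC for Single Scores

One-way model, absolute agreement:
$$\frac{MS_R - MS_W}{MS_R + (k - 1)MS_W}$$
ICC for example data:
$$\frac{14.76 - 66.77}{14.76 + (4 - 1)66.77} = -0.24$$

Two-way model, consistency:
$$\frac{MS_R - MS_E}{MS_R + (k - 1)MS_E}$$
ICC for example data:
$$\frac{14.76 - 49.93}{14.76 + (4 - 1)49.93} = -0.21$$

Two-way model, absolute agreement:
$$\frac{MS_R - MS_E}{MS_R + (k - 1)MS_E + \frac{k}{n}(MS_C - MS_E)}$$
ICC for example data:
$$\frac{14.76 - 49.93}{14.76 + (4 - 1)49.93 + \frac{4}{8}(184.67 - 49.93)} = -0.15$$

ICC for Average Scores

One-way model, absolute agreement:
$$\frac{MS_R - MS_W}{MS_R}$$
ICC for example data:
$$\frac{14.76 - 66.77}{14.76} = -3.52$$

Two-way model, consistency:
$$\frac{MS_R - MS_E}{MS_R}$$
ICC for example data:
$$\frac{14.76 - 49.93}{14.76} = -2.38$$

Two-way model, absolute agreement:
$$\frac{MS_R - MS_E}{MS_R + \frac{MS_C - MS_E}{n}}$$
ICC for example data:
$$\frac{14.76 - 49.93}{14.76 + \frac{184.67 - 49.93}{8}} = -1.11$$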
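The six equations in Table 2 translate directly into code. The following Python sketch is an illustration only; the function names are hypothetical and are not taken from any statistics package. It plugs in the rounded mean squares from the worked example (14.76, 66.77, 184.67 and 49.93) with k = 4 judges and n = 8 essays.

```python
def icc_oneway_single(ms_r, ms_w, k):
    """One-way model, single scores (absolute agreement)."""
    return (ms_r - ms_w) / (ms_r + (k - 1) * ms_w)


def icc_consistency_single(ms_r, ms_e, k):
    """Two-way model, single scores, consistency definition."""
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)


def icc_agreement_single(ms_r, ms_e, ms_c, k, n):
    """Two-way model, single scores, absolute agreement definition."""
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + (k / n) * (ms_c - ms_e))


def icc_oneway_average(ms_r, ms_w):
    """One-way model, average scores (absolute agreement)."""
    return (ms_r - ms_w) / ms_r


def icc_consistency_average(ms_r, ms_e):
    """Two-way model, average scores, consistency definition."""
    return (ms_r - ms_e) / ms_r


def icc_agreement_average(ms_r, ms_e, ms_c, n):
    """Two-way model, average scores, absolute agreement definition."""
    return (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n)


# Rounded mean squares from the essay-marking example.
MS_R, MS_W, MS_C, MS_E = 14.76, 66.77, 184.67, 49.93
K, N = 4, 8

results = {
    "single, one-way": icc_oneway_single(MS_R, MS_W, K),
    "single, consistency": icc_consistency_single(MS_R, MS_E, K),
    "single, agreement": icc_agreement_single(MS_R, MS_E, MS_C, K, N),
    "average, one-way": icc_oneway_average(MS_R, MS_W),
    "average, consistency": icc_consistency_average(MS_R, MS_E),
    "average, agreement": icc_agreement_average(MS_R, MS_E, MS_C, N),
}

for label, value in results.items():
    print(f"{label}: {value:.2f}")
```

Every estimate is negative for these data because the between-essay mean square is smaller than the within-judge and error mean squares.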

Significance Testing

The calculated intraclass correlation can be tested against a value under the null hypothesis using a standard F-test (see analysis of variance). McGraw and Wong (1996) describe these tests for the various intraclass correlations we've seen and Table 3 summarises their work. In this table ICC is the observed intraclass correlation whereas $\rho_0$ is the value of the intraclass correlation under the null hypothesis. That is, it's the value against which you wish to compare the observed intraclass correlation. So, replace this value with 0 to test the hypothesis that the observed ICC is greater than zero, but replace it with other values such as 0.1, 0.3 or 0.5 to test that the observed ICC is greater than known values of small, medium and large effect sizes respectively.
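Table 3:

ICC for Single Scores

One-way model, absolute agreement:
$$F = \frac{MS_R}{MS_W} \cdot \frac{1 - \rho_0}{1 + (k - 1)\rho_0}, \qquad df_1 = n - 1, \qquad df_2 = n(k - 1)$$

Two-way model, consistency:
$$F = \frac{MS_R}{MS_E} \cdot \frac{1 - \rho_0}{1 + (k - 1)\rho_0}, \qquad df_1 = n - 1, \qquad df_2 = (n - 1)(k - 1)$$

Two-way model, absolute agreement:
$$F = \frac{MS_R}{aMS_C + bMS_E}, \qquad df_1 = n - 1, \qquad df_2 = \frac{(aMS_C + bMS_E)^2}{\dfrac{(aMS_C)^2}{k - 1} + \dfrac{(bMS_E)^2}{(n - 1)(k - 1)}}$$

In which:
$$a = \frac{k\rho_0}{n(1 - \rho_0)}, \qquad b = 1 + \frac{k\rho_0(n - 1)}{n(1 - \rho_0)}$$

ICC for Average Scores

One-way model, absolute agreement:
$$F = \frac{1 - \rho_0}{1 - ICC}, \qquad df_1 = n - 1, \qquad df_2 = n(k - 1)$$

Two-way model, consistency:
$$F = \frac{1 - \rho_0}{1 - ICC}, \qquad df_1 = n - 1, \qquad df_2 = (n - 1)(k - 1)$$

Two-way model, absolute agreement:
$$F = \frac{MS_R}{cMS_C + dMS_E}, \qquad df_1 = n - 1, \qquad df_2 = \frac{(cMS_C + dMS_E)^2}{\dfrac{(cMS_C)^2}{k - 1} + \dfrac{(dMS_E)^2}{(n - 1)(k - 1)}}$$

In which:
$$c = \frac{\rho_0}{n(1 - \rho_0)}, \qquad d = 1 + \frac{\rho_0(n - 1)}{n(1 - \rho_0)}$$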
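To see how the single-score rows of Table 3 are used, here is a hedged Python sketch. The function names are hypothetical, and only the F-ratio and its degrees of freedom are computed: turning F into a p-value needs an F distribution, which the Python standard library does not provide. The inputs are the rounded mean squares from the essay example.

```python
def f_single_oneway(ms_r, ms_w, n, k, rho0=0.0):
    """F-ratio and degrees of freedom for the one-way, single-score ICC."""
    f = (ms_r / ms_w) * (1 - rho0) / (1 + (k - 1) * rho0)
    return f, n - 1, n * (k - 1)


def f_single_consistency(ms_r, ms_e, n, k, rho0=0.0):
    """F-ratio and degrees of freedom for the two-way consistency ICC."""
    f = (ms_r / ms_e) * (1 - rho0) / (1 + (k - 1) * rho0)
    return f, n - 1, (n - 1) * (k - 1)


def f_single_agreement(ms_r, ms_c, ms_e, n, k, rho0=0.0):
    """F-ratio and degrees of freedom for the two-way absolute agreement ICC."""
    a = k * rho0 / (n * (1 - rho0))
    b = 1 + k * rho0 * (n - 1) / (n * (1 - rho0))
    denominator = a * ms_c + b * ms_e
    f = ms_r / denominator
    df2 = denominator ** 2 / (
        (a * ms_c) ** 2 / (k - 1) + (b * ms_e) ** 2 / ((n - 1) * (k - 1))
    )
    return f, n - 1, df2


# Rounded mean squares from the essay-marking example.
MS_R, MS_W, MS_C, MS_E = 14.76, 66.77, 184.67, 49.93
N, K = 8, 4

# Test each observed ICC against a null value of zero.
oneway = f_single_oneway(MS_R, MS_W, N, K, rho0=0.0)
consistency = f_single_consistency(MS_R, MS_E, N, K, rho0=0.0)
agreement = f_single_agreement(MS_R, MS_C, MS_E, N, K, rho0=0.0)

for label, (f, df1, df2) in [
    ("one-way", oneway),
    ("consistency", consistency),
    ("agreement", agreement),
]:
    print(f"{label}: F({df1}, {df2:.1f}) = {f:.3f}")
```

With a null value of zero, a is 0 and b is 1, so the absolute agreement test reduces to the consistency test; the two only diverge for non-zero null values.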

Fixed versus Random Effects

I mentioned earlier on that the effect of the measures/judges can be conceptualised as a fixed or random effect. Although it makes no difference to the calculation, it does affect the interpretation. Essentially, this variable should be regarded as random when the judges or measures represent a sample of a larger population of measures or judges that could have been used. Put another way, the particular judges or measures chosen are not important and do not change the research question you're addressing. However, the effect of measures should be treated as fixed when changing one of the judges or measures would significantly affect the research question (see fixed and random effects). For example, in the gambling study mentioned earlier it would make a difference if the ratings of the gambler were replaced: the fact the gamblers gave ratings was intrinsic to the research question being addressed (do gamblers give accurate information about their gambling?). However, in our example of lecturers' marks, it shouldn't make any difference if we substitute one lecturer with a different one: we can still answer the same research question (do lecturers, in general, give inconsistent marks?). In terms of interpretation, when the effect of the measures is a random factor then the results can be generalized beyond the sample; however, when they are a fixed effect, any conclusions apply only to the sample on which the ICC is based (McGraw & Wong, 1996).
Oliver Twisted: Please Sir, Can I Have Some More Centring?

'Recentgin,' babbles Oliver as he stumbles drunk out of Mrs Moonshine's Alcohol Emporium, 'I need some more recent gin.' I think you mean centring, Oliver, not recentgin. If you want to know how to centre your variables using SPSS, then the additional material for this chapter on the companion website will tell you. We'll use the Cosmetic Surgery.sav data to illustrate the two types of centring discussed in the book chapter. Load this file into SPSS. Let's assume that we want to centre the variable BDI.
Grand Mean Centring

Grand mean centring is really easy: we can simply use the compute command that we encountered in the book. First, we need to find out the mean score for BDI. We can do this using some simple descriptive statistics. Choose [icon] to access the dialog box below. Select BDI and drag it to the box labelled Variable(s), then click on [icon] and select only the mean (we don't need any other information).

The resulting output tells us that the mean is 23.05:

We use this value to centre the variable. Access the compute command by selecting [icon]. In the resulting dialog box, enter the name BDI_Centred into the box labelled Target Variable and then click on [icon] and give the variable a more descriptive name if you want to. Select the variable BDI and drag it across to the area labelled Numeric Expression, then click on the minus (−) button and then type the value of the mean (23.05). The completed dialog box is below:

Click on [icon] and a new variable will be created called BDI_Centred, which is centred around the mean of BDI. The mean of this new variable should be approximately 0: run some descriptive statistics to see that this is true. You can do the same thing in a syntax window by typing:

COMPUTE BDI_Centred = BDI-23.05.
EXECUTE.
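The arithmetic behind grand mean centring is easy to see outside SPSS. The sketch below is a Python illustration under stated assumptions: the five scores are made up for the example and are not values from Cosmetic Surgery.sav, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical BDI scores, invented for illustration only.
bdi = [12, 30, 25, 18, 40]

# Step 1: find the grand mean (the role played by the descriptives step).
grand_mean = mean(bdi)

# Step 2: subtract it from every score (the role played by COMPUTE).
bdi_centred = [score - grand_mean for score in bdi]

print("Grand mean:", grand_mean)
print("Centred scores:", bdi_centred)
print("Mean of centred scores:", mean(bdi_centred))
```

The centred scores keep their spacing and order; only the zero point moves to the grand mean, which is why the new variable should average approximately 0.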

Group Mean Centring

Group mean centring is considerably more complicated. The first step is to create a file containing the means of the groups. Let's try this again for the BDI scores. We want to centre this variable across the level 2 variable of Clinic. We first need to know the mean BDI in each group and to save that information in a form that SPSS can use later on. To do this we need to use the aggregate command, which is not discussed in the book. To access the main dialog box select [icon]. In this dialog box we want to select Clinic and drag it to the area labelled Break variable. This will mean that the variable Clinic is used to split up the data file (in other words, when the mean is computed it will do it for each clinic separately). We then need to select BDI and drag it to the area labelled Summaries of variable(s). You'll notice that once this variable is selected the default is that SPSS will create a new variable called BDI_mean, which is the mean of BDI (split by clinic, obviously). We need to save this information in a file that we can access later on, so select [icon]. By default, SPSS will save the file with the name aggr.sav in your default directory. If you would like to save it elsewhere or under a different name then click on [icon] to open a normal file system dialog box where you can name the file and navigate to a directory that you'd like to save it in. Click on [icon] to create this new file.

If you open the resulting data file (you don't need to, but it will give you an idea of what it contains) you will see that it simply contains two columns, one with a number specifying the clinic from which the data came (there were 10 clinics) and the second containing the mean BDI score within each clinic.

When SPSS creates the aggregated data file it orders the clinics from lowest to highest (regardless of what order they are in the data set). Therefore, to make our working data file match this aggregated file, we need to make sure that all of the data from the various clinics are ordered too, from clinic 1 up to clinic 10. This is easily done by using the sort cases command. (Actually our data are already ordered in this way, but because your data might not always be, we'll go through the motions anyway.) To access the sort cases command select [icon]. Select the variable that you want to sort the file by (in this case Clinic) and drag it to the area labelled Sort by (or click on [icon]). You can choose to order the file in ascending order (clinic 1 to clinic 10), which is what we need to do here, or descending order (clinic 10 to clinic 1). Click on [icon] to sort the file.

The next step is to use these clinic means in the aggregated file to centre the BDI variable in our main file. To do this we need to use the match files command, which can be accessed by selecting [icon]. This will open a dialog box that lists all of the open data files (in my case I had none open apart from the one that I was working from, so this space is blank) or asks you to select an SPSS data file. Click on [icon] and navigate to wherever you decided to store the file of aggregated values (in my case aggr.sav). Select this file, then click [icon] to return to the dialog box. Then click on [icon] to move on to the next dialog box.

In the next dialog box we need to match the two files, which just tells SPSS that the two files are connected. To do this click on [icon]. Then we also need to specifically connect the files on the Clinic variable. To do this select [icon], which tells SPSS that the data set that isn't active (i.e. the file of aggregated scores) should be treated as a table of values that are matched to the working data file on a key variable. We need to select what this key variable is. We want to match the files on the Clinic variable, so select this variable in the Excluded variables list and drag it to the space labelled Key Variables (or click on [icon]). Click on [icon].

The data editor should now include a new variable, BDI_mean, which contains the values from our file aggr.sav. Basically, SPSS has matched the files for the Clinic variable, so that the values in BDI_mean correspond to the mean value for the various clinics. So, when the Clinic variable is 1, BDI_mean has been set as 25.19, but when Clinic is 2, BDI_mean is set to 31.32. We can use these values in the compute command again to centre BDI. Access the compute command by selecting [icon]. In the resulting dialog box enter the name BDI_Group_Centred into the box labelled Target Variable and then click on [icon] and give the variable a more descriptive name if you want to. Select the variable BDI and drag it across to the area labelled Numeric Expression, then click on the minus (−) button and then either type BDI_mean or select this variable and drag it to the area labelled Numeric Expression. Click on [icon] and a new variable will be created containing the group-centred scores.

Alternatively you can do this all with the following syntax:

AGGREGATE
  /OUTFILE='C:\Users\Dr. Andy Field\Documents\Academic\Data\aggr.sav'
  /BREAK=Clinic
  /BDI_mean=MEAN(BDI).
SORT CASES BY Clinic(A).
MATCH FILES /FILE=*
  /TABLE='C:\Users\Dr. Andy Field\Documents\Academic\Data\aggr.sav'
  /BY Clinic.
EXECUTE.
COMPUTE BDI_Group_Centred=BDI - BDI_mean.
EXECUTE.
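The three SPSS steps (aggregate, match, compute) map onto a simple pattern in any language. The Python sketch below is illustrative only: the records are invented, not taken from Cosmetic Surgery.sav, and the field names `clinic`, `bdi`, `bdi_mean` and `bdi_group_centred` are assumptions chosen to mirror the SPSS variable names.

```python
from collections import defaultdict

# Hypothetical cases: each has a clinic (level 2) and a BDI score.
records = [
    {"clinic": 1, "bdi": 20},
    {"clinic": 1, "bdi": 30},
    {"clinic": 2, "bdi": 10},
    {"clinic": 2, "bdi": 14},
    {"clinic": 2, "bdi": 18},
]

# Step 1 (like AGGREGATE): compute the mean BDI within each clinic.
scores_by_clinic = defaultdict(list)
for record in records:
    scores_by_clinic[record["clinic"]].append(record["bdi"])
clinic_means = {
    clinic: sum(scores) / len(scores)
    for clinic, scores in scores_by_clinic.items()
}

# Steps 2 and 3 (like MATCH FILES then COMPUTE): attach each clinic's mean
# to its cases and subtract it from the raw score.
for record in records:
    record["bdi_mean"] = clinic_means[record["clinic"]]
    record["bdi_group_centred"] = record["bdi"] - record["bdi_mean"]

for record in records:
    print(record)
```

A dictionary lookup keyed on the clinic plays the part of the key variable, so no sorting is needed here; SPSS needs the sort because the table match expects both files ordered by the key.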

Labcoat Leni's Real Research: A Fertile Gesture

Miller, Tybur & Jordan (2007). Evolution and Human Behavior, 28, 375–381.

Most female mammals experience a phase of 'estrus' during which they are more sexually receptive, proceptive, selective and attractive. As such, the evolutionary benefit to this phase is believed to be to attract mates of superior genetic stock. However, some people have argued that this important phase became uniquely lost or hidden in human females. Testing these evolutionary ideas is exceptionally difficult, but Geoffrey Miller and his colleagues came up with an incredibly elegant piece of research that did just that. They reasoned that if the hidden-estrus theory is incorrect then men should find women most attractive during the fertile phase of their menstrual cycle compared to the pre-fertile (menstrual) and post-fertile (luteal) phases. To measure how attractive men found women in an ecologically valid way, they came up with the ingenious idea of collecting data from women working at lap-dancing clubs. These women maximize their tips from male visitors by attracting more dances. In effect the men try out several dancers before choosing a dancer for a prolonged dance. For each dance the male pays a tip; therefore the more men that choose a particular woman, the more her earnings will be. As such, each dancer's earnings are a good index of how attractive the male customers have found her. Miller and his colleagues argued, therefore, that if women do have an estrus phase then they will be more attractive during this phase and therefore earn more money. This study is a brilliant example of using a real-world phenomenon to address an important scientific question in an ecologically valid way.

The data for this study are in the file Miller et al. (2007).sav. The researchers collected data via a website from several dancers (ID), who provided data for multiple lap-dancing shifts (so for each person there are several rows of data). They also measured what phase of their menstrual cycle the women were in at a given shift (Cyclephase), and whether they were using hormonal contraceptives (Contraceptive) because this would affect their cycle. The outcome was their earnings on a given shift in dollars (Tips). A multilevel model can be used here because the data are unbalanced: each woman differed in the number of shifts they provided data for (the range was 9 to 29 shifts), and there were missing data for Cyclephase. Multilevel models can handle these problems with ease. Labcoat Leni wants you to carry out a multilevel model to see whether Tips can be predicted from Cyclephase, Contraceptive and their interaction. Is the hidden-estrus hypothesis supported? Answers are in the additional material on the companion website (or look at page 378 in the original article).

First, select Analyze > Mixed Models > Linear...; in this initial dialog box we need to set up the level 2 variable. In this example, multiple scores or shifts are nested within each dancer. Therefore, the level 2 variable is the participant (the lap dancer) and this variable is represented by the variable labelled ID. Select this variable and drag it to the box labelled Subjects (or click on the arrow button). Click on Continue to access the main dialog box.

In the main dialog box we need to set up our predictors and outcome. The outcome was the value of tips earned, so select Tips and drag it to the box labelled Dependent variable (or click on the arrow button). We also have two predictors: Cyclephase and Contraceptive. Select both of these (click on one and then, while holding down Ctrl, click on the other) and then drag them to the box labelled Factor(s) (or click on the arrow button). We use the Factor(s) box because both variables are categorical.

We need to add these fixed effects to our model, so click on Fixed... to bring up the fixed effects dialog box. To specify both main effects and the interaction term, select both predictors (click on Cyclephase and then, while holding down Ctrl, click on Contraceptive), then select Factorial from the drop-down list, and then click on Add. With Factorial selected you should find that both main effects and the interaction term are transferred to the Model. Click on Continue to return to the main dialog box.
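What a factorial specification adds to the model can be sketched as follows: every main effect plus every interaction among the selected predictors. The function name `factorial_terms` and the `*` notation for interactions are illustrative choices, not SPSS output.

```python
from itertools import combinations

def factorial_terms(factors):
    """Return all main effects and all interactions among the factors."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    return terms

terms = factorial_terms(["Cyclephase", "Contraceptive"])
print(terms)
```

With two predictors this gives three terms: the two main effects and their two-way interaction, which matches the three fixed effects tested later.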

In the model that Miller et al. fitted, they did not assume that there would be random slopes (i.e. the relationship between each predictor and tips was not assumed to vary across lap dancers). This decision is appropriate for Contraceptive because this variable did not vary within dancers (each lap dancer was either taking contraceptives or not, so its effect could not be set up as a random effect that varies over our level 2 variable of participant). Also, because Cyclephase is a categorical variable with three unordered categories we could not expect a linear relationship with tips: we expect tips to vary over categories but the categories themselves have no meaningful order. However, we might expect tips to vary over participants (some lap dancers will naturally get more money than others) and we can factor this variability in by allowing the intercept to be random. As such, we're fitting a random intercept model to the data.

To do this click on Random... in the main dialog box to access the random effects dialog box. The first thing we need to do is to specify our contextual variable. We do this by selecting it from the list of contextual variables that we have told SPSS about already. These appear in the section labelled Subjects and, because we only specified one variable, there is only one variable in the list, ID. Select this variable and drag it to the area labelled Combinations (or click on the arrow button). We want to specify only that the intercept is random, and we do this by selecting Include intercept. Notice in this dialog box that there is a drop-down list to specify the type of covariance (Variance Components). For a random intercept model this default option is fine. Click on Continue to return to the main dialog box.
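Written out, the random intercept model being specified might look like the sketch below. The notation is an assumption made for illustration: i indexes shifts, j indexes dancers, the fertile phase is taken as the reference category for Cyclephase, and Contraceptive is an indicator for one of the two contraceptive-use groups (which group it marks depends on how the variable is coded).

```latex
\begin{align*}
\text{Tips}_{ij} &= b_{0j} + b_1\,\text{Menstrual}_{ij} + b_2\,\text{Luteal}_{ij} + b_3\,\text{Contraceptive}_{j} \\
&\quad + b_4\,(\text{Menstrual}_{ij} \times \text{Contraceptive}_{j})
       + b_5\,(\text{Luteal}_{ij} \times \text{Contraceptive}_{j}) + \varepsilon_{ij}, \\
b_{0j} &= b_0 + u_{0j}, \qquad u_{0j} \sim N(0, \sigma^2_{u_0}), \qquad \varepsilon_{ij} \sim N(0, \sigma^2).
\end{align*}
```

Only the intercept carries a dancer-specific term (u_{0j}); the slopes b_1 to b_5 are the same for every dancer, which is what "no random slopes" means.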

The authors report in the paper that they used restricted maximum-likelihood estimation (REML), so click on Estimation... and select this option. Finally, click on Statistics... and select Parameter estimates and Tests for covariance parameters. Click on Continue to return to the main dialog box. To run the analysis, click on OK.

This first table tells us our fixed effects. As you can see they are all significant. Miller and colleagues reported these results as follows:

Main effects of cycle phase [F(2, 236) = 27.46, p < .001] and contraception use [F(1, 17) = 6.76, p < .05] were moderated by an interaction between cycle phase and pill use [F(2, 236) = 5.32, p < .01]. (p. 378)

Hopefully you can see where these values come from in the table (they rounded the df off to whole numbers). Basically this shows that the phase of the dancer's cycle significantly predicted tip income, and that this interacted with whether the dancer was having natural cycles or was on the contraceptive pill. However, we don't know which groups differed. We can use the parameter estimates to tell us:

I coded Cyclephase in a way that would be most useful for interpretation, which was to code the group of interest (fertile period) as the last category (2), and the other phases as 1 (Luteal) and 0 (Menstrual). The parameter estimates for this variable, therefore, compare each category against the last category, and because I made the last category the fertile phase this means we get a comparison of the fertile phase against the other two. Therefore, we could say (because the b is negative) that tips were significantly higher in the fertile phase than in the menstrual phase, b = -100.41, t(235.21) = -6.11, p < .001, and in the luteal phase, b = -170, t(234.92) = -9.84, p < .001. The beta, as in regression, tells us the change in tips as we shift from one group to another, so during the fertile phase, dancers earned about $100 more than during the menstrual phase, and $170 more than during the luteal phase.

These effects don't factor in contraceptive use. To look at this we need to look at the contrasts for the interaction term. The first of these tells us the following: if we worked out the relative difference in tips between the fertile phase and the menstrual phase, how much more do those in their natural cycle earn compared to those on contraceptive pills? The answer is about $86. In other words, there is a combined effect of being in a natural cycle and being in the fertile phase, and this is significant, b = 86.09, t(237) = 2.86, p < .01. The second contrast tells us the following: if we worked out the relative difference in tips between the fertile phase and the luteal phase, how much more do those in their natural cycle earn compared to those on contraceptive pills? The answer is about $90 (the b). In other words, there is a combined effect of being in a natural cycle and being in the fertile phase compared to the luteal phase, and this is significant, b = 89.94, t(236.80) = 2.63, p < .01.
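The logic of comparing each category against the last one can be sketched with simple dummy coding. The two b values are the main-effect estimates reported above, entered as negative because the text notes that the b is negative. The function names are hypothetical, and the sketch deliberately ignores the interaction terms, as the text does at this point.

```python
# Reported estimates from the text (fertile phase is the reference category).
# Each b is (that phase) minus (fertile phase), so both are negative.
b = {"Menstrual": -100.41, "Luteal": -170.0}

def indicators(phase, reference="Fertile",
               levels=("Menstrual", "Luteal", "Fertile")):
    """Dummy-code a phase: one 0/1 indicator per non-reference level."""
    return {level: int(phase == level) for level in levels if level != reference}

def difference_from_fertile(phase):
    """Model-implied difference in tips (dollars) relative to the fertile phase."""
    coded = indicators(phase)
    return sum(b[level] * coded[level] for level in coded)

for phase in ("Menstrual", "Luteal", "Fertile"):
    print(phase, difference_from_fertile(phase))
```

Because the fertile phase has both indicators set to 0, its difference is zero by construction, and each b reads directly as "this phase minus the fertile phase".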

The final table is not central to the hypotheses, but it does tell us about the random intercept. In other words, it tells us whether tips (in general) varied from dancer to dancer. The variance in tips across dancers was 3571.12, and this is significant, z = 2.37, p < .05. In other words, the average tip per dancer varied significantly. This confirms that we were justified in treating the intercept as a random variable. To conclude then, this study showed that the estrus-hidden hypothesis is wrong: men did find women more attractive (as indexed by how many lap dances they did and therefore how much they earned) during the fertile phase of their cycle compared to the other phases.
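As a rough check on the covariance-parameter test, the reported variance and Wald z can be turned back into an implied standard error and a p-value. This sketch assumes the z is the estimate divided by its standard error and that the p-value is two-tailed from a standard normal distribution; the variable names are illustrative.

```python
from statistics import NormalDist

variance_estimate = 3571.12   # reported variance of the random intercept
wald_z = 2.37                 # reported Wald z

# Wald z is estimate / standard error, so the implied SE is estimate / z.
implied_se = variance_estimate / wald_z

# Two-tailed p-value from the standard normal distribution (an assumption
# about how the reported significance was obtained).
p_two_tailed = 2 * (1 - NormalDist().cdf(wald_z))

print(round(implied_se, 1))
print(round(p_two_tailed, 3))
```

Under these assumptions the p-value falls between .01 and .05, in line with the reported p < .05.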

References

Board, B. J., & Fritzon, K. (2005). Disordered personalities at work. Psychology, Crime & Law, 11(1), 1732. etinkaya, H., & Domjan, M. (2006). Sexual fetishism in a quail (Coturnix japonica) model system: Test of reproductive success. Journal of Comparative Psychology, 120(4), 427432. Chamorro-Premuzic, T., Furnham, A., Christopher, A. N., Garwood, J., & Martin, N. (2008). Birds of a feather: Students preferences for lecturers personalities as predicted by their own personality and learning approaches. Personality and Individual Differences, 44, 965976.

412

Davidson, M.L.(1972). Univariate versus multivariate tests in repeated-measures experiments. Psychological Bulletiu, 77, 446452. Davey, G. C. L., Startup, H. M., Zara, A., MacDonald, C. B., & Field, A. P. (2003). Perseveration of checking thoughts and mood-as-input hypothesis. Journal of Behavior Therapy & Experimental Psychiatry, 34, 141160. Eley, T. C., & Stevenson, J. (1999). Using genetic analyses to clarify the distinction between depressive and anxious symptoms in children. Journal of Abnormal Child Psychology, 27(2), 105114. Fesmire, F. M. (1988). termination of intractable hiccups with digital rectal massage. Annals of Emergency Medicine, 17(8), 872872. Field, A. P. (2005). Intraclass correlation. In B. Everitt & D. C. Howell (Eds.), Encyclopedia of Behavioral Statistics (Vol. 2, pp. 948954). New York: Wiley. Field, A. P. (2006). The behavioral inhibition system and the verbal information pathway to children's fears. Journal of Abnormal Psychology, 115(4), 742752. Gallup, G. G. J., Burch, R. L., Zappieri, M. L., Parvez, R., Stockwell, M., & Davis, J. A. (2003). The human penis as a semen displacement device. Evolution and Human Behavior, 277289. Gaskell, G. D., Wright, D. B., & OMuircheartaigh, C. A. (1993). Reliability of surveys. The Psychologist, 6 (11), 500503. Hodgins, D. C., & Makarchuk, K. (2003). Trusting problem gamblers: Reliability and validity of self- reported gambling behavior. Psychology of Addictive Behaviors, 17(3), 244248. 24,

413

Lacourse, E., Claes, M., & Villeneuve, M. (2001). Heavy metal music and adolescent suicidal risk. Journal of Youth and Adolescence, 30(3), 321332. Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov tests for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399402. Nichols, L. A., & Nicki, R. (2004). Development of a psychometrically sound internet addiction scale: A preliminary step. Psychology of Addictive Behaviors, 18(4), 381384. McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 3046. Massey, F. J. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46, 6878. Mathews, R.C., Domjan, M.Ramsay, M. and crews, (2007). Learning effects on Sperm competition and reproductive F. tress. Psychological Science, 18(9), 758762. Marzillier, S.L. and Davey,g.c.L.(2005). Anxiety and disgust: Evidence unidirectional. Cognition and Emotion, 19(5), 729750. Muris, P.Huijding, J.Mayer, B. and Hameetman, M.(2008). A space odyssey: Experimental manipulation of threat perception and anxietyrelatd interpretation bias in children. Child Psychiatry and Human Development, 39(4), 469480. Schtzwohl, A. (2008). The disengagement of attentive resources from task-irrelevant cues to sexual and emotional infidelity. Personality and Individual Differences, 44, 633-644. Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591611.

414

Shapiro, S. S., Wilk, M. B., & Chen, H. J. (1968). A comparative study of various tests for normality. Journal of the American Statistical Association, 63, 13431372. Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing reliability. Psychological Bulletin, 86, 420428. Stack, S., & Gundlach, J. (1992). The effect of country music on suicide. Social Forces, 71, 211218.

An analysis of the untransformed scores using a non-parametric test (Friedman's ANOVA) also revealed significant differences between approach times to the boxes, χ²(2) = 140.36, p < .001.
