Serial Processing is Consistent with the Time Course of Linguistic Information

Extraction from Consecutive Words during Eye Fixations in Reading:

A Reply to Inhoff, Eiter, and Radach (2005)

Alexander Pollatsek

University of Massachusetts, Amherst

Erik D. Reichle

University of Pittsburgh

and

Keith Rayner

University of Massachusetts, Amherst

Correspondence To:

Alexander Pollatsek
Department of Psychology
University of Massachusetts
Amherst, MA 01003
pollatsek@psych.umass.edu

Running Head: Timing and eye fixations


Abstract

Inhoff, Eiter, and Radach (2005) reported the results of two experiments which they

claimed were problematic for serial attention models of eye movements in reading (such as the

E-Z Reader model). In this reply, we demonstrate via argumentation and simulations that their

data pose no serious problem for the E-Z Reader model or serial attention models in general.

Inhoff, Eiter, and Radach (2005) presented data that they claim are inconsistent both

with the E-Z Reader model (Pollatsek, Reichle, & Rayner, 2005; Rayner, Ashby, Pollatsek, &

Reichle, 2004; Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Pollatsek, & Rayner, 2006;

Reichle, Rayner, & Pollatsek, 1999, 2003) and, more generally, with any model that assumes that attention moves serially from

word to word (i.e., in which lexical processing of word n, the attended word, must complete

before lexical processing of word n+1, the next word in the text, begins). We disagree with their

claim; however, before getting into the details of their findings, we wish to make clear in what

sense the E-Z Reader model posits serial processing and thus illuminate what the controversy is

about.

First, although the E-Z Reader model posits that lexical processing of words is serial, it

assumes that lower-level (pre-attentive) visual processing is in parallel across the retina. This

parallel processing assumption is necessary because, among other things, if such processing did

not occur, a reader would not know how to target a saccade to the next word. Second, although

the model posits that attention moves serially from word to word, it does not imply that only one

word can be processed on a fixation in reading. In fact, the model predicts that the usual case in

reading is that lexical processing of the fixated word is completed and processing of the next

word in the text begins during the same fixation. On some occasions, on a single fixation,

processing of the next word (word n+1) can be completed which allows processing the word to

the right of that one (word n+2) to begin. This state of affairs is possible because the attended

word is not necessarily the fixated word, as we assume (consistent with the large literature on

covert attention) that covert attentional shifts can occur during a fixation. Third, as will become

important later, we should note that in the E-Z Reader model, the independence of covert

attention and fixation locations is extended one step further: the act of programming a saccade

from word n to word n+1 is decoupled from the act of shifting attention from word n to word

n+1. That is, it is assumed that (a) completion of a preliminary stage of lexical processing (L1) is

the signal that initiates an eye-movement program that is executed after approximately 125 ms

(in the latest versions of the model; Pollatsek et al., 2005) but (b) that completion of the second

stage (L2), which is the completion of lexical access, is the signal to shift attention. (In the

model, the shift of attention is assumed to be instantaneous.) To keep things simple, we will

assume in the discussion below that the fixated word is the word attended to when the fixation

begins; this is not always the case in the model (e.g., due to mistargeting of saccades), but is on a

large majority of fixations.

All of these assumptions allow the model to produce a reading process in which more

than one word is usually processed on a fixation, and hence lexical processing of a word usually

starts before it is fixated. Let us now go through some of the details to give a better picture of

the timing of these processes. After L1 of the fixated word (word n) is completed, there will be

125 ms (i.e., the saccadic latency) before a saccade to the next word (word n+1) is executed.

Some part of the 125 ms will be occupied by the completion of lexical processing of word n.

However, if the second stage takes less than 125 ms (in most versions of the model, L2 requires

something like 50-75 ms, depending on factors such as the frequency of word n), then there will

be an interval of time at least equal to the saccade programming time minus the duration of L2 for

the lexical processing of word n+1 before it is actually fixated (see Footnote 1). (In the most recent version of

the model, we assume this lexical processing of the parafoveal information continues during the

saccade as well.)
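
The lower bound on parafoveal processing time described above is simple arithmetic, sketched here in Python. The function and constant names are hypothetical and introduced only for illustration. The 125 ms saccadic latency and the 50-75 ms range for L2 are the approximate values quoted in the text, and the sketch ignores processing during the saccade, refixations, and word skipping.

```python
# Illustrative sketch only: names are hypothetical, values are the
# approximate ones quoted in the text, not fitted model parameters.
SACCADE_PROGRAMMING_MS = 125.0  # time from completion of L1 to saccade execution


def parafoveal_processing_time(l2_ms: float) -> float:
    """Minimum time (ms) left for lexical processing of word n+1
    before the eyes leave word n, given the duration of L2 on word n."""
    return max(0.0, SACCADE_PROGRAMMING_MS - l2_ms)


if __name__ == "__main__":
    for l2 in (50.0, 75.0):
        print(f"L2 = {l2:.0f} ms -> preview time >= "
              f"{parafoveal_processing_time(l2):.0f} ms")
```

Under these assumed values, an L2 of 50-75 ms leaves roughly 75-50 ms of attended processing of word n+1 before the saccade.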

The experiments reported by Inhoff et al. were undertaken to assess whether the

assumptions of a serial-attention-shift (SAS) model such as the E-Z Reader model can adequately

account for the time course of the extraction of information from word n+1 using a novel

display-change technique in which the letter information of word n+1 is replaced by a mask (i.e.,

essentially a random letter string that is the same length as word n+1) either for a portion of the

time at the end of the fixation on word n or a portion of the time at the beginning of the fixation

on word n. We take the two key findings to be the following. In Experiment 1, the mask

appeared for varying amounts of time and then was replaced by word n+1. The key pattern of

data that was thought to be incompatible with SAS models and the E-Z Reader model is: (a) the

gaze duration (i.e., the sum of all fixations of a word during the first pass through the text) on

word n+1 was virtually unaffected if the display onset of word n+1 was delayed by 70 ms or less

from when word n was initially fixated, and (b) increasing the delay of the text beyond 70 ms

increased the gaze duration on word n+1. The design and data in Experiment 2 were more

complex. There, a single display-change time was chosen (140 ms) and there were essentially

four viewing conditions: (1) full preview – word n+1 was always visible; (2) delayed

information – upon fixating word n, a mask was displayed for 140 ms in the location of word

n+1, which was subsequently replaced by word n+1; (3) aborted information – upon fixating

word n, word n+1 appeared for 140 ms, followed by the mask; (4) no preview – the mask was

continuously displayed until word n+1 was fixated (see Footnote 2). If one considers the no preview condition to

be a baseline for assessing the utility of extracting information about word n+1 from the

parafovea, they found a benefit of 99 ms for the full preview condition, a benefit of 40 ms in the

aborted information condition, and a benefit of only 24 ms in the delayed information condition.

Before going on to discuss the implication of these results for the E-Z Reader model (or

any model that assumes serial attention shifts), we need to stress one important aspect of these

experiments: The text was presented in alternating case (sO tHaT iT lOoKeD lIkE tHiS), which

presumably made reading significantly more difficult than normal. The gaze durations (even in

the full preview condition) were fairly long (over 300 ms), and the size of the preview benefit in

the full preview condition (99 ms) is quite a bit greater than is usually observed with normal text,

where the benefit from a full preview over a no preview condition is usually about 40 ms (see

Hyönä, Bertram, & Pollatsek, 2004, for a review of the findings on preview benefit). This

suggests, among other things, that model parameter estimates that are obtained from experiments

involving the reading of normal text will not be appropriate for explaining these data.

The results of Experiment 1 pose no problem in general for SAS models. Inhoff et al.’s

first finding, that having a mask (instead of word n+1) displayed during the first 70 ms of the

fixation on word n has virtually no effect on reading, is consistent with any SAS model in which

attention switches late enough such that the mask presented in word position n+1 during the first

70 ms has virtually always been replaced by the text by then. More specifically, for the E-Z

Reader model, this point is when the visual information for the delayed word (word n+1)

becomes available for lexical processing (which would be about 120-150 ms after the beginning

of the fixation on word n in the 70 ms delay condition, see below). Their second finding –

increasing the delay interferes with reading – is likewise consistent with any SAS model in

which attention starts to switch from word n to word n+1 at about 120-150 ms after the

beginning of the fixation on word n. Of course, this raises the question whether this pattern of

data is quantitatively consistent with the E-Z Reader model. We think that it is, but before we

attempt to model the data we will first try to sketch an argument for why.

In our simulations of normal reading, we assume that the visual processing stage (V)

occupies the first 50 ms of a fixation, and that L1 and L2 take something like 100 ms and 50 ms,

respectively, for a typical word (see Footnote 3). (Conceptually, the 50 ms duration of the V stage corresponds to

the eye-to-mind delay; see Pollatsek et al., 2005.) This means that, if there was no benefit from

the preview of word n before it was fixated, the attention shift to word n+1 would occur

something like 200 ms after the beginning of the fixation (50 ms for V + 100 ms for L1 + 50 ms

for L2), and that the saccade to word n+1 would be executed something like 275 ms after the beginning of the fixation (50 ms

for V + 100 ms for L1 + 125 ms for saccadic programming). However, if word n has a normal

preview (which is the case in Inhoff et al.’s experiments), the visual processing and some of the

lexical processing will be done on word n using parafoveal information. This would typically

subtract 40-50 ms from the times posited above, and thus mean that E-Z Reader would predict

that an attention shift to word n+1 typically occurs about 150-160 ms after the beginning of the

fixation on word n, and that the typical fixation duration on a word would be about 225-235 ms.

(Again, it is important to emphasize that these estimates ignore several sources of variability

that would cause the range of times to be much larger; see Footnote 3.)
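
The rough timeline in this paragraph can be reproduced with a few lines of Python. This is a sketch under the simplifying assumptions stated in the text (fixed "typical" stage durations, no trial-to-trial variability); the class and function names are hypothetical, and treating preview benefit as a single subtracted constant is our simplification for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageDurations:
    """Typical stage durations (ms) as quoted in the text; illustrative only."""
    v_ms: float = 50.0                     # visual stage (eye-to-mind delay)
    l1_ms: float = 100.0                   # first stage of lexical processing
    l2_ms: float = 50.0                    # second stage (lexical access)
    saccade_programming_ms: float = 125.0  # saccadic latency after L1


def attention_shift_time(d: StageDurations, preview_savings_ms: float = 0.0) -> float:
    """Time after fixation onset at which attention shifts to word n+1."""
    return d.v_ms + d.l1_ms + d.l2_ms - preview_savings_ms


def saccade_time(d: StageDurations, preview_savings_ms: float = 0.0) -> float:
    """Time after fixation onset at which the saccade to word n+1 is executed."""
    return d.v_ms + d.l1_ms + d.saccade_programming_ms - preview_savings_ms


if __name__ == "__main__":
    d = StageDurations()
    for savings in (0.0, 40.0, 50.0):
        print(f"preview savings {savings:.0f} ms: "
              f"attention shift at {attention_shift_time(d, savings):.0f} ms, "
              f"saccade at {saccade_time(d, savings):.0f} ms")
```

With no preview this gives 200 ms and 275 ms; subtracting 40-50 ms of preview savings gives the 150-160 ms attention-shift estimate and the 225-235 ms fixation duration used in the argument.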

Are these values for the processes consistent with the pattern of data in Inhoff et al.? The

analysis in the above paragraph suggests that attention shifts to word n+1 should start to occur

150-160 ms after the beginning of a fixation. But now we have to consider how that would

relate to the display change conditions, as the E-Z Reader model posits that the effect of the

display change would not be immediate because the 50 ms eye-to-mind transmission time

applies to the arrival of this new information as well as to the arrival of the information at the

beginning of a fixation. Thus, for the 70 ms delay condition of Inhoff et al., for example, the

visual information that is needed for the lexical processing of word n+1 wouldn’t arrive until

120 ms after the fixation on word n had begun. More generally, for an X ms delay condition,

the information wouldn’t be available for lexical processing until X + 50 ms after the start of the

fixation. Thus, assuming the 150-160 ms estimate for the attention-shift latency in the above

paragraph, the E-Z Reader model would predict that one would start seeing some effects of the

delayed information for delays of 100 ms or longer, and that one would continue to see

increasingly interfering effects up to delays of 240 ms—consistent with the data. Although this

analysis is somewhat over-simplified as it is not taking into account random variability of

processes from trial to trial, it indicates that a simulation of the data from Experiment 1 should

not be difficult to achieve without making any changes for the unusual text conditions (see Footnote 4).
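
The X + 50 ms rule and the resulting threshold delay can be written out as a short Python sketch. The function names are hypothetical; the 50 ms eye-to-mind lag, the 150 ms attention-shift estimate, and the roughly 15 ms display lag mentioned in Footnote 4 are taken from the text, and trial-to-trial variability is again ignored.

```python
# Illustrative sketch only: names are hypothetical, values come from the text.
EYE_TO_MIND_MS = 50.0  # lag before newly displayed information reaches lexical processing


def info_available_at(delay_ms: float, display_lag_ms: float = 0.0) -> float:
    """Time after fixation onset on word n at which word n+1 becomes usable,
    for a nominal display delay of `delay_ms`."""
    return delay_ms + display_lag_ms + EYE_TO_MIND_MS


def shortest_disruptive_delay(attention_shift_ms: float,
                              display_lag_ms: float = 0.0) -> float:
    """Nominal delay beyond which word n+1 arrives after attention has shifted."""
    return attention_shift_ms - EYE_TO_MIND_MS - display_lag_ms


if __name__ == "__main__":
    print(info_available_at(70.0))                  # 70 ms delay condition
    print(shortest_disruptive_delay(150.0))         # nominal timing
    print(shortest_disruptive_delay(150.0, 15.0))   # with the ~15 ms display lag
```

For the 70 ms condition the information arrives at 120 ms, before the earliest estimated attention shift; the threshold is a nominal delay of about 100 ms, or about 85 ms once the display lag is included.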

The above discussion indicates that it is not problematic for SAS models, in general, to

explain the data from Experiment 1 of Inhoff et al. and indicates that the E-Z Reader model, in

particular, would not have trouble fitting their data quantitatively. Inhoff et al. appear to concede

the former point in their discussion of Experiment 1, but apparently deem that their data from

Experiment 2 are a stronger “disproof” of SAS models. Again, the key findings from Experiment

2 were: (a) there was only a 24 ms benefit from the word preview when it was delayed by 140 ms

(which replicated their data in Experiment 1), but (b) there was a 40 ms benefit when the word

preview information was “aborted” (i.e., replaced by a mask) after 140 ms. (Both of these values

are relative to a baseline condition when the mask was present for the entire preview fixation.)

Their reasoning appears to assume something like the following: (a) attention typically switches

over to word n+1 something like 150-160 ms after the beginning of the fixation, (b) the display

change occurs at something like the nominal 140 ms value, and (c) a typical fixation duration is

something like 225-235 ms. If so, then they seem to be arguing that in the delay condition, the

preview of word n+1 should be available after 150-160 ms and remain available until the end of

the fixation (or for about 65-85 ms). Note, however, that 150-160 ms would be the earliest

attention shifts and the mean value for how long the information would be available is less than

this. In the same spirit, we take them to be arguing that in the abortion condition, the information

is removed after 140 ms, so that the preview should not be available (or only available very

briefly for some of the trials due to the inherent variability in lexical processing times; see

Footnote 3). If so, one would expect greater preview benefit in the delay condition than in the

abortion condition. If this indeed is the argument, then it ignores the fact that the information

change caused by the display change does not have an impact on lexical processing until about

50 ms after the display change (because of the eye-to-mind lag), and that the visual information

on the fixation is usable for 50 ms after the beginning of the saccade (again, because of the eye-

to-mind lag). Thus, the rough calculation above would change quite drastically. In that case, the

predicted times that the information would be available in the delay condition would still be

about 65-85 ms, because the delay of the visual information due to the display change and the

delay caused by using visual information from the prior fixation would cancel each other.

However, the predicted times that the information would be available in the abortion condition

would increase to about 40-50 ms. (The display change would “register” 190 ms after fixating

word n, but attention would have switched to word n+1 after only 150-160 ms.) Moreover,

Inhoff et al. indicate that the display change typically occurred about 15 ms after the nominal

time, which would lengthen the estimated times that the preview is available in the abortion

condition to 55-70 ms.

Given the estimates from this oversimplified argument, it is not obvious that the E-Z

Reader model can predict better performance in the 140 ms abortion condition than in the 140 ms

delay condition, and thus the results of Experiment 2 are interesting.

However, we do not believe that they provide strong evidence against the

plausibility of SAS models such as E-Z Reader. In the remainder of this article, we will

demonstrate that the E-Z Reader model generates predictions that are in reasonable agreement

with the qualitative patterns of preview benefit that were observed by Inhoff et al. in the four

critical conditions of their second experiment. Before doing so, however, it is important to note

that both their paradigm and their data are complex, and hence likely to be difficult for any

model to account for fully. We (Pollatsek et al., 2005) have recently expanded our modeling

efforts to account for results from the boundary paradigm developed by Rayner (1975); however,

this is a simpler display change paradigm than the one used by Inhoff et al., as there was no

display change during a fixation. In these modeling efforts, we adopted the simplifying

assumption that a “no preview” condition (such as random letters replacing word n+1 in the

parafovea) merely amounts to withholding of the correct information until the word is fixated.

(That is, the presence of random letters is assumed to be a “neutral” mask.) This assumption

appears to work well for normal text (where the benefit of getting the word intact as a preview is

about 40 ms). We think our argument above, however, indicates that this assumption is likely to

be violated for mixed letter case conditions. As we already stated, the preview benefit in the full-

preview condition was 99 ms—much larger than the 40 ms preview benefits that are typically

observed (Hyönä et al., 2004). However, we acknowledge the need to look further to test

whether this neutrality assumption is also an oversimplification for normal text conditions.

A second simplifying assumption in our modeling of these data is that the display change

itself has no effect beyond the change of information. That is, we have assumed that the visual

transients accompanying the display change do not have any interfering effects, such as drawing

attention to that region of text prematurely or inadvertently causing a saccade to be programmed

that may need to be cancelled. Such problems are less severe when attempting to model

paradigms in which the display change occurs during a saccade, as such changes during saccades

are rarely if ever consciously perceived, and there is no evidence that the transients associated

with such changes have any effect on the attentional or eye movement systems. In contrast, it is

likely that display changes during fixations are often perceived, so that there is a danger of

artifacts entering into the data.

If neither of these assumptions were violated, our modeling would predict that the sum of

(a) the benefit of having the preview present for the first 140 ms of the fixation and (b) the

benefit of having it present after the first 140 ms of the fixation should roughly equal the benefit

of having the information present for the whole fixation. However, the data in Experiment 2 of

Inhoff et al. show that the benefit of having the information present for the whole fixation (99

ms) was considerably larger than the sum of the other two effects (40 ms + 24 ms = 64 ms). This suggests either that the

mask may have inhibitory effects on lexical processing due to there being incorrect letters

present, or that the display change itself disrupts performance in both the delay and abortion

conditions. Moreover, if the mask does have inhibitory effects, it seems reasonable to assume

that this inhibition may be greater when the incorrect information is present when processing

starts (i.e., when attention shifts) than when it appears after the correct information has had a

chance to be processed for a while.

Given these caveats, we completed a simulation of the basic paradigm that was used by

Inhoff et al. in their second experiment. This simulation was based on 1,000 statistical subjects,

using the current version of the model (E-Z Reader 9; Pollatsek et al., 2005; Reichle et al., 2006)

and all its standard parameter values. The basic goal of the simulation was to determine which

word was attended (word n, the pre-target word, vs. word n+1, the word that was subject to the

display change) at the point in time when the display change occurred in the experiment—140

ms after the onset of the fixation on word n. In the simulation, the lengths of words n and n+1

were set equal to six and five letters, respectively; the frequencies of words n and n+1 were

likewise set equal to 43 and 66 per million, respectively. (These values correspond fairly well to

the mean values used by Inhoff et al.)

The simulation indicated that on 36.9% of the trials, the model’s attention was focused on

word n at the time of the display change and on 46.2% of the trials, attention was focused on

word n+1. (These two values don’t sum to 100% because attention could sometimes be on word

n-1 or word n+2 either due to mislocated fixations or to attention already having switched to

word n+2 in the normal sequence of events.) If one assumes that attention must be focused on a

word if that word is to be processed, then the pattern of results predicted by the model

is consistent with Inhoff et al.’s observations of comparably sized preview costs in the delayed

and aborted preview conditions. That is, the model predicts that, in the delayed condition,

attention would shift to the location of word n+1 before that word was actually presented (i.e.,

while a mask was still being displayed) on approximately 46.2% of the trials. Likewise, the

model would predict that, in the abortion condition, attention would shift to word n+1 after the

word had already disappeared on approximately 36.9% of the trials. Both of these situations

would be expected to disrupt normal lexical processing, and hence lead to longer fixation

durations and (relative to the normal preview condition) sizeable preview costs. This simulation

thus predicts that disruptive events would occur more often in the delay condition than in the

abortion condition, consistent with the greater preview benefit in the abortion condition. Of

course, it doesn’t precisely predict the sizes of the preview benefits in the two conditions;

however, as we have argued above, making a precise prediction would require additional

assumptions about the interfering characteristics of having wrong letters and whether that

interference is the same before the word is presented and after the word is presented.

The results of this simulation are consistent with the qualitative argument that we made

(above) about the relative timing of the display changes, the uptake of visual information, and

the shifting of attention. The simulation results and our arguments make clear that having fairly

precise estimates of the timing of these events is critical for making accurate predictions.

Perhaps more importantly, our results indicate that the results of Inhoff et al.’s second experiment

are not a “fatal bullet” for the E-Z Reader model or—more generally—models of eye-movement

control that posit serial shifts of attention during reading. If anything, the results of the Inhoff et

al. experiments provide confirmation that our estimates of the model’s free parameters are

plausible.

We close with a cautionary note. Reading is a complex task, and no model at the present

stage of development is likely to be able to explain the data from any particular paradigm,

especially complex display change paradigms such as the one of Inhoff et al., without adding

additional assumptions to deal with complexities that such disruptions to the normal reading

process are likely to introduce. However, we should make clear that not all patterns of data that

could be obtained in such paradigms could be explained away with additional assumptions. In

particular, if Inhoff et al. had indeed found large differences between conditions in which the

parafoveal information was delayed 10 ms and those in which parafoveal information was

delayed 70 ms, this would have been a significant problem for the E-Z Reader model in

particular, and probably for serial attention shift models in general (see Footnote 5). That is, the essence of the

serial attention shift models is that there should almost always be a certain amount of “dead

time” on word n+1 before attention shifts. (A very small difference between these conditions

might be predicted, however, because fixations can occasionally be on the “wrong” word.) In

contrast, we are less clear about whether the pattern of results in Experiment 1 naturally falls out

of a parallel processing model in which word n+1 is processed from the beginning of a fixation.

We acknowledge that there might be a fine line between whether or not a model is falsifiable and

adjusting parameter values to account for differing results. To date, we are not convinced that

there are any empirical results that are problematic for the E-Z Reader model (though others may

feel differently about this). We see our efforts to explain new findings as they emerge within the

context of the model as steps forward and not arbitrary adjustments of various parameter values.

Our view is that the process of developing better, more accurate models is part of what it means

to make progress in understanding the complex processes of the mind involved in reading.

References

Hyönä, J., Bertram, R., & Pollatsek, A. (2004). Are long compound words identified serially via

their constituents? Evidence from an eye-movement-contingent display change study. Memory &

Cognition, 32, 523-532.

Inhoff, A. W., Eiter, B. M., & Radach, R. (2005). The time course of linguistic information

extraction from consecutive words during eye fixations in reading. Journal of

Experimental Psychology: Human Perception and Performance, in press.

Pollatsek, A., Reichle, E. D., & Rayner, K. (2005). Tests of the E-Z Reader Model: Exploring

the interface between cognition and eye-movement control. Cognitive Psychology, in

press.

Rayner, K. (1975). The perceptual span and peripheral cues in reading. Cognitive Psychology,

7, 65-81.

Rayner, K., Ashby, J., Pollatsek, A., & Reichle, E.D. (2004). The effects of frequency and

predictability on eye fixations in reading: Implications for the E-Z Reader model.

Journal of Experimental Psychology: Human Perception and Performance, 30, 720-732.

Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye

movement control in reading. Psychological Review, 105, 125-157.

Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading:

Accounting for initial fixation locations and refixations within the E-Z Reader model.

Vision Research, 39, 4403-4411.

Reichle, E. D., Rayner, K., & Pollatsek, A. (2003). The E-Z Reader model of eye movement

control in reading: Comparison to other models. Behavioral and Brain Sciences, 26, 445-

476.

Reichle, E. D., Pollatsek, A., & Rayner, K. (2006). E-Z Reader: A cognitive-control, serial-

attention model of eye-movement control during reading. Cognitive Systems Research, in

press.

Author Notes

This research was supported by Grant HD26765 from the National Institutes of Health and Grant

R305H030235 from the Department of Education Institute of Education Sciences. We thank

Albrecht Inhoff for providing the materials from their experiments, and Simon Liversedge and an

anonymous reviewer for their comments on an earlier draft.

Footnotes

1. These estimates ignore the added complexity that results from situations involving multiple

fixations and word skipping. Like Inhoff et al., we ignore these situations because they are not

expected to happen often enough to affect our general argument. For similar reasons, we ignore

the effects of visual acuity limitations.

2. The experiment actually included six conditions; the four conditions that were mentioned are

the ones central to the arguments of Inhoff et al.

3. In the current version of the model (E-Z Reader 9; Pollatsek et al., 2005; Reichle et al., 2006),

the minimum and maximum durations of L1 are 67 ms and 122 ms, respectively. Similarly, the

minimum and maximum durations of L2 are 34 ms and 61 ms, respectively. These values define

the means of gamma distributions having standard deviations equal to .22 of their means. The

actual values of L1 and L2 that are used are random deviates that are sampled from these

distributions, and are thus subject to random variability. (These values of L1 are also modulated

by visual acuity.) For the sake of our argument, we will ignore these added sources of variability

and simply assume that typical values of L1 and L2 are 100 ms and 50 ms, respectively.
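
As an illustration of the variability described in this footnote, the following Python sketch draws stage durations from a gamma distribution whose standard deviation is .22 of its mean. It is a minimal sketch, not the model's actual code: the function name and the use of Python's standard `random.gammavariate` are our assumptions, and the modulation of L1 by visual acuity is omitted.

```python
import random

CV = 0.22  # standard deviation as a proportion of the mean, from the footnote


def sample_duration(mean_ms: float, rng: random.Random, cv: float = CV) -> float:
    """Draw one stage duration (ms) from a gamma distribution with the given
    mean and with standard deviation equal to cv * mean.

    For a gamma distribution, mean = shape * scale and sd = sqrt(shape) * scale,
    so shape = 1 / cv**2 and scale = mean / shape.
    """
    shape = 1.0 / cv ** 2
    scale = mean_ms / shape
    return rng.gammavariate(shape, scale)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the illustration is repeatable
    l1 = [sample_duration(100.0, rng) for _ in range(5)]
    l2 = [sample_duration(50.0, rng) for _ in range(5)]
    print([round(x) for x in l1])
    print([round(x) for x in l2])
```

Sampling L1 and L2 this way, rather than fixing them at 100 ms and 50 ms, is what spreads the attention-shift times over a wider range than the point estimates used in the main argument.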

4. Our estimates also ignore the fact that the display changes actually occurred about 15 ms after

the nominal time (70 ms), which would reduce the predicted time after which effects should be

observed from 100 ms to 85 ms.

5. Note that a prediction that there would be virtually no difference between a 70 ms delay

condition and one in which the text was fully visible throughout would be more problematic

because there could be disruption due to the transient of the display change in the former

condition.
