Alexander Pollatsek
Erik D. Reichle
University of Pittsburgh
and
Keith Rayner
Correspondence To:
Alexander Pollatsek
Department of Psychology
University of Massachusetts
Amherst, MA 01003
pollatsek@psych.umass.edu
Inhoff, Eiter, and Radach (2005) reported the results of two experiments which they
claimed were problematic for serial attention models of eye movements in reading (such as the
E-Z Reader model). In this reply, we demonstrate via argumentation and simulations that their
data pose no serious problem for the E-Z Reader model or serial attention models in general.
Inhoff, Eiter, and Radach (2005) presented data that they claim are inconsistent both
with the E-Z Reader model (Pollatsek, Reichle, & Rayner, 2005; Rayner, Ashby, Pollatsek, &
Reichle, 2004; Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Pollatsek, & Rayner, 1999,
2003, 2006) and, more generally, with any model that assumes that attention moves serially from
word to word (i.e., in which lexical processing of word n, the attended word, must complete
before lexical processing of word n+1, the next word in the text, begins). We disagree with their
claim; however, before getting into the details of their findings, we wish to make clear in what
sense the E-Z Reader model posits serial processing and thus illuminate what the controversy is
about.
First, although the E-Z Reader model posits that lexical processing of words is serial, it
assumes that lower-level (pre-attentive) visual processing is in parallel across the retina. This
parallel processing assumption is necessary because, among other things, if such processing did
not occur, a reader would not know how to target a saccade to the next word. Second, although
the model posits that attention moves serially from word to word, it does not imply that only one
word can be processed on a fixation in reading. In fact, the model predicts that the usual case in
reading is that lexical processing of the fixated word is completed and processing of the next
word in the text begins during the same fixation. On some occasions, on a single fixation,
processing of the next word (word n+1) can be completed which allows processing the word to
the right of that one (word n+2) to begin. This state of affairs is possible because the attended
word is not necessarily the fixated word, as we assume (consistent with the large literature on
covert attention) that covert attentional shifts can occur during a fixation. Third, as will become
important later, we should note that in the E-Z Reader model, the independence of covert
attention and fixation locations is extended one step further: the act of programming a saccade
from word n to word n+1 is decoupled from the act of shifting attention from word n to word
n+1. That is, it is assumed that (a) completion of a preliminary stage of lexical processing (L1) is
the signal that initiates an eye-movement program that is executed after approximately 125 ms
(in the latest versions of the model; Pollatsek et al., 2005) but (b) that completion of the second
stage (L2), which is the completion of lexical access, is the signal to shift attention. (In the
model, the shift of attention is assumed to be instantaneous.) To keep things simple, we will
assume in the discussion below that the fixated word is the word attended to when the fixation
begins; this is not always the case in the model (e.g., due to mistargeting of saccades), but is on a
All of these assumptions allow the model to produce a reading process in which more
than one word is usually processed on a fixation, and hence lexical processing of a word usually
starts before it is fixated. Let us now go through some of the details to give a better picture of
the timing of these processes. After L1 of the fixated word (word n) is completed, there will be
125 ms (i.e., the saccadic latency) before a saccade to the next word (word n+1) is executed.
Some part of the 125 ms will be occupied by the completion of lexical processing of word n.
However, if the second stage takes less than 125 ms (in most versions of the model, L2 requires
something like 50-75 ms, depending on factors such as the frequency of word n), then there will
be an interval of time at least equal to the saccade programming time minus the duration of L2 for
the lexical processing of word n+1 before it is actually fixated (see Footnote 1). (In the most recent version of
the model, we assume this lexical processing of the parafoveal information continues during the
saccade as well.)
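
The lower bound described above can be written out as a small calculation. This is only an illustrative sketch using the approximate values quoted in the text (a 125 ms saccadic latency and an L2 of 50-75 ms); the function name is hypothetical and is not part of the E-Z Reader implementation.

```python
SACCADE_LATENCY_MS = 125  # saccadic programming time quoted in the text


def min_parafoveal_time(l2_ms, saccade_latency_ms=SACCADE_LATENCY_MS):
    """Lower bound on the time available to process word n+1 before it is fixated.

    The saccade program starts when L1 of word n completes, and attention
    shifts to word n+1 when L2 completes, so the remainder of the saccadic
    latency is available for word n+1. Processing during the saccade itself
    is ignored here.
    """
    return max(0, saccade_latency_ms - l2_ms)


for l2 in (50, 75):
    print(f"L2 = {l2} ms -> at least {min_parafoveal_time(l2)} ms on word n+1")
```

With these illustrative values, word n+1 receives at least 50-75 ms of lexical processing before the eyes reach it.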
The experiments reported by Inhoff et al. were undertaken to assess whether the
assumptions of a serial-attention-shift (SAS) model such as the E-Z Reader model can adequately
account for the time course of the extraction of information from word n+1 using a novel
display-change technique in which the letter information of word n+1 is replaced by a mask (i.e.,
essentially a random letter string that is the same length as word n+1) either for a portion of the
time at the end of the fixation on word n or a portion of the time at the beginning of the fixation
on word n. We take the two key findings to be the following. In Experiment 1, the mask
appeared for varying amounts of time and then was replaced by word n+1. The key pattern of
data that was thought to be incompatible with SAS models and the E-Z Reader model is: (a) the
gaze duration (i.e., the sum of all fixations of a word during the first pass through the text) on
word n+1 was virtually unaffected if the display onset of word n+1 was delayed by 70 ms or less
from when word n was initially fixated, and (b) increasing the delay of the text beyond 70 ms
increased the gaze duration on word n+1. The design and data in Experiment 2 were more
complex. There, a single display-change time was chosen (140 ms) and there were essentially
four viewing conditions: (1) full preview – word n+1 was always visible; (2) delayed
information – upon fixating word n, a mask was displayed for 140 ms in the location of word
n+1, which was subsequently replaced by word n+1; (3) aborted information – upon fixating
word n, word n+1 appeared for 140 ms, followed by the mask; (4) no preview – the mask was
continually displayed until word n+1 was fixated (see Footnote 2). If one considers the no preview condition to
be a baseline for assessing the utility of extracting information about word n+1 from the
parafovea, they found a benefit of 99 ms for the full preview condition, a benefit of 40 ms in the
aborted information condition, and a benefit of only 24 ms in the delayed information condition.
Before going on to discuss the implication of these results for the E-Z Reader model (or
any model that assumes serial attention shifts), we need to stress one important aspect of these
experiments: The text was presented in alternating case (sO tHaT iT lOoKeD lIkE tHiS), which
presumably made reading significantly more difficult than normal. The gaze durations (even in
the full preview condition) were fairly long (over 300 ms), and the size of the preview benefit in
the full preview condition (99 ms) is quite a bit greater than is usually observed with normal text,
where the benefit from a full preview over a no preview condition is usually about 40 ms (see
Hyönä, Bertram, & Pollatsek, 2004, for a review of the findings on preview benefit). This
suggests, among other things, that model parameter estimates that are obtained from experiments
involving the reading of normal text will not be appropriate for explaining these data.
The results of Experiment 1 pose no problem in general for SAS models. Inhoff et al.’s
first finding, that having a mask (instead of word n+1) displayed during the first 70 ms of the
fixation on word n has virtually no effect on reading, is consistent with any SAS model in which
attention switches late enough such that the mask presented in word position n+1 during the first
70 ms has virtually always been replaced by the text by then. More specifically, for the E-Z
Reader model, this point is when the visual information for the delayed word (word n+1)
becomes available for lexical processing (which would be about 120-150 ms after the beginning
of the fixation on word n in the 70 ms delay condition, see below). Their second finding –
increasing the delay interferes with reading – is likewise consistent with any SAS model in
which attention starts to switch from word n to word n+1 at about 120-150 ms after the
beginning of the fixation on word n. Of course, this raises the question whether this pattern of
data is quantitatively consistent with the E-Z Reader model. We think that it is, but before we
attempt to model the data we will first try to sketch an argument for why.
In our simulations of normal reading, we assume that the visual processing stage (V)
occupies the first 50 ms of a fixation, and that L1 and L2 take something like 100 ms and 50 ms,
respectively, for a typical word (see Footnote 3). (Conceptually, the 50 ms duration of the V stage corresponds to
the eye-to-mind delay; see Pollatsek et al., 2005.) This means that, if there was no benefit from
the preview of word n before it was fixated, the attention shift to word n+1 would occur
something like 200 ms after the beginning of the fixation (50 ms for V + 100 ms for L1 + 50 ms
for L2), and that the saccade to word n+1 would be executed something like 275 ms after the beginning of the fixation (50 ms
for V + 100 ms for L1 + 125 ms for saccadic programming). However, if word n has a normal
preview (which is the case in Inhoff et al.’s experiments), the visual processing and some of the
lexical processing will be done on word n using parafoveal information. This would typically
subtract 40-50 ms from the times posited above, and thus mean that E-Z Reader would predict
that an attention shift to word n+1 typically occurs about 150-160 ms after the beginning of the
fixation on word n, and that the typical fixation duration on a word would be about 225-235 ms.
(Again, it is important to emphasize that these estimates ignore several sources of variability
that would cause the range of times to be much larger; see Footnote 3.)
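
The arithmetic in this paragraph can be checked with a short sketch. It uses only the typical stage durations quoted above and treats the 40-50 ms preview saving as a simple subtraction; the names are illustrative assumptions, and all variability is ignored, as in the text.

```python
V_MS = 50         # visual stage (eye-to-mind delay)
L1_MS = 100       # first lexical stage, typical word
L2_MS = 50        # second lexical stage, typical word
SACCADE_MS = 125  # saccadic programming time


def typical_times(preview_saving_ms=0):
    """Return (attention shift time, saccade time) in ms from fixation onset."""
    attention_shift = V_MS + L1_MS + L2_MS - preview_saving_ms
    saccade = V_MS + L1_MS + SACCADE_MS - preview_saving_ms
    return attention_shift, saccade


print(typical_times(0))        # no preview of word n
for saving in (40, 50):        # normal preview of word n
    print(saving, typical_times(saving))
```

Without a preview of word n this gives 200 ms and 275 ms; with a 40-50 ms saving it gives attention shifts at 150-160 ms and fixation durations of 225-235 ms, matching the values stated above.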
Are these values for the processes consistent with the pattern of data in Inhoff et al.? The
analysis in the above paragraph suggests that attention shifts to word n+1 should start to occur
150-160 ms after the beginning of a fixation. But now we have to consider how that would
relate to the display change conditions, as the E-Z Reader model posits that the effect of the
display change would not be immediate because the 50 ms eye-to-mind transmission time
applies to the arrival of this new information as well as to the arrival of the information at the
beginning of a fixation. Thus, for the 70 ms delay condition of Inhoff et al., for example, the
visual information that is needed for the lexical processing of word n+1 wouldn’t arrive until
120 ms after the fixation on word n had begun. More generally, for an X ms delay condition,
the information wouldn’t be available for lexical processing until X + 50 ms after the start of the
fixation. Thus, assuming the 150-160 ms estimate for the attention-shift latency in the above
paragraph, the E-Z Reader model would predict that one would start seeing some effects of the
delayed information for delays of 100 ms or longer, and that one would continue to see
increasingly interfering effects up to delays of 240 ms—consistent with the data. Although this
processes from trial to trial, it indicates that a simulation of the data from Experiment 1 should
not be difficult to achieve without making any changes for the unusual text conditions4.
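
The X + 50 ms rule can be sketched as follows. This is a simplified illustration under the assumptions stated in the text (a fixed 50 ms eye-to-mind lag and attention shifts at about 150-160 ms); the function names are hypothetical, and trial-to-trial variability is not modeled.

```python
EYE_TO_MIND_MS = 50  # eye-to-mind transmission time quoted in the text


def info_arrival(delay_ms, lag_ms=EYE_TO_MIND_MS):
    """Time from fixation onset on word n at which the delayed text of
    word n+1 becomes available for lexical processing."""
    return delay_ms + lag_ms


def attention_wait(delay_ms, attention_shift_ms):
    """How long attention would rest on word n+1 before its letters are
    usable (0 if the text has already arrived when attention shifts)."""
    return max(0, info_arrival(delay_ms) - attention_shift_ms)


for delay in (70, 100, 140, 240):
    waits = [attention_wait(delay, shift) for shift in (150, 160)]
    print(delay, info_arrival(delay), waits)
```

For the 70 ms delay the text arrives at 120 ms, before the typical attention shift, so no cost is expected; longer delays leave progressively larger gaps between the attention shift and the arrival of usable information.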
The above discussion indicates that it is not problematic for SAS models, in general, to
explain the data from Experiment 1 of Inhoff et al. and indicates that the E-Z Reader model, in
particular, would not have trouble fitting their data quantitatively. Inhoff et al. appear to concede
the former point in their discussion of Experiment 1, but apparently deem that their data from
Experiment 2 are a stronger “disproof” of SAS models. Again, the key findings from Experiment
2 were: (a) there was only a 24 ms benefit from the word preview when it was delayed by 140 ms
(which replicated their data in Experiment 1), but (b) there was a 40 ms benefit when the word
preview information was “aborted” (i.e., replaced by a mask) after 140 ms. (Both of these values
are relative to a baseline condition when the mask was present for the entire preview fixation.)
Their reasoning appears to assume something like the following: (a) attention typically switches
over to word n+1 something like 150-160 ms after the beginning of the fixation, (b) the display
change occurs at something like the nominal 140 ms value, and (c) a typical fixation duration is
something like 225-235 ms. If so, then they seem to be arguing that in the delay condition, the
preview of word n+1 should be available after 150-160 ms and remain available until the end of
the fixation (or for about 65-85 ms). Note, however, that 150-160 ms would be the earliest
attention shifts and the mean value for how long the information would be available is less than
this. In the same spirit, we take them to be arguing that in the abortion condition, the information
is removed after 140 ms, so that the preview should not be available (or only available very
briefly for some of the trials due to the inherent variability in lexical processing times; see
Footnote 3). If so, one would expect greater preview benefit in the delay condition than in the
abortion condition. If this indeed is the argument, then it ignores the fact that the information
change caused by the display change does not have an impact on lexical processing until about
50 ms after the display change (because of the eye-to-mind lag), and that the visual information
on the fixation is usable for 50 ms after the beginning of the saccade (again, because of the eye-
to-mind lag). Thus, the rough calculation above would change quite drastically. In that case, the
predicted times that the information would be available in the delay condition would still be
about 65-85 ms, because the delay of the visual information due to the display change and the
delay caused by using visual information from the prior fixation would cancel each other.
However, the predicted times that the information would be available in the abortion condition
would increase to about 40-50 ms. (The display change would “register” 190 ms after fixating
word n, but attention would have switched to word n+1 after only 150-160 ms.) Moreover,
Inhoff et al. indicate that the display change typically occurred about 15 ms after the nominal
time, which would lengthen the estimated times that the preview is available in the abortion
condition to 55-70 ms.
Given the estimates from this oversimplified argument, it is not obvious that the E-Z
Reader model can predict better performance in the 140 ms abortion condition than in the 140 ms
delay condition, and thus the results of Experiment 2 are interesting.
However, we do not believe that they provide strong evidence against the
plausibility of SAS models such as E-Z Reader. In the remainder of this article, we will
demonstrate that the E-Z Reader model generates predictions that are in reasonable agreement
with the qualitative patterns of preview benefit that were observed by Inhoff et al. in the four
critical conditions of their second experiment. Before doing so, however, it is important to note
that both their paradigm and their data are complex, and hence likely to be difficult for any
model to account for fully. We (Pollatsek et al., 2005) have recently expanded our modeling
efforts to account for results from the boundary paradigm developed by Rayner (1975); however,
this is a simpler display change paradigm than the one used by Inhoff et al., as there was no
display change during a fixation. In these modeling efforts, we adopted the simplifying
assumption that a “no preview” condition (such as random letters replacing word n+1 in the
parafovea) merely amounts to withholding of the correct information until the word is fixated.
(That is, the presence of random letters is assumed to be a “neutral” mask.) This assumption
appears to work well for normal text (where the benefit of getting the word intact as a preview is
about 40 ms). We think our argument above, however, indicates that this assumption is likely to
be violated for mixed letter case conditions. As we already stated, the preview benefit in the full-
preview condition was 99 ms—much larger than the 40 ms preview benefits that are typically
observed (Hyönä et al., 2004). However, we acknowledge the need to look further to test
whether this neutrality assumption is an oversimplification for normal text conditions as
well.
A second simplifying assumption in our modeling of these data is that the display change
itself has no effect beyond the change of information. That is, we have assumed that the visual
transients accompanying the display change do not have any interfering effects, such as drawing
that may need to be cancelled. Such problems are less severe when attempting to model
paradigms in which the display change occurs during a saccade, as such changes during saccades
are rarely if ever consciously perceived, and there is no evidence that the transients associated
with such changes have any effect on the attentional or eye movement systems. In contrast, it is
likely that display changes during fixations are often perceived, so that there is a danger of
If neither of these assumptions were violated, our modeling would predict that the sum of
(a) the benefit of having the preview present for the first 140 ms of the fixation and (b) the
benefit of having it present after the first 140 ms of the fixation should roughly equal the benefit
of having the information present for the whole fixation. However, the data in Experiment 2 of
Inhoff et al. show that the benefit of having the information present for the whole fixation (99
ms) was considerably larger than the sum of the other two effects (24 ms + 40 ms = 64 ms). This suggests either that the
mask may have inhibitory effects on lexical processing due to there being incorrect letters
present, or that the display change itself disrupts performance in both the delay and abortion
conditions. Moreover, if the mask does have inhibitory effects, it seems reasonable to assume
that this inhibition may be greater when the incorrect information is present when processing
starts (i.e., when attention shifts) than when it appears after the correct information has had a
Given these caveats, we completed a simulation of the basic paradigm that was used by
Inhoff et al. in their second experiment. This simulation was based on 1,000 statistical subjects,
using the current version of the model (E-Z Reader 9; Pollatsek et al., 2005; Reichle et al., 2006)
and all its standard parameter values. The basic goal of the simulation was to determine which
word was attended (word n, the pre-target word, vs. word n+1, the word that was subject to the
display change) at the point in time when the display change occurred in the experiment—140
ms after the onset of the fixation on word n. In the simulation, the lengths of words n and n+1
were set equal to six and five letters, respectively; the frequencies of words n and n+1 were
likewise set equal to 43 and 66 per million, respectively. (These values correspond fairly well to
The simulation indicated that on 36.9% of the trials, the model’s attention was focused on
word n at the time of the display change and on 46.2% of the trials, attention was focused on
word n+1. (These two values don’t sum to 100% because attention could sometimes be on word
n-1 or word n+2 either due to mislocated fixations or to attention already having switched to
word n+2 in the normal sequence of events.) If one assumes that attention must be focused on a
word if that word is to be processed, then the pattern of results predicted by the model
is consistent with Inhoff et al.’s observations of comparably sized preview costs in the delayed
and aborted preview conditions. That is, the model predicts that, in the delayed condition,
attention would shift to the location of word n+1 before that word was actually presented (i.e.,
while a mask was still being displayed) on approximately 46.2% of the trials. Likewise, the
model would predict that, in the abortion condition, attention would shift to word n+1 after the
word had already disappeared on approximately 36.9% of the trials. Both of these situations
would be expected to disrupt normal lexical processing, and hence lead to longer fixation
durations and (relative to the normal preview condition) sizeable preview costs. This simulation
thus predicts that disruptive events would occur more often in the delay condition than in the
abortion condition, consistent with the greater preview benefit in the abortion condition. Of
course, it doesn’t precisely predict the sizes of the preview benefits in the two conditions;
however, as we have argued above, making a precise prediction would require additional
assumptions about the interfering characteristics of having wrong letters and whether that
interference is the same before the word is presented and after the word is presented.
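
The logic of asking "where is attention when the display changes?" can be illustrated with a toy Monte Carlo sketch. This is emphatically not E-Z Reader 9 and is not expected to reproduce the 36.9% and 46.2% figures: it omits visual acuity, saccadic error, skipping, and refixations. It assumes only the typical stage means used in our argument (100 ms for L1, 50 ms for L2), gamma-distributed durations with a standard deviation of .22 of the mean (Footnote 3), and an assumed fixed 45 ms preview saving (the midpoint of the 40-50 ms range). All function and parameter names are hypothetical.

```python
import random

CV = 0.22  # standard deviation as a proportion of the mean (Footnote 3)


def sample_stage(mean_ms, rng, cv=CV):
    """Draw one stage duration from a gamma distribution with the given mean
    and coefficient of variation (shape = 1/cv^2, scale = mean/shape)."""
    shape = 1.0 / cv ** 2
    scale = mean_ms / shape
    return rng.gammavariate(shape, scale)


def fraction_shifted_before(change_ms, n_trials=10_000, seed=0,
                            v_ms=50.0, l1_mean=100.0, l2_mean=50.0,
                            preview_saving_ms=45.0):
    """Proportion of simulated trials on which attention has already moved
    to word n+1 when the display change occurs (toy model, see lead-in)."""
    rng = random.Random(seed)
    shifted = 0
    for _ in range(n_trials):
        shift_time = (v_ms + sample_stage(l1_mean, rng)
                      + sample_stage(l2_mean, rng) - preview_saving_ms)
        if shift_time < change_ms:
            shifted += 1
    return shifted / n_trials


print(fraction_shifted_before(140.0))
```

Even this stripped-down version makes the qualitative point: with realistic variability in L1 and L2, attention has moved to word n+1 before a 140 ms display change on a substantial minority of trials and not on the rest, so both the delay and the abortion manipulations will sometimes catch attention on the "wrong" side of the change.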
The results of this simulation are consistent with the qualitative argument that we made
(above) about the relative timing of the display changes, the uptake of visual information, and
the shifting of attention. The simulation results and our arguments make clear that having fairly
precise estimates of the timing of these events is critical for making accurate predictions.
Perhaps more importantly, our results indicate that the results of Inhoff et al.’s second experiment
are not a “fatal bullet” for the E-Z Reader model or—more generally—models of eye-movement
control that posit serial shifts of attention during reading. If anything, the results of the Inhoff et
al. experiments provide confirmation that our estimates of the model’s free parameters are
plausible.
We close with a cautionary note. Reading is a complex task, and no model at the present
stage of development is likely to be able to explain the data from any particular paradigm,
especially complex display change paradigms such as the one of Inhoff et al., without adding
additional assumptions to deal with complexities that such disruptions to the normal reading
process are likely to introduce. However, we should make clear that not all patterns of data that
could be obtained in such paradigms could be explained away with additional assumptions. In
particular, if Inhoff et al. had indeed found large differences between conditions in which the
parafoveal information was delayed 10 ms and those in which parafoveal information was
delayed 70 ms, this would have been a significant problem for the E-Z Reader model in
particular, and probably for serial attention shift models in general (see Footnote 5). That is, the essence of the
serial attention shift models is that there should almost always be a certain amount of “dead
time” on word n+1 before attention shifts. (A very small difference between these conditions
might be predicted, however, because fixations can occasionally be on the “wrong” word.) In
contrast, we are less clear about whether the pattern of results in Experiment 1 naturally falls out
of a parallel processing model in which word n+1 is processed from the beginning of a fixation.
We acknowledge that there might be a fine line between whether or not a model is falsifiable and
adjusting parameter values to account for differing results. To date, we are not convinced that
there are any empirical results that are problematic for the E-Z Reader model (though others may
feel differently about this). We see our efforts to explain new findings as they emerge within the
context of the model as steps forward and not arbitrary adjustments of various parameter values.
Our view is that the process of developing better, more accurate models is part of what it means
to make progress in understanding the complex processes of the mind involved in reading.
References
Hyönä, J., Bertram, R., & Pollatsek, A. (2004). Are long compound words identified serially via
their constituents? Evidence for an eye-contingent display change study. Memory &
Inhoff, A. W., Eiter, B. M., & Radach, R. (2005). The time course of linguistic information
Pollatsek, A., Reichle, E. D., & Rayner, K. (2005). Tests of the E-Z Reader Model: Exploring
press.
Rayner, K. (1975). The perceptual span and peripheral cues in reading. Cognitive Psychology,
7, 65-81.
Rayner, K., Ashby, J., Pollatsek, A., & Reichle, E.D. (2004). The effects of frequency and
predictability on eye fixations in reading: Implications for the E-Z Reader model.
Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye
Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading:
Accounting for initial fixation locations and refixations within the E-Z reader model.
Reichle, E. D., Rayner, K., & Pollatsek, A. (2003). The E-Z Reader model of eye movement
control in reading: Comparison to other models. Behavioral and Brain Sciences, 26, 445-476.
Reichle, E. D., Pollatsek, A., & Rayner, K. (2006). E-Z Reader: A cognitive-control, serial-
press.
Author Notes
This research was supported by Grant HD26765 from the National Institutes of Health and Grant
Albrecht Inhoff for providing the materials from their experiments, and Simon Liversedge and an
Footnotes
1. These estimates ignore the added complexity that results from situations involving multiple
fixations and word skipping. Like Inhoff et al., we ignore these situations because they are not
expected to happen often enough to affect our general argument. For similar reasons, we ignore
2. The experiment actually included six conditions; the four conditions that were mentioned are
3. In the current version of the model (E-Z Reader 9; Pollatsek et al., 2005; Reichle et al., 2006),
the minimum and maximum durations of L1 are 67 ms and 122 ms, respectively. Similarly, the
minimum and maximum durations of L2 are 34 ms and 61 ms, respectively. These values define
the means of gamma distributions having standard deviations equal to .22 of their means. The
actual values of L1 and L2 that are used are random deviates that are sampled from these
distributions, and are thus subject to random variability. (These values of L1 are also modulated
by visual acuity.) For the sake of our argument, we will ignore these added sources of variability
and simply assume that typical values of L1 and L2 are 100 ms and 50 ms, respectively.
4. Our estimates also ignore the fact that the display changes actually occurred about 15 ms after
the nominal time (70 ms), which would reduce the predicted time after which effects should be
5. Note that a prediction that there would be virtually no difference between a 70 ms delay
condition and one in which the text was fully visible throughout would be more problematic
because there could be disruption due to the transient of the display change in the former
condition.