
Coordination of gaze and hand movements for tracking and tracing in 3D

Andrés Lamont

August 2009
Contents

1 Abstract
2 Introduction
3 Methods
   3.1 Subjects
   3.2 Experimental set-up
   3.3 Experiments
       3.3.1 Shapes
   3.4 Calibration
       3.4.1 Mathematics of the PSOM
   3.5 Analysis
4 Results
   4.1 Calibration
   4.2 Tracking
   4.3 Tracing
   4.4 Gaze-finger lead time
5 Discussion and conclusion
   5.1 Calibration
   5.2 Tracking
   5.3 Tracing
   5.4 Conclusion
Chapter 1

Abstract
In this study finger and gaze movements in three-dimensional space were investigated. Since accurately measuring 3D gaze is difficult, most studies focus on planar movement in the frontal plane. To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, the binocular eye movements and finger movements of subjects were measured in a series of trials. In particular we investigated how the coordination between eye and finger movement differs between directions (frontal plane and depth) during tracking and tracing conditions. This was accomplished by determining the lead time of gaze position relative to finger position. Binocular gaze was measured with two scleral coils, which were calibrated by fixating on points of a virtual three-dimensional cube. Depth was estimated by mapping the azimuth and elevation of both eyes, recorded while fixating various targets in 3D space during the calibration trial, to 3D position using a parametrized self-organizing map with a 3D interpolation function. Lead times were determined from the cross-covariance between eye position and finger position. The results show that finger position almost perfectly follows gaze position while tracking a moving target in the frontal plane. However, the depth component of the gaze lags behind the finger position by about 80 ms because vergence is relatively slow, while the speed of the finger is the same in all directions. While tracing a completely visible path the gaze position leads the finger position by approximately 270 ms, whereas the vergence (depth) component leads by only about 100 ms. These results show that the target need not be foveated to accurately anticipate and assist in planning finger movements in three dimensions.
Chapter 2

Introduction
It has long been known that version and vergence eye movements have very different dynamics (Collewijn et al., 1995; Erkelens, 1988). To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, we measured the binocular eye movements and finger movements of subjects in a series of trials. Several studies have already investigated the relationship between finger movements and gaze. However, because of the difficulty of measuring gaze in three dimensions, very few studies have gone further than measuring gaze in the frontal plane (Gielen et al., 2009). In the present study gaze was determined in three dimensions with a relatively new approach involving parametrized self-organizing maps (Essig et al., 2006). This approach estimates the gaze position in three dimensions by training a special kind of neural network, a parametrized self-organizing map (PSOM), with the gaze positions of known targets. This allows the PSOM to map the yaw and pitch of both eyes to the corresponding point in three dimensions. A PSOM is essentially a continuous self-organizing map which is capable of learning highly non-linear functions.

Two conditions of the coordination of hand and eye movement were investigated. The first was the tracking condition, in which subjects were asked to track a target moving along an invisible path in three dimensions with the tip of their right index finger. In the tracing condition subjects were instructed to trace a completely visible path in three dimensions. Tracking produces smooth pursuit eye movements, while tracing produces saccades along the path.
Chapter 3

Methods
Subjects were asked to track a target moving in 3D with the tip of the index finger, or to move the index finger along a completely visible path in 3D space. The position of the finger tip and the gaze position were measured to determine the time lag between gaze and finger position.

3.1 Subjects
Five subjects (aged between 24 and 56 years) of the Radboud University with normal or corrected-to-normal visual acuity participated in this experiment. Two of the subjects were female. All subjects had taken part in previous eye tracking experiments with scleral coils and reported no problems in perceiving depth in the presented anaglyph stimuli. Furthermore, all subjects were right-handed and none had any known neurological or motor disorder.

3.2 Experimental set-up


The experimental setup is shown in figure 3.1. The subjects were seated in an office chair with a high back rest. The head was fixed with a bicycle helmet attached to the chair to prevent head movement. In front of the subject a large back-projection screen of 2.5 m × 2.0 m was placed. The subject's eyes were positioned approximately 70 cm from the screen. The height of the chair was adjusted so that the subject's cyclopean eye (i.e. the point midway between the two eyes) was right in the middle of the projection area.

Data was measured in a right-handed Cartesian coordinate system with its origin in the center of the screen, the x- and y-axes in the horizontal and vertical direction respectively, and the z-axis perpendicular to the screen, pointing towards the subject (see figure 3.1).

The visual stimuli were back-projected on the projection screen with an LCD projector (Philips ProScreen 4750) with a refresh rate of 60 Hz. A red-green anaglyph stereoscopic system was used to allow for perception in three dimensions. The color of the projection was calibrated to exactly match the red and green filters used in the anaglyph 3D glasses. This ensures that each eye sees only its respective stimulus.

To project accurate 3D stimuli the perspective transformations had to take into account the distance between the two eyes and the distance from the cyclopean eye to the screen. Therefore, the eye distance was measured before the experiment. Typical distances were 6.5 cm for the inter-eye distance and 70 cm for the distance to the screen.
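To make this concrete, the following is a minimal sketch of the off-axis perspective projection implied here. The original experiment used MATLAB; this Python version, including all function names, is purely illustrative. It assumes coordinates in cm, the screen in the plane z = 0 and the eyes at z = 70 cm, as in the setup described above.

```python
import numpy as np

EYE_SEP = 6.5     # cm, inter-eye distance (measured per subject)
SCREEN_D = 70.0   # cm, distance from the cyclopean eye to the screen

def project(point, eye_x):
    """Project a 3D point (cm) onto the screen plane z = 0, as seen from an
    eye at (eye_x, 0, SCREEN_D). Returns the 2D screen coordinates."""
    p = np.asarray(point, dtype=float)
    eye = np.array([eye_x, 0.0, SCREEN_D])
    t = eye[2] / (eye[2] - p[2])   # parameter where the eye-point ray hits z = 0
    return (eye + t * (p - eye))[:2]

def anaglyph_pair(point):
    """Screen positions of the left-eye (red) and right-eye (green) images."""
    return project(point, -EYE_SEP / 2), project(point, +EYE_SEP / 2)
```

For a point on the screen itself (z = 0) both images coincide; the horizontal disparity between the two images grows as the point moves toward the subject.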

Figure 3.1: Schematic overview of the experimental setup with the projection system (Philips ProScreen 4750), the Optotrak 3020 system and the position of the chair relative to the projection screen. The height of the chair was adjusted so that the head of the subject was right in the middle in front of the projection area. The x-, y-, z-coordinate system had its origin in the center of the projection screen.

The finger position was measured with a Northern Digital Optotrak 3020 system at a sampling frequency of 120 Hz. This system uses three cameras, placed at the upper right of the subject and tilted down at an angle of 30°, to track the position of strobing infrared-light-emitting diodes (markers). The Optotrak system distinguishes between multiple markers by measuring the strobing frequency of each marker and is capable of tracking the markers with a root-mean-square accuracy of 0.1 mm for the y- and z-coordinates and 0.15 mm for the x-coordinate. One marker was placed at the tip of the right index finger, oriented towards the Optotrak cameras. Two more markers were placed on the temples of the anaglyph stereo glasses to measure any head movements. The stimuli were not adapted in real time to head movement; the head movement data was only used to verify that the subjects had not moved their head during the experiment.

Gaze was measured using scleral coils (Skalar) in both eyes simultaneously in a large magnetic field (Remmel Labs). The three orthogonal components of this magnetic field, at frequencies of 48 kHz, 60 kHz and 80 kHz, were produced by a 3 × 3 × 3 m cubic frame. The subject was placed near the center of the magnetic field, where the field is most homogeneous. The signals from both coils were captured at a sampling frequency of 120 Hz and filtered by a 4th-order low-pass Bessel filter to prevent aliasing of the coil signal. Bessel filters have a very slow transition from pass band to stop band, but have a linear phase response in the pass band; this property makes them preserve the wave shape of the filtered signal. Using this system the yaw and pitch of the eyes could be measured with a resolution of about 0.25° (Frens et al., 1995).

Two PCs were used to conduct the experiment. The first PC controlled the Optotrak system. The second ran MATLAB, which presented the stimuli and captured the coil signals with an analog data-acquisition system that buffered the input to ensure no samples were missed. The start of the data acquisition on both PCs was synchronized by sending a signal to the parallel port.

3.3 Experiments
All subjects were asked to complete two calibration trials. First a calibration rose was shown. This calibration trial was not used in the current study but was carried out for compatibility with previous studies. The second trial was the calibration cube trial; it was used for calibration of the coil voltages with a parametrized self-organizing map (see section 3.4 for details).

After calibration the subjects were tested with different shapes in different orientations and conditions over 20 trials (see table 3.1). The two conditions tested were the 'tracking' and 'tracing' conditions. For the tracking condition subjects were asked to track a dot moving along an invisible 3D path. In the tracing condition the entire path was visible and the subjects were asked to trace the path with the right index finger at approximately the same speed as during the tracking condition.

Two different shapes, a Cassini shape and a Limaçon shape, were presented for the tracking condition, with one additional shape (a helix) for the tracing condition. Each shape was presented in two orientations in space: the frontal orientation, with the shape in the x-y plane, and the oblique orientation, with the same shape rotated 45° left-handed about the x-axis. Each shape was traced or tracked four times, beginning at the top, at 10 seconds per cycle (fast trials) or 15 seconds per cycle (slow trials), resulting in 8 trials per shape.

Furthermore, four additional tracing trials were included. These consisted of a large helix (R = 10 cm) and a small helix (R = 6 cm) with their principal axis on the x-axis or on the z-axis. The subjects were asked to trace the helix back and forth three times. In all, the subjects were first tested on the 8 Cassini trials, then on the 4 helices, and last on the 8 Limaçon trials, for a total of 20 trials taking approximately 30 minutes. More trials per subject was not practical because wearing scleral coils is limited to 30 minutes per day.

3.3.1 Shapes

The frontal Cassini shape (figure 3.2a) was defined as

   3

x(t) 2R · (1 + a · cos(2ωt)) · sin(ωt)
 y(t)  =  R · (1 + a · cos(2ωt)) · cos(ωt) 
   

z(t) z
where R = 12 cm, a = 0.5 and z = 30 cm.

The equation for the frontal Limaçon shape (figure 3.2b) was

\[ \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} b\sin(\omega t) + \frac{a}{2}\sin(2\omega t) \\ \frac{a}{2} + b\cos(\omega t) + \frac{a}{2}\cos(2\omega t) \\ z \end{pmatrix} \]

where a = 20 cm, b = 10 cm and z = 30 cm.

Both shapes were rotated 45° about the x-axis, with the bottom part closer to the subject, to obtain the oblique orientations. For the slow and fast conditions ω = 2π/15 rad/s and ω = 2π/10 rad/s, respectively.
Trial Shape Orientation Speed Condition

1 Calibration Cube - - -
2 Cassini Frontal Slow Tracking
3 Cassini Frontal Slow Tracing
4 Cassini Frontal Fast Tracking
5 Cassini Frontal Fast Tracing
6 Cassini Oblique Slow Tracking
7 Cassini Oblique Slow Tracing
8 Cassini Oblique Fast Tracking
9 Cassini Oblique Fast Tracing
10 Small Helix Horizontal - Tracing
11 Small Helix Depth - Tracing
12 Large Helix Horizontal - Tracing
13 Large Helix Depth - Tracing
14 Limaçon Frontal Slow Tracking
15 Limaçon Frontal Slow Tracing
16 Limaçon Frontal Fast Tracking
17 Limaçon Frontal Fast Tracing
18 Limaçon Oblique Slow Tracking
19 Limaçon Oblique Slow Tracing
20 Limaçon Oblique Fast Tracking
21 Limaçon Oblique Fast Tracing

Table 3.1: The shape (Cassini, Limaçon or helix), orientation (frontal or oblique; horizontal or depth for the helices), speed (fast or slow) and condition (tracking or tracing) for each trial.

The helix (figure 3.2c) was defined as

\[ \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} R\cos(t) \\ R\sin(t) \\ \frac{bt}{2\pi} + c \end{pmatrix} \]

where R = 6 cm for the small helices and R = 10 cm for the large helices. For the helices in the depth direction c = 12 cm. The horizontal helices were constructed by swapping the x- and z-coordinates and setting c = −24 cm.
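As an illustration of the definitions above, the following sketch samples the three trajectories (illustrative Python; the original experiment used MATLAB, so this is only a reconstruction of the stated equations, and the helix pitch b is an assumed value since the text does not state it):

```python
import numpy as np

def cassini(t, omega, R=12.0, a=0.5, z=30.0):
    """Frontal Cassini shape; t in seconds, coordinates in cm."""
    r = 1.0 + a * np.cos(2 * omega * t)
    return np.stack([1.5 * R * r * np.sin(omega * t),
                     R * r * np.cos(omega * t),
                     np.full_like(t, z)], axis=-1)

def limacon(t, omega, a=20.0, b=10.0, z=30.0):
    """Frontal Limaçon shape."""
    return np.stack([b * np.sin(omega * t) + 0.5 * a * np.sin(2 * omega * t),
                     0.5 * a + b * np.cos(omega * t) + 0.5 * a * np.cos(2 * omega * t),
                     np.full_like(t, z)], axis=-1)

def helix(t, R=6.0, b=5.0, c=12.0, horizontal=False):
    """Helix with its axis along z; b is the (assumed) advance per turn.
    horizontal=True swaps the x- and z-coordinates, as for c = -24 cm."""
    xyz = np.stack([R * np.cos(t), R * np.sin(t),
                    b * t / (2 * np.pi) + c], axis=-1)
    if horizontal:
        xyz = xyz[..., [2, 1, 0]]   # swap x and z
    return xyz

t = np.linspace(0, 15, 1800)        # one slow cycle sampled at 120 Hz
path = cassini(t, omega=2 * np.pi / 15)
```

The oblique orientations follow by applying a 45° rotation about the x-axis to the sampled points.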

Figure 3.2: Shapes used during the tracking and tracing conditions: (a) Cassini, (b) Limaçon, (c) helix. (Not to scale.)

3.4 Calibration
Precise mapping of gaze position requires calibration. Usually during calibration different points on the screen are presented and the subject is asked to fixate on these points. This yields a set of samples relating the coil signals to a certain gaze position on the screen. An interpolation function, usually a second- or third-degree polynomial, is fitted to the data to approximate a mapping function over the entire presentation area for both eyes. This method determines the gaze direction of each eye separately. To determine the fixation point of the eyes in three dimensions, the intersection point of the visual axes of both eyes has to be calculated. In three dimensions two lines generally do not intersect, and therefore the fixation point is estimated as the midpoint of the shortest straight line segment connecting the visual axes. This approach works well for 2D gaze positions in the frontal plane but is noisy in the depth direction. This is mainly attributed to small errors in the vergence, which give rise to relatively large errors in 3D gaze position, especially if the fixation point is relatively far from the subject.
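For comparison with the PSOM approach described next, here is a minimal sketch of this conventional midpoint estimate (hypothetical Python; each visual axis is given as an eye position p and a gaze direction d obtained from the fitted polynomials):

```python
import numpy as np

def fixation_midpoint(p_l, d_l, p_r, d_r):
    """Midpoint of the shortest segment between the two visual axes.

    Each axis is the line p + t*d. Uses the standard closest-points
    solution for two skew lines."""
    p_l, p_r = np.asarray(p_l, float), np.asarray(p_r, float)
    d_l, d_r = np.asarray(d_l, float), np.asarray(d_r, float)
    w0 = p_l - p_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w0, d_r @ w0
    denom = a * c - b * b            # approaches 0 for (near-)parallel axes,
    t_l = (b * e - c * d) / denom    # exactly the unstable case in depth
    t_r = (a * e - b * d) / denom
    return 0.5 * ((p_l + t_l * d_l) + (p_r + t_r * d_r))
```

The near-parallel case (denom close to zero) corresponds to very small vergence angles, which is why small angular errors blow up into large depth errors.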

To overcome this 3D eye tracking problem a neural network approach was used. This method was motivated by results of previous work on 3D calibration: specialized, individually calibrated neural networks can be used to reduce the error in 3D gaze-position measurement (Essig et al., 2006). The network used was a parametrized self-organizing map (PSOM), a rapid-learning variant of Kohonen's self-organizing map (Kohonen, 1998).

In order to calibrate the gaze in 3D with a PSOM, a virtual calibration cube was constructed (see figure 3.3). It consisted of three planes at Z = 10, Z = 25 and Z = 40 cm, each containing 3 × 3 points with a spacing of 15 cm, resulting in a total of 27 points spanning a volume of 30 × 30 × 30 cm.

Figure 3.3: Stereoscopic projection of the calibration cube. The darker lines are seen only by the right eye and vice versa. In the experiment green and red images were used with red-green anaglyph glasses.

Subjects were asked to fixate exactly on the highlighted point and then press a button. The button press triggered the acquisition of the coil voltages for one second, and the mean value and standard deviation were recorded. After that the next point was highlighted, until all points had been shown.

Showing the entire calibration cube at once would enhance the subject's perception of virtual depth and their ability to perform precise eye movements toward the designated target. However, only one plane at a time was shown, to avoid visual interference between the planes. The data obtained was used to train the PSOM; the layout of the targets is sketched below.
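A sketch of the 27 calibration targets in laboratory coordinates (illustrative Python; coordinates in cm with the origin at the screen center; the exact x-y placement of the grid is an assumption, but the 15 cm spacing and the three depth planes are as stated above):

```python
import numpy as np

def calibration_points():
    """3 x 3 x 3 calibration grid: planes at z = 10, 25, 40 cm,
    15 cm point spacing, spanning a 30 x 30 x 30 cm volume."""
    xs = ys = np.array([-15.0, 0.0, 15.0])
    zs = np.array([10.0, 25.0, 40.0])
    return np.array([[x, y, z] for z in zs for y in ys for x in xs])
```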

3.4.1 Mathematics of the PSOM

A self-organizing map (SOM) is an artificial neural network that produces a lower-dimensional discretized representation of its input space. SOMs differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. SOMs are also capable of learning highly non-linear functions (Walter and Ritter, 1996).

The ability to map high-dimensional spaces to lower-dimensional spaces would make SOMs useful for accurate mapping from coil voltages to 3D coordinates. However, SOMs only supply the position of the most stimulated neuron in a 'neuron lattice' (figure 3.4) instead of a continuous output. For applications where relatively high-dimensional spaces with a high resolution are needed, the size of the neuron lattice quickly becomes unmanageable. In this study every voxel of the presentation space would require a neuron; with a conservative resolution of 1 mm in all directions this would result in a staggering 30 million neurons. Furthermore, SOM learning requires thousands of training samples, which makes it impossible to collect sufficient calibration data in a reasonable amount of time.

Parametrized self-organizing maps do not have these disadvantages. They supply a continuous output and do not require large amounts of training samples. Instead, the PSOM is trained with selected input-output pairs as parameters. However, this super-fast learning comes at a cost: with only a few examples to learn from, one can only determine a rather small number of adaptable parameters. As a consequence the learning system must be very simple, or it must have a structure that is well matched to the task being learned. Fortunately, the type of mapping the PSOM has to perform for the calibration of the coils is the same for every trial and subject, and thus its structure only has to be matched once to the task at hand.

The estimation of 3D gaze positions with a PSOM consists of two steps. First the 3D gaze positions (x_{3d}, y_{3d}, z_{3d}) are mapped onto the coil voltages (x_l, y_l) and (x_r, y_r) through interpolation. Then the inverse of this mapping is computed to create the desired mapping from coil voltages to 3D gaze position.

A PSOM with five input neurons, 27 inner neurons and 3 output neurons was used. Each inner neuron was trained with one of the 27 calibration points k ∈ A, where

\[ A = \{ k_{xyz} \mid k_{xyz} = x\hat{e}_x + y\hat{e}_y + z\hat{e}_z;\; x, y, z \in \{0, 1, 2\} \}. \]

These points were arranged in a 3 × 3 × 3 grid with coordinates in each dimension from 0 to 2. All further PSOM calculations are expressed in this coordinate system; naturally, the results have to be scaled and translated to match the laboratory coordinate system.
Figure 3.4: 3D visualization of the neuron lattice. Each sphere represents a neuron. Note that the lattice has the same topography as the presented calibration points.

Each point k receives information about the gaze parameters. These parameters are stored in a training vector

\[ \tilde{w}_k = (x_{lk}, y_{lk}, x_{rk}, y_{rk}, x_{dk}) \]

where x_{lk}, y_{lk}, x_{rk} and y_{rk} are the coil voltages for azimuth and elevation of the left eye and the right eye, respectively, and x_{dk} is the divergence, defined as x_{dk} := x_{rk} − x_{lk}. This fifth dimension was added because the depth coordinate depends mainly on the divergence. The differences in divergence are smaller than those in the frontal plane, so the divergence is weighted with a specific factor to approximate the range of x and y. This leads to a faster determination of the gaze position as a function of the coil voltages; a sketch follows below.
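A sketch of how the 27 training vectors could be assembled (hypothetical Python; the thesis does not state the numerical value of the divergence weight, so w_d here is a placeholder):

```python
import numpy as np

def training_vectors(xl, yl, xr, yr, w_d=5.0):
    """Stack the calibration samples into 5D training vectors
    (xl, yl, xr, yr, w_d * (xr - xl)). Each input is an array of 27
    mean coil voltages; w_d scales the divergence to roughly match
    the range of the frontal components (w_d = 5.0 is an assumption)."""
    xl, yl, xr, yr = map(np.asarray, (xl, yl, xr, yr))
    return np.stack([xl, yl, xr, yr, w_d * (xr - xl)], axis=-1)
```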

The PSOM now knows the gaze parameters that correspond to the calibration points in 3D space. In the feed-forward step the PSOM interpolates between the training vectors \(\tilde{w}_k\) to estimate the gaze parameters for any position in or near the calibration cube.

The interpolation function can be constructed as a superposition of basis functions, one for each inner neuron, weighting the contribution of its training vector \(\tilde{w}_k\) depending on the location s relative to the location of the neuron k:

\[ f(s) = \sum_{k \in A} H(s, k)\, \tilde{w}_k \]

By specifying a training vector for each neuron location k ∈ A, a topological order between the training points is introduced. Training vectors assigned to neurons that are neighbors in A are thus treated as having a specific neighborhood relation. This allows the PSOM to draw extra curvature information from the training set.

It can be quite difficult or impossible to find a suitable set of basis functions H(s, k) if the data is not topologically ordered, i.e. one must be able to associate each data sample with a point in some discrete lattice in a topology-preserving fashion; otherwise the interpolation will lead to unacceptably large generalization errors (Walter and Ritter, 1996). The basis functions can be constructed in many ways but must always meet two criteria. First,

\[ H(s, k) = \delta_{s,k} \quad \forall\, s, k \in A. \]

Furthermore, the sum of all contribution weights should be one:

\[ \sum_{k \in A} H(s, k) = 1 \quad \forall\, s. \]

These criteria lead to

\[ f(s) = \tilde{w}_s \quad \forall\, s \in A, \]

which guarantees that the interpolation passes through the calibration points.

The basis functions can be constructed as a product of three 1D functions:

\[ H(s, k) = H^{1D}(s_x, k_x) \cdot H^{1D}(s_y, k_y) \cdot H^{1D}(s_z, k_z). \]

The components of k can only be 0, 1 or 2, so the 1D functions must have the property

\[ H^{1D}(q, n) = \delta_{q,n} \quad \forall\, q \in \{0, 1, 2\}, \]

where n ∈ {0, 1, 2}. Because n has only three possible values, only three basis functions are needed. The simplest functions with the required two roots are second-degree polynomials:

\[ H_0(q) = \tfrac{1}{2}q^2 - \tfrac{3}{2}q + 1 \]
\[ H_1(q) = -q^2 + 2q \]
\[ H_2(q) = \tfrac{1}{2}q^2 - \tfrac{1}{2}q \]

Figure 3.5: The three basis functions used for the interpolation function. Note that each basis function equals one at its corresponding neuron and zero at the other neurons.
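Putting the pieces together, a minimal sketch of the feed-forward map f(s) (hypothetical Python; W holds the 27 five-dimensional training vectors on the 3 × 3 × 3 lattice):

```python
import numpy as np

def h1d(q):
    """The three quadratic basis functions H0, H1, H2 evaluated at q.

    These are the Lagrange polynomials on the nodes {0, 1, 2}: each is
    one at its own node and zero at the other two."""
    return np.array([0.5 * q * q - 1.5 * q + 1.0,
                     -q * q + 2.0 * q,
                     0.5 * q * q - 0.5 * q])

def psom_forward(s, W):
    """Interpolated gaze parameters f(s) at lattice position s = (sx, sy, sz).

    W has shape (3, 3, 3, 5): the training vector w_k at each node k.
    H(s, k) is the product of the three 1D basis functions."""
    H = np.einsum('i,j,k->ijk', h1d(s[0]), h1d(s[1]), h1d(s[2]))
    return np.einsum('ijk,ijkd->d', H, W)
```

At every lattice node s = k the weights reduce to H(s, k) = δ_{s,k}, so the map reproduces the training vectors exactly, as required by the two criteria above.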

So far the PSOM maps the 3D gaze coordinates onto the coil voltages. However, to estimate the 3D gaze position from the coil voltages the opposite is needed. This is accomplished by calculating the inverse f^{-1} of f, which can be implemented as an iterative minimization of the error function

\[ E(s) = \tfrac{1}{2} \left( f(s) - f_{\mathrm{coils}} \right)^2, \]

the deviation of the measured coil voltages f_coils from the coil voltages f(s) calculated by the PSOM for the current 3D gaze position estimate s.

A very simple approach to finding the minimum of the error function is gradient descent:

\[ s(t+1) = s(t) - \epsilon\, \frac{\partial E(s)}{\partial s} \cdot \Omega \]

with ε > 0, where Ω is a weight vector that normalizes the different input dimensions. For t = 0 the current finger or stimulus position can be used; if no finger position is available, the center of the calibration cube is a good alternative.

While this method works well in many cases, the choice of the learning parameter ε is critical: too large a value makes the gradient descent diverge, and too small a value takes unnecessarily many iteration steps. In this study a much more robust and efficient method, the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963), was used. This algorithm can find the minimum of the error function approximately ten times faster than a conventional gradient descent.

The final value of s indicates the subject's 3D gaze position as a function of the current coil voltages.
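A sketch of the inversion step (hypothetical Python; scipy's method='lm' wraps MINPACK's Levenberg-Marquardt implementation, standing in for the algorithm used in the study, and `forward` is a feed-forward map such as the psom_forward sketch above):

```python
import numpy as np
from scipy.optimize import least_squares

def psom_inverse(f_coils, forward, s0):
    """Find the lattice position s whose predicted coil voltages f(s)
    best match the measured voltages f_coils, starting from s0
    (finger/stimulus position, or the cube center (1, 1, 1))."""
    res = least_squares(lambda s: forward(s) - f_coils, s0, method='lm')
    return res.x   # scale/translate back to laboratory coordinates afterwards
```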

3.5 Analysis
Prior to analyzing the delay between finger position and gaze position, the first and last eighth of the data points were discarded. The gaze position data was filtered with a Savitzky-Golay filter (Savitzky and Golay, 1964) of 3rd order with a frame size of 21 data points (175 ms). This filter effectively performs a local polynomial regression (3rd degree) on a distribution of equally spaced points (21 data points). Gaze data points during blinks were cut out and interpolated with a piecewise cubic Hermite spline; these are third-degree splines with each polynomial of the spline in Hermite form. Hermite polynomials are defined by

\[ H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}. \]

Missing finger data was also interpolated with the same piecewise cubic Hermite spline.
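A sketch of this preprocessing step (hypothetical Python; scipy's savgol_filter and PchipInterpolator stand in for the filters described, with the parameters stated above):

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.interpolate import PchipInterpolator

FS = 120.0  # Hz, common sampling rate of coil and Optotrak data

def clean_signal(t, x, blink_mask, smooth=True):
    """Bridge cut-out (blink/missing) samples with a piecewise cubic
    Hermite spline, then optionally smooth with a 3rd-order, 21-point
    Savitzky-Golay filter (21 samples at 120 Hz = 175 ms)."""
    good = ~blink_mask
    x = PchipInterpolator(t[good], x[good])(t)
    return savgol_filter(x, window_length=21, polyorder=3) if smooth else x
```

The same interpolation is applied to the finger data, but, per the analysis above, only the gaze signal is smoothed before the cross-covariance step.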
The delay between finger position and gaze position was determined by calculating the cross-covariance of the filtered gaze position and the unfiltered finger position:

\[ c_{xy}(m) = \begin{cases} \sum_{n=0}^{N-|m|-1} \left( x(n+m) - \frac{1}{N}\sum_{i=0}^{N-1} x_i \right) \left( y_n^* - \frac{1}{N}\sum_{i=0}^{N-1} y_i^* \right) & m \ge 0 \\ c_{yx}^*(-m) & m < 0 \end{cases} \]

where x and y are the 1D gaze and finger positions, respectively. Before the cross-covariance was calculated, the data was multiplied with a Hann window, \( w(n) = \tfrac{1}{2}\left(1 - \cos\frac{2\pi n}{N-1}\right) \), to avoid problems at the boundaries of the data. The result was normalized by setting the auto-covariance at zero lag to 1. The time τ at the maximum of the cross-covariance was taken as the time lag between finger and gaze position; a negative value of τ implies that the gaze leads the finger. Trials with a maximum of c_{xy}(m) below 0.9 for tracking (12 of 43 trials) or below 0.7 for tracing (18 of 70 trials) were not considered sufficiently correlated and were discarded. The lag between finger and gaze position in the depth direction was not determined for the frontal stimuli.
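A sketch of the lag estimation (hypothetical Python; np.correlate on mean-subtracted, Hann-windowed signals implements the cross-covariance above, normalized so that the zero-lag auto-covariance is one):

```python
import numpy as np

FS = 120.0  # Hz

def lead_time(gaze, finger):
    """Gaze-finger lag from the peak of the normalized cross-covariance.

    Returns (tau in ms, peak value); negative tau means gaze leads,
    matching the sign convention in the text."""
    n = len(gaze)
    w = np.hanning(n)                      # 0.5*(1 - cos(2*pi*n/(N-1)))
    x = (gaze - np.mean(gaze)) * w         # x: gaze, y: finger, as above
    y = (finger - np.mean(finger)) * w
    cxy = np.correlate(x, y, mode='full')  # lags -(n-1) .. n-1
    cxy /= np.sqrt(np.sum(x * x) * np.sum(y * y))
    m = np.argmax(cxy)
    tau = (m - (n - 1)) / FS * 1000.0
    return tau, cxy[m]

# Trials would then be discarded when the returned peak value is below
# 0.9 (tracking) or 0.7 (tracing).
```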

Chapter 4

Results
4.1 Calibration
Figure 4.1 shows the calibration results of a typical tracking trial for the new calibration method with parametrized self-organizing maps (left) and for the conventional method of fitting a third-order polynomial to a calibration trial and calculating depth by estimating the intersection point of the two visual axes (right). The blue, red and dotted black lines represent gaze position, stimulus and finger position, respectively.

Calibration by the conventional geometric regression method consists of first fitting a number of predefined points in a calibration trial with a third-order polynomial for each eye separately. The gaze position is defined by the intersection point of the two visual axes. However, lines in three dimensions generally do not intersect; to overcome this problem, gaze is estimated as the midpoint of the shortest line segment connecting the two visual axes. Because the vergence angles are very small at relatively great distances from the subject, accuracy in the depth direction is very poor at those distances. This is clearly seen in the z(x) plot (figure 4.1b, bottom-left panel): in this example the geometric regression method fails to accurately estimate depth beyond approximately 40 cm from the subject's eyes.

The calibration method involving parametrized self-organizing maps used in this study clearly has a great advantage in the depth direction over the geometric regression method. Figure 4.1a shows that the gaze position is much more accurate than with the geometric calibration method (figure 4.1b), especially in the depth direction. While the accuracy of the gaze position at greater depth is still worse than at shorter distances because of the small vergence angles, the effect is much smaller than in the geometric regression method. Gaze almost perfectly follows the stimulus in all directions.

Figure 4.1: Results of calibration using a parametrized self-organizing map (a, left) and a conventional geometric regression approach (b, right). The blue, red and dotted black lines represent gaze position, stimulus and finger position, respectively. While the geometric approach does a reasonably good job of estimating gaze position in the frontal plane, it fails to do the same in the depth direction. Note that points relatively close to the screen (far from the subject in the depth direction) are estimated as too close to the subject, resulting in a squashed gaze position estimate in the depth direction. The calibration done with the PSOM shows slightly better results in the frontal plane and is much more accurate than the geometric approach in the depth direction: the gaze position very closely follows the stimulus (and finger position) without loss of accuracy at greater distances from the subject.

4.2 Tracking
Figure 4.2a shows a three-dimensional plot of gaze and finger position for the oblique Limaçon shape in the tracking condition for a single subject. The blue line represents the gaze position and the black line the finger position. Gaze produces a smooth line during smooth-pursuit tracking and superimposes very well on the finger position, except for short deviations in depth during saccades, which are well known in the literature (Chaturvedi and Van Gisbergen, 2000).

Figures 4.3a-c show the finger and gaze position versus time for tracking in the x, y and z directions, respectively, for the same trial and subject. The gaze position is generally a smooth line with very few saccades and occasionally a small peak indicating a blink. Furthermore, the gaze position and the finger position almost perfectly superimpose in the x and y directions. In the depth direction (z direction, figure 4.3c) the noise is much larger than in the frontal plane (figures 4.3a and 4.3b) and the signal is not as smooth. The peaks correspond to small saccades in the frontal plane. Occasionally larger negative peaks (not visible here) are due to (unfiltered) blinks and can be as large as 40 cm in the z direction. Also, steps are visible, which occur when the two eyes make saccades in opposite directions.

Figure 4.2: Finger position and gaze position in three dimensions for the oblique Limaçon shape while tracking (a) and tracing (b).

4.3 Tracing
Figure 4.2b shows a three-dimensional plot of gaze and finger position for the oblique Limaçon shape in the tracing condition for a single subject. During tracing the eyes make saccades along the completely visible shape, and as a result the gaze position does not superimpose as well on the finger position as during the tracking condition. In the depth direction the variance is very large because of changes in vergence during saccadic eye movements (Chaturvedi and Van Gisbergen, 2000).

Figures 4.3d-f show finger and gaze position versus time for tracing in the x, y and z directions, respectively, for the same trial and subject. Unlike in the tracking condition, the gaze position is not a smooth line and does not superimpose very well on the finger position. Saccades are clearly seen as steps. Furthermore, it is clear that gaze position significantly leads finger position (see figures 4.3d-e). As in figure 4.3c, it can be seen in figure 4.3f that the variance in the depth direction is very large. The jumps in the depth direction are much smaller than in the frontal plane, and peaks can be seen which correspond to large saccades in the frontal plane.

Figure 4.3: Finger and gaze position versus time for tracking (panels a-c: x, y and z axes) and tracing (panels d-f: x, y and z axes). The time axis is plotted in bins of 1/120 s (8.3 ms) each.

4.4 Gaze-finger lead time


The lag between gaze position and finger position can be quantified by calculating the cross-covariance function between the gaze position and the finger position. The cross-covariance function has a maximum close to the origin, but slightly shifted; this shift represents the lag time.

Figure 4.4 shows the gaze-finger lag times in ms for all trials. Trials with an absolute lead time greater than 500 ms were ignored. Moreover, trials where the finger position was asymmetric by more than 200 ms (see figure 4.5) were also removed. In the tracking condition the lags of the finger with respect to the gaze position in the frontal plane (x-y plane) are not significantly different from 0. However, in depth (z-axis) the finger position leads the gaze position. This indicates that the target does not have to be foveated for the finger to accurately follow the stimulus. All ignored trials are shown in grey in figure 4.4.

Table 4.1 shows the mean values and standard deviations of the lag for the x, y and z directions. The mean values were calculated by averaging over all trials and all subjects.

For the tracing condition, table 4.1 shows that in the frontal plane the gaze position leads the finger position by about 270 ms. During tracing the eyes make saccades which fixate on future finger positions until the finger has reached that position, helping to lead the finger. After that, the eyes fixate on a point further along the path. In the tracking condition this is not possible, because the future target position is not known. In the depth direction the gaze also leads the finger; however, vergence is much slower, so the lead time of the gaze in depth is much smaller than in the frontal plane.
Figure 4.4: The gaze-finger lead times (y-axis) for all trials (x-axis). The left column is for tracking, the right column for tracing; the rows show the x, y and z directions. Trials shown in grey were ignored in the determination of the gaze-finger lead times. Note the much larger deviation in the z direction. A negative value means that gaze leads the finger. Trials where the stimuli were presented only in the frontal plane have a lead time of zero in the z direction, which explains the 'missing trials' in the bottom panels.

            X              Y              Z
Tracking    −18 ± 30 ms    −49 ± 57 ms    +83 ± 137 ms
            N = 31         N = 31         N = 10
Tracing     −274 ± 82 ms   −264 ± 92 ms   −104 ± 230 ms
            N = 52         N = 52         N = 35

Table 4.1: Mean lag between gaze and finger position, and the number of trials used to calculate it. Negative values indicate that the gaze leads. These values were obtained by averaging over all trials and all subjects.

Figure 4.5: Asymmetry in one of the trials (z position of the finger versus time). Note that the distance between peak A and peak B is 4.6 s, whereas the distance between peak B and peak C is 5.2 s, while both should be approximately 5 s. This asymmetry of 600 ms makes it impossible to calculate a reliable lead time from the cross-covariance, because the lead of peak B shifts the maximum of the cross-covariance function to the left and thus produces a too large gaze-finger lead time.

Chapter 5

Discussion and conclusion


To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, the binocular eye movements and finger movements of subjects were measured in a series of trials. In particular we investigated how the coordination between eye and finger movement differs between directions (in the frontal plane and in depth) during tracking and tracing conditions. This was accomplished by determining the lead time of gaze position relative to finger position in three dimensions. When tracking a moving target in three dimensions, finger position follows gaze position almost directly, with very little or no lag in the frontal plane. In depth the gaze position lags behind the finger position by about 80 ms. For tracing a completely visible path, gaze always leads finger position by a significant time: in the frontal plane gaze leads finger position by about 270 ms, and in depth by about 100 ms.

5.1 Calibration
Calibration of the scleral coils was accomplished by training a parametrized self-organizing map. This relatively new approach was chosen because it estimates the gaze position more robustly and accurately than a conventional method such as geometric regression calibration or a regular feed-forward neural network. Calibration with a PSOM showed a significant improvement in calibration performance, especially in the depth direction, where conventional methods have difficulty estimating gaze position accurately at relatively great distances from the eyes because the vergence angles become very small at those depths. The calibration with the PSOM proved to be much more robust and accurate in the depth direction, with much less deterioration of accuracy at relatively great depths. In this study a calibration trial of 27 points arranged in a cube was used. If more points are used, for example a cube of 4 × 4 × 4 points for 64 points in total, more information is given to the interpolation function used in the PSOM calculations, which should significantly increase the accuracy of the calibration.

5.2 Tracking
The results show that during the tracking condition the gaze-finger lead time in the frontal plane is very small: −18 ± 30 ms and −49 ± 57 ms for the horizontal and vertical directions, respectively. Given the large standard deviations, this effectively means that there is no significant lag between gaze and finger.

The time between generating a motor command in the motor cortex and the actual eye or hand movement is about 10 ms and 120 ms, respectively. For the gaze position and the finger position to move to the same position simultaneously, the motor commands cannot be executed at the same time. The results imply that motor commands to the hand precede the motor commands to the eye by about 110 ms.

For tracking in the depth direction the gaze lags behind(!) the finger position by about 80 ms. While the standard deviation is very large due to the relatively small number of successful tracking trials with a depth component, it is clear that finger position leads gaze position. It is well known that vergence is much slower than version, while finger movement is equally fast in all directions. Thus the results for the depth direction support the results in the frontal plane, which imply that motor commands to the hand precede motor commands to the eye by about 110 ms.

5.3 Tracing
During tracing of a completely visible path, the results show that gaze position leads finger position by about 270 ms in the frontal plane. This large lead time is due to saccades which jump ahead on the path to anticipate the hand movement. After a saccade the gaze position stays fixed at the same position until the hand nears the gaze position, after which the gaze again jumps ahead to anticipate the hand position.

In depth, gaze position leads hand position by only about 100 ms. This supports the fact that vergence is much slower than version. The difference in lead time between the frontal plane and depth is about 170 ms, comparable to tracking, for which this value is about 120 ms.

5.4 Conclusion
In this study we investigated finger and gaze movements in three-dimensional space. Binocular eye movements and finger movements of subjects were measured in a series of trials in which subjects were asked to track a moving target or trace a completely visible path in three dimensions. The results show that the motor commands from the motor cortex to the eye and to the finger are not issued simultaneously: the motor command to the hand precedes that to the eye by about 110 ms. This implies that the brain anticipates the hand movement. Thus it is not necessary to have the target foveated to plan accurate hand movements towards the target, which leaves time to compensate for the 120 ms delay between the motor command from the motor cortex to the arm and the actual arm movement.
Bibliography
V. Chaturvedi and J.A.M. Van Gisbergen. Stimulation in the rostral pole of monkey superior colliculus: effects on vergence eye movements. Experimental Brain Research, 132(1):72-78, 2000.

H. Collewijn, C.J. Erkelens, and R.M. Steinman. Voluntary binocular gaze-shifts in the plane of regard: dynamics of version and vergence. Vision Research, 35(23-24):3335-3358, 1995.

C.J. Erkelens. Fusional limits for a large random-dot stereogram. Vision Research, 28(2):345-353, 1988.

K. Essig, M. Pomplun, and H. Ritter. A neural network for 3D gaze recording with binocular eye trackers. International Journal of Parallel, Emergent and Distributed Systems, 21(2):79-95, 2006.

M.A. Frens, A.J. Van Opstal, and R.F. Van der Willigen. Spatial and temporal factors determine auditory-visual interactions in human saccadic eye movements. Perception & Psychophysics, 57(6):802-816, 1995.

C.C.A.M. Gielen, T.M.H. Dijkstra, I.J. Roozen, and J. Welten. Coordination of gaze and hand movements for tracking and tracing in 3D. Cortex, 45(3):340-355, 2009.

T. Kohonen. The self-organizing map. Neurocomputing, 21(1-3):1-6, 1998.

K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164-168, 1944.

D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431-441, 1963.

A. Savitzky and M.J.E. Golay. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8):1627-1639, 1964.

J. Walter and H. Ritter. Rapid learning with parametrized self-organizing maps. Neurocomputing, 12(2-3):131-153, 1996.
