
Coordination of gaze and hand movements for tracking and tracing in 3D

Andrés Lamont

August 2009
Contents

1 Abstract
2 Introduction
3 Methods
   3.1 Subjects
   3.2 Experimental set-up
   3.3 Experiments
       3.3.1 Shapes
   3.4 Calibration
       3.4.1 Mathematics of the PSOM
   3.5 Analysis
4 Results
   4.1 Calibration
   4.2 Tracking
   4.3 Tracing
   4.4 Gaze-finger lead time
5 Discussion and conclusion
   5.1 Calibration
   5.2 Tracking
   5.3 Tracing
   5.4 Conclusion
Chapter 1

Abstract
In this study finger and gaze movements in three-dimensional space were investigated. Since accurately measuring 3D gaze is difficult, most studies focus on planar movement in the frontal plane. To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, the binocular eye movements and finger movements of subjects were measured in a series of trials. In particular we investigated how the coordination between eye and finger movement differs between directions (frontal plane and depth) during tracking and tracing conditions. This was accomplished by determining the lead time of gaze position relative to finger position. Binocular gaze was measured with two scleral coils, which were calibrated by fixating on points of a virtual three-dimensional cube. Depth was estimated by mapping the azimuth and elevation of both eyes, recorded while fixating various targets in 3D space during the calibration trial, to 3D position using a parametrized self-organizing map with a 3D interpolation function. Lead times were determined from the cross-covariance between eye position and finger position. The results show that finger position almost perfectly follows gaze position while tracking a moving target in the frontal plane. However, the depth component of the gaze lags behind the finger position by about 80 ms because vergence is relatively slow, while the speed of the finger is the same in all directions. While tracing a completely visible path the gaze position leads the finger position by approximately 270 ms, whereas the vergence (depth) component leads by only about 100 ms. These results show that the target need not be foveated to accurately anticipate and assist in planning finger movements in three dimensions.
Chapter 2

Introduction
It has long been known that version and vergence eye movements have very different dynamics (Collewijn et al., 1995; Erkelens, 1988). To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, we measured the binocular eye movements and finger movements of subjects in a series of trials. Several studies have already investigated the relationship between finger movements and gaze. However, because of the difficulty of measuring gaze in three dimensions, very few studies have gone further than measuring gaze in the frontal plane (Gielen et al., 2009). In the present study gaze was determined in three dimensions with a relatively new approach involving parametrized self-organizing maps (Essig et al., 2006). This approach estimates the gaze position in three dimensions by training a special kind of neural network, a parametrized self-organizing map (PSOM), with the gaze positions of known targets. This allows the PSOM to map the yaw and pitch of both eyes to the corresponding point in three dimensions. A PSOM is essentially a continuous self-organizing map which is capable of learning highly non-linear functions.

Two conditions of the coordination of hand and eye movement were investigated. The first was the tracking condition, in which subjects were asked to track a target moving along an invisible path in three dimensions with the tip of their right index finger. In the tracing condition subjects were instructed to trace a completely visible path in three dimensions. Tracking produces smooth pursuit eye movements, while tracing produces saccades along the path.
Chapter 3

Methods
Subjects were asked to track a target moving in 3D with the tip of the index finger, or to move the index finger along a completely visible path in 3D space. The position of the finger tip and the gaze position were measured to determine the time lag between gaze and finger position.

3.1 Subjects
Five subjects (aged between 24 and 56 years) of the Radboud University with normal or corrected-to-normal visual acuity participated in this experiment. Two of the subjects were female. All subjects had taken part in previous eye tracking experiments with scleral coils and reported no problems in perceiving depth in the presented anaglyph stimuli. Furthermore, all subjects were right-handed and none had any known neurological or motor disorder.

3.2 Experimental set-up


The experimental setup is shown in figure 3.1. The subjects were seated in an office chair with a high back rest. The head was fixed with a bicycle helmet attached to the chair to prevent head movement. In front of the subject a large back-projection screen of 2.5 m × 2.0 m was placed. The subject's eyes were positioned approximately 70 cm from the screen. The height of the chair was adjusted so that the subject's cyclopean eye (i.e. the point midway between the two eyes) was right in the middle of the projection area.

Data was measured in a right-handed Cartesian coordinate system with its origin in the center of the screen, the x- and y-axes in the horizontal and vertical direction respectively, and the z-axis perpendicular to the screen, pointing towards the subject (see figure 3.1).

The visual stimuli were back-projected on the projection screen with an LCD projector (Philips ProScreen 4750) with a refresh rate of 60 Hz. A red-green anaglyph stereoscopic system was used to allow for perception in three dimensions. The color of the projection was calibrated to exactly match the red and green filters used in the anaglyph 3D glasses. This ensures that each eye sees only its respective stimulus.

To project accurate 3D stimuli the perspective transformations had to take into account the distance between the two eyes and the distance from the cyclopean eye to the screen. Therefore, the eye distance was measured before the experiment. Typical distances were 6.5 cm for the inter-eye distance and 70 cm for the distance to the screen.
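To make this concrete, the following is a minimal sketch of the off-axis perspective projection implied here. The original experiment used MATLAB; this Python version, including all function names, is purely illustrative. It assumes coordinates in cm, the screen in the plane z = 0 and the eyes at z = 70 cm, as in the setup described above.

```python
import numpy as np

EYE_SEP = 6.5     # cm, inter-eye distance (measured per subject)
SCREEN_D = 70.0   # cm, distance from the cyclopean eye to the screen

def project(point, eye_x):
    """Project a 3D point (cm) onto the screen plane z = 0, as seen from an
    eye at (eye_x, 0, SCREEN_D). Returns the 2D screen coordinates."""
    p = np.asarray(point, dtype=float)
    eye = np.array([eye_x, 0.0, SCREEN_D])
    t = eye[2] / (eye[2] - p[2])   # parameter where the eye-point ray hits z = 0
    return (eye + t * (p - eye))[:2]

def anaglyph_pair(point):
    """Screen positions of the left-eye (red) and right-eye (green) images."""
    return project(point, -EYE_SEP / 2), project(point, +EYE_SEP / 2)
```

For a point on the screen itself (z = 0) both images coincide; the horizontal disparity between the two images grows as the point moves toward the subject.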

Figure 3.1: Schematic overview of the experimental setup with the projection system (Philips ProScreen 4750), the Optotrak 3020 system and the position of the chair relative to the projection screen. The height of the chair was adjusted so that the head of the subject was right in the middle in front of the projection area. The x-, y-, z-coordinate system had its origin in the center of the projection screen.

The finger position was measured with a Northern Digital Optotrak 3020 system at a sampling frequency of 120 Hz. This system uses three cameras, placed at the upper right of the subject and tilted down at an angle of 30°, to track the position of strobing infrared-light-emitting diodes (markers). The Optotrak system distinguishes between multiple markers by measuring the strobing frequency of each marker and is capable of tracking the markers with a root-mean-square accuracy of 0.1 mm for the y- and z-coordinates and 0.15 mm for the x-coordinate. One marker was placed at the tip of the right index finger, oriented towards the Optotrak cameras. Two more markers were placed on the temples of the anaglyph stereo glasses to measure any head movements. The stimuli were not adapted in real time to head movement; the head movement data was only used to verify that the subjects had not moved their head during the experiment.

Gaze was measured using scleral coils (Skalar) in both eyes simultaneously in a large magnetic field (Remmel Labs). The three orthogonal components of this magnetic field, at frequencies of 48 kHz, 60 kHz and 80 kHz, were produced by a 3 × 3 × 3 m cubic frame. The subject was placed near the center of the magnetic field, where the field is most homogeneous. The signals from both coils were captured at a sampling frequency of 120 Hz and filtered by a 4th-order low-pass Bessel filter to prevent aliasing of the coil signal. Bessel filters have a very slow transition from pass band to stop band, but have a linear phase response in the pass band; this property makes them preserve the wave shape of the filtered signal. Using this system the yaw and pitch of the eyes could be measured with a resolution of about 0.25° (Frens et al., 1995).

Two PCs were used to conduct the experiment. The first PC controlled the Optotrak system. The second ran MATLAB, which presented the stimuli and captured the coil signals with an analog data-acquisition system that buffered the input to ensure no samples were missed. The start of the data acquisition on both PCs was synchronized by sending a signal to the parallel port.

3.3 Experiments
All subjects were asked to complete two calibration trials. First a calibration rose was shown. This calibration trial was not used in the current study but was carried out for compatibility with previous studies. The second trial was the calibration cube trial; it was used for calibration of the coil voltages with a parametrized self-organizing map (see section 3.4 for details).

After calibration the subjects were tested with different shapes in different orientations and conditions over 20 trials (see table 3.1). The two conditions tested were the 'tracking' and 'tracing' conditions. For the tracking condition subjects were asked to track a dot moving along an invisible 3D path. In the tracing condition the entire path was visible and the subjects were asked to trace the path with the right index finger at approximately the same speed as during the tracking condition.

Two different shapes, a Cassini shape and a Limaçon shape, were presented for the tracking condition, with one additional shape (a helix) for the tracing condition. Each shape was presented in two orientations in space: the frontal orientation, with the shape in the x-y plane, and the oblique orientation, with the same shape rotated 45° left-handed about the x-axis. Each shape was traced or tracked four times, beginning at the top, at 10 seconds per cycle (fast trials) or 15 seconds per cycle (slow trials), resulting in 8 trials per shape.

Furthermore, four additional tracing trials were included. These consisted of a large helix (R = 10 cm) and a small helix (R = 6 cm) with their principal axis on the x-axis or on the z-axis. The subjects were asked to trace the helix back and forth three times. In all, the subjects were first tested on the 8 Cassini trials, then on the 4 helices, and last on the 8 Limaçon trials, for a total of 20 trials taking approximately 30 minutes. More trials per subject was not practical because wearing scleral coils is limited to 30 minutes per day.

3.3.1 Shapes

The frontal Cassini shape (figure 3.2a) was defined as

   3

x(t) 2R · (1 + a · cos(2ωt)) · sin(ωt)
 y(t)  =  R · (1 + a · cos(2ωt)) · cos(ωt) 
   

z(t) z
where R = 12 cm, a = 0.5 and z = 30 cm.

The equation for the frontal Limaçon shape (figure 3.2b) was

\[ \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} b\sin(\omega t) + \frac{a}{2}\sin(2\omega t) \\ \frac{a}{2} + b\cos(\omega t) + \frac{a}{2}\cos(2\omega t) \\ z \end{pmatrix} \]

where a = 20 cm, b = 10 cm and z = 30 cm.

Both shapes were rotated 45° about the x-axis, with the bottom part closer to the subject, to obtain the oblique orientations. For the slow and fast conditions ω = 2π/15 rad/s and ω = 2π/10 rad/s, respectively.
Trial Shape Orientation Speed Condition

1 Calibration Cube - - -
2 Cassini Frontal Slow Tracking
3 Cassini Frontal Slow Tracing
4 Cassini Frontal Fast Tracking
5 Cassini Frontal Fast Tracing
6 Cassini Oblique Slow Tracking
7 Cassini Oblique Slow Tracing
8 Cassini Oblique Fast Tracking
9 Cassini Oblique Fast Tracing
10 Small Helix Horizontal - Tracing
11 Small Helix Depth - Tracing
12 Large Helix Horizontal - Tracing
13 Large Helix Depth - Tracing
14 Limaçon Frontal Slow Tracking
15 Limaçon Frontal Slow Tracing
16 Limaçon Frontal Fast Tracking
17 Limaçon Frontal Fast Tracing
18 Limaçon Oblique Slow Tracking
19 Limaçon Oblique Slow Tracing
20 Limaçon Oblique Fast Tracking
21 Limaçon Oblique Fast Tracing

Table 3.1: The shape (Cassini, Limaçon or helix), orientation (frontal or oblique; horizontal or depth for the helices), speed (fast or slow) and condition (tracking or tracing) for each trial.

The helix (figure 3.2c) was defined as

\[ \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} R\cos(t) \\ R\sin(t) \\ \frac{bt}{2\pi} + c \end{pmatrix} \]

where R = 6 cm for the small helices and R = 10 cm for the large helices. For the helices in the depth direction c = 12 cm. The horizontal helices were constructed by swapping the x- and z-coordinates and setting c = −24 cm.
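As an illustration of the definitions above, the following sketch samples the three trajectories (illustrative Python; the original experiment used MATLAB, so this is only a reconstruction of the stated equations, and the helix pitch b is an assumed value since the text does not state it):

```python
import numpy as np

def cassini(t, omega, R=12.0, a=0.5, z=30.0):
    """Frontal Cassini shape; t in seconds, coordinates in cm."""
    r = 1.0 + a * np.cos(2 * omega * t)
    return np.stack([1.5 * R * r * np.sin(omega * t),
                     R * r * np.cos(omega * t),
                     np.full_like(t, z)], axis=-1)

def limacon(t, omega, a=20.0, b=10.0, z=30.0):
    """Frontal Limaçon shape."""
    return np.stack([b * np.sin(omega * t) + 0.5 * a * np.sin(2 * omega * t),
                     0.5 * a + b * np.cos(omega * t) + 0.5 * a * np.cos(2 * omega * t),
                     np.full_like(t, z)], axis=-1)

def helix(t, R=6.0, b=5.0, c=12.0, horizontal=False):
    """Helix with its axis along z; b is the (assumed) advance per turn.
    horizontal=True swaps the x- and z-coordinates, as for c = -24 cm."""
    xyz = np.stack([R * np.cos(t), R * np.sin(t),
                    b * t / (2 * np.pi) + c], axis=-1)
    if horizontal:
        xyz = xyz[..., [2, 1, 0]]   # swap x and z
    return xyz

t = np.linspace(0, 15, 1800)        # one slow cycle sampled at 120 Hz
path = cassini(t, omega=2 * np.pi / 15)
```

The oblique orientations follow by applying a 45° rotation about the x-axis to the sampled points.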

Figure 3.2: Shapes used during the tracking and tracing conditions: (a) Cassini, (b) Limaçon, (c) helix. (Not to scale.)

3.4 Calibration
Precise mapping of gaze position requires calibration. Usually during calibration different points on the screen are presented and the subject is asked to fixate on these points. This yields a set of samples relating the coil signals to a certain gaze position on the screen. An interpolation function, usually a second- or third-degree polynomial, is fitted to the data to approximate a mapping function over the entire presentation area for both eyes. This method determines the gaze direction of each eye separately. To determine the fixation point of the eyes in three dimensions, the intersection point of the visual axes of both eyes has to be calculated. In three dimensions two lines generally do not intersect, and therefore the fixation point is estimated as the midpoint of the shortest straight line segment connecting the visual axes. This approach works well for 2D gaze positions in the frontal plane but is noisy in the depth direction. This is mainly attributed to small errors in the vergence, which give rise to relatively large errors in 3D gaze position, especially if the fixation point is relatively far from the subject.
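For comparison with the PSOM approach described next, here is a minimal sketch of this conventional midpoint estimate (hypothetical Python; each visual axis is given as an eye position p and a gaze direction d obtained from the fitted polynomials):

```python
import numpy as np

def fixation_midpoint(p_l, d_l, p_r, d_r):
    """Midpoint of the shortest segment between the two visual axes.

    Each axis is the line p + t*d. Uses the standard closest-points
    solution for two skew lines."""
    p_l, p_r = np.asarray(p_l, float), np.asarray(p_r, float)
    d_l, d_r = np.asarray(d_l, float), np.asarray(d_r, float)
    w0 = p_l - p_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w0, d_r @ w0
    denom = a * c - b * b            # approaches 0 for (near-)parallel axes,
    t_l = (b * e - c * d) / denom    # exactly the unstable case in depth
    t_r = (a * e - b * d) / denom
    return 0.5 * ((p_l + t_l * d_l) + (p_r + t_r * d_r))
```

The near-parallel case (denom close to zero) corresponds to very small vergence angles, which is why small angular errors blow up into large depth errors.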

To overcome this 3D eye tracking problem a neural network approach was used. This method was motivated by results of previous work on 3D calibration: specialized, individually calibrated neural networks can be used to reduce the error in 3D gaze-position measurement (Essig et al., 2006). The network used was a parametrized self-organizing map (PSOM), a rapid-learning variant of Kohonen's self-organizing map (Kohonen, 1998).

In order to calibrate the gaze in 3D with a PSOM, a virtual calibration cube was constructed (see figure 3.3). It consisted of three planes at Z = 10, Z = 25 and Z = 40 cm, each containing 3 × 3 points with a spacing of 15 cm, resulting in a total of 27 points spanning a volume of 30 × 30 × 30 cm.

Figure 3.3: Stereoscopic projection of the calibration cube. The darker lines are seen only by the right eye and vice versa. In the experiment green and red images were used with red-green anaglyph glasses.

Subjects were asked to fixate exactly on the highlighted point and then press a button. The button press triggered the acquisition of the coil voltages for one second, and the mean value and standard deviation were recorded. After that the next point was highlighted, until all points had been shown.

Showing the entire calibration cube at once would enhance the subject's perception of virtual depth and their ability to perform precise eye movements toward the designated target. However, only one plane at a time was shown, to avoid visual interference between the planes. The data obtained was used to train the PSOM; the layout of the targets is sketched below.
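A sketch of the 27 calibration targets in laboratory coordinates (illustrative Python; coordinates in cm with the origin at the screen center; the exact x-y placement of the grid is an assumption, but the 15 cm spacing and the three depth planes are as stated above):

```python
import numpy as np

def calibration_points():
    """3 x 3 x 3 calibration grid: planes at z = 10, 25, 40 cm,
    15 cm point spacing, spanning a 30 x 30 x 30 cm volume."""
    xs = ys = np.array([-15.0, 0.0, 15.0])
    zs = np.array([10.0, 25.0, 40.0])
    return np.array([[x, y, z] for z in zs for y in ys for x in xs])
```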

3.4.1 Mathematics of the PSOM

A self-organizing map (SOM) is an artificial neural network that produces a lower-dimensional discretized representation of its input space. SOMs differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. SOMs are also capable of learning highly non-linear functions (Walter and Ritter, 1996).

The ability to map high-dimensional spaces to lower-dimensional spaces would make SOMs useful for accurate mapping from coil voltages to 3D coordinates. However, SOMs only supply the position of the most stimulated neuron in a 'neuron lattice' (figure 3.4) instead of a continuous output. For applications where relatively high-dimensional spaces with a high resolution are needed, the size of the neuron lattice quickly becomes unmanageable. In this study every voxel of the presentation space would require a neuron; with a conservative resolution of 1 mm in all directions this would result in a staggering 30 million neurons. Furthermore, SOM learning requires thousands of training samples, which makes it impossible to collect sufficient calibration data in a reasonable amount of time.

Parametrized self-organizing maps do not have these disadvantages. They supply a continuous output and do not require large amounts of training samples. Instead, the PSOM is trained with selected input-output pairs as parameters. However, this super-fast learning comes at a cost: with only a few examples to learn from, one can only determine a rather small number of adaptable parameters. As a consequence the learning system must be very simple, or it must have a structure that is well matched to the task being learned. Fortunately, the type of mapping the PSOM has to perform for the calibration of the coils is the same for every trial and subject, and thus its structure only has to be matched once to the task at hand.

The estimation of 3D gaze positions with a PSOM consists of two steps. First the 3D gaze positions (x_{3d}, y_{3d}, z_{3d}) are mapped onto the coil voltages (x_l, y_l) and (x_r, y_r) through interpolation. Then the inverse of this mapping is computed to create the desired mapping from coil voltages to 3D gaze position.

A PSOM with five input neurons, 27 inner neurons and 3 output neurons was used. Each inner neuron was trained with one of the 27 calibration points k ∈ A, where

\[ A = \{ k_{xyz} \mid k_{xyz} = x\hat{e}_x + y\hat{e}_y + z\hat{e}_z;\; x, y, z \in \{0, 1, 2\} \}. \]

These points were arranged in a 3 × 3 × 3 grid with coordinates in each dimension from 0 to 2. All further PSOM calculations are expressed in this coordinate system; naturally, the results have to be scaled and translated to match the laboratory coordinate system.
Figure 3.4: 3D visualization of the neuron lattice. Each sphere represents a neuron. Note that the lattice has the same topography as the presented calibration points.

Each point k receives information about the gaze parameters. These parameters are stored in a training vector

\[ \tilde{w}_k = (x_{lk}, y_{lk}, x_{rk}, y_{rk}, x_{dk}) \]

where x_{lk}, y_{lk}, x_{rk} and y_{rk} are the coil voltages for azimuth and elevation of the left eye and the right eye, respectively, and x_{dk} is the divergence, defined as x_{dk} := x_{rk} − x_{lk}. This fifth dimension was added because the depth coordinate depends mainly on the divergence. The differences in divergence are smaller than those in the frontal plane, so the divergence is weighted with a specific factor to approximate the range of x and y. This leads to a faster determination of the gaze position as a function of the coil voltages; a sketch follows below.
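A sketch of how the 27 training vectors could be assembled (hypothetical Python; the thesis does not state the numerical value of the divergence weight, so w_d here is a placeholder):

```python
import numpy as np

def training_vectors(xl, yl, xr, yr, w_d=5.0):
    """Stack the calibration samples into 5D training vectors
    (xl, yl, xr, yr, w_d * (xr - xl)). Each input is an array of 27
    mean coil voltages; w_d scales the divergence to roughly match
    the range of the frontal components (w_d = 5.0 is an assumption)."""
    xl, yl, xr, yr = map(np.asarray, (xl, yl, xr, yr))
    return np.stack([xl, yl, xr, yr, w_d * (xr - xl)], axis=-1)
```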

The PSOM now knows the gaze parameters that correspond to the calibration points in 3D space. In the feed-forward step the PSOM interpolates between the training vectors \(\tilde{w}_k\) to estimate the gaze parameters for any position in or near the calibration cube.

The interpolation function can be constructed as a superposition of basis functions, one for each inner neuron, weighting the contribution of its training vector \(\tilde{w}_k\) depending on the location s relative to the location of the neuron k:

\[ f(s) = \sum_{k \in A} H(s, k)\, \tilde{w}_k \]

By specifying a training vector for each neuron location k ∈ A, a topological order between the training points is introduced. Training vectors assigned to neurons that are neighbors in A are thus treated as having a specific neighborhood relation. This allows the PSOM to draw extra curvature information from the training set.

It can be quite difficult or impossible to find a suitable set of basis functions H(s, k) if the data is not topologically ordered, i.e. one must be able to associate each data sample with a point in some discrete lattice in a topology-preserving fashion; otherwise the interpolation will lead to unacceptably large generalization errors (Walter and Ritter, 1996). The basis functions can be constructed in many ways but must always meet two criteria. First,

\[ H(s, k) = \delta_{s,k} \quad \forall\, s, k \in A. \]

Furthermore, the sum of all contribution weights should be one:

\[ \sum_{k \in A} H(s, k) = 1 \quad \forall\, s. \]

These criteria lead to

\[ f(s) = \tilde{w}_s \quad \forall\, s \in A, \]

which guarantees that the interpolation passes through the calibration points.

The basis functions can be constructed as a product of three 1D functions:

\[ H(s, k) = H^{1D}(s_x, k_x) \cdot H^{1D}(s_y, k_y) \cdot H^{1D}(s_z, k_z). \]

The components of k can only be 0, 1 or 2, so the 1D functions must have the property

\[ H^{1D}(q, n) = \delta_{q,n} \quad \forall\, q \in \{0, 1, 2\}, \]

where n ∈ {0, 1, 2}. Because n has only three possible values, only three basis functions are needed. The simplest functions with the required two roots are second-degree polynomials:

\[ H_0(q) = \tfrac{1}{2}q^2 - \tfrac{3}{2}q + 1 \]
\[ H_1(q) = -q^2 + 2q \]
\[ H_2(q) = \tfrac{1}{2}q^2 - \tfrac{1}{2}q \]

Figure 3.5: The three basis functions used for the interpolation function. Note that each basis function equals one at its corresponding neuron and zero at the other neurons.
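Putting the pieces together, a minimal sketch of the feed-forward map f(s) (hypothetical Python; W holds the 27 five-dimensional training vectors on the 3 × 3 × 3 lattice):

```python
import numpy as np

def h1d(q):
    """The three quadratic basis functions H0, H1, H2 evaluated at q.

    These are the Lagrange polynomials on the nodes {0, 1, 2}: each is
    one at its own node and zero at the other two."""
    return np.array([0.5 * q * q - 1.5 * q + 1.0,
                     -q * q + 2.0 * q,
                     0.5 * q * q - 0.5 * q])

def psom_forward(s, W):
    """Interpolated gaze parameters f(s) at lattice position s = (sx, sy, sz).

    W has shape (3, 3, 3, 5): the training vector w_k at each node k.
    H(s, k) is the product of the three 1D basis functions."""
    H = np.einsum('i,j,k->ijk', h1d(s[0]), h1d(s[1]), h1d(s[2]))
    return np.einsum('ijk,ijkd->d', H, W)
```

At every lattice node s = k the weights reduce to H(s, k) = δ_{s,k}, so the map reproduces the training vectors exactly, as required by the two criteria above.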

So far the PSOM maps the 3D gaze coordinates onto the coil voltages. However, to estimate the 3D gaze position from the coil voltages the opposite is needed. This is accomplished by calculating the inverse f^{-1} of f, which can be implemented as an iterative minimization of the error function

\[ E(s) = \tfrac{1}{2} \left( f(s) - f_{\mathrm{coils}} \right)^2, \]

the deviation of the measured coil voltages f_coils from the coil voltages f(s) calculated by the PSOM for the current 3D gaze position estimate s.

A very simple approach to finding the minimum of the error function is gradient descent:

\[ s(t+1) = s(t) - \epsilon\, \frac{\partial E(s)}{\partial s} \cdot \Omega \]

with ε > 0, where Ω is a weight vector that normalizes the different input dimensions. For t = 0 the current finger or stimulus position can be used; if no finger position is available, the center of the calibration cube is a good alternative.

While this method works well in many cases, the choice of the learning parameter ε is critical: too large a value makes the gradient descent diverge, and too small a value takes unnecessarily many iteration steps. In this study a much more robust and efficient method, the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963), was used. This algorithm can find the minimum of the error function approximately ten times faster than a conventional gradient descent.

The final value of s indicates the subject's 3D gaze position as a function of the current coil voltages.
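A sketch of the inversion step (hypothetical Python; scipy's method='lm' wraps MINPACK's Levenberg-Marquardt implementation, standing in for the algorithm used in the study, and `forward` is a feed-forward map such as the psom_forward sketch above):

```python
import numpy as np
from scipy.optimize import least_squares

def psom_inverse(f_coils, forward, s0):
    """Find the lattice position s whose predicted coil voltages f(s)
    best match the measured voltages f_coils, starting from s0
    (finger/stimulus position, or the cube center (1, 1, 1))."""
    res = least_squares(lambda s: forward(s) - f_coils, s0, method='lm')
    return res.x   # scale/translate back to laboratory coordinates afterwards
```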

3.5 Analysis
Prior to analyzing the delay between finger position and gaze position, the first and last eighth of the data points were discarded. The gaze position data was filtered with a Savitzky-Golay filter (Savitzky and Golay, 1964) of 3rd order with a frame size of 21 data points (175 ms). This filter effectively performs a local polynomial regression (3rd degree) on a distribution of equally spaced points (21 data points). Gaze data points during blinks were cut out and interpolated with a piecewise cubic Hermite spline; these are third-degree splines with each polynomial of the spline in Hermite form. Hermite polynomials are defined by

\[ H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}. \]

Missing finger data was also interpolated with the same piecewise cubic Hermite spline.
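A sketch of this preprocessing step (hypothetical Python; scipy's savgol_filter and PchipInterpolator stand in for the filters described, with the parameters stated above):

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.interpolate import PchipInterpolator

FS = 120.0  # Hz, common sampling rate of coil and Optotrak data

def clean_signal(t, x, blink_mask, smooth=True):
    """Bridge cut-out (blink/missing) samples with a piecewise cubic
    Hermite spline, then optionally smooth with a 3rd-order, 21-point
    Savitzky-Golay filter (21 samples at 120 Hz = 175 ms)."""
    good = ~blink_mask
    x = PchipInterpolator(t[good], x[good])(t)
    return savgol_filter(x, window_length=21, polyorder=3) if smooth else x
```

The same interpolation is applied to the finger data, but, per the analysis above, only the gaze signal is smoothed before the cross-covariance step.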
The delay between finger position and gaze position was determined by calculating the cross-covariance of the filtered gaze position and the unfiltered finger position:

\[ c_{xy}(m) = \begin{cases} \sum_{n=0}^{N-|m|-1} \left( x(n+m) - \frac{1}{N}\sum_{i=0}^{N-1} x_i \right) \left( y_n^* - \frac{1}{N}\sum_{i=0}^{N-1} y_i^* \right) & m \ge 0 \\ c_{yx}^*(-m) & m < 0 \end{cases} \]

where x and y are the 1D gaze and finger positions, respectively. Before the cross-covariance was calculated, the data was multiplied with a Hann window, \( w(n) = \tfrac{1}{2}\left(1 - \cos\frac{2\pi n}{N-1}\right) \), to avoid problems at the boundaries of the data. The result was normalized by setting the auto-covariance at zero lag to 1. The time τ at the maximum of the cross-covariance was taken as the time lag between finger and gaze position; a negative value of τ implies that the gaze leads the finger. Trials with a maximum of c_{xy}(m) below 0.9 for tracking (12 of 43 trials) or below 0.7 for tracing (18 of 70 trials) were not considered sufficiently correlated and were discarded. The lag between finger and gaze position in the depth direction was not determined for the frontal stimuli.
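A sketch of the lag estimation (hypothetical Python; np.correlate on mean-subtracted, Hann-windowed signals implements the cross-covariance above, normalized so that the zero-lag auto-covariance is one):

```python
import numpy as np

FS = 120.0  # Hz

def lead_time(gaze, finger):
    """Gaze-finger lag from the peak of the normalized cross-covariance.

    Returns (tau in ms, peak value); negative tau means gaze leads,
    matching the sign convention in the text."""
    n = len(gaze)
    w = np.hanning(n)                      # 0.5*(1 - cos(2*pi*n/(N-1)))
    x = (gaze - np.mean(gaze)) * w         # x: gaze, y: finger, as above
    y = (finger - np.mean(finger)) * w
    cxy = np.correlate(x, y, mode='full')  # lags -(n-1) .. n-1
    cxy /= np.sqrt(np.sum(x * x) * np.sum(y * y))
    m = np.argmax(cxy)
    tau = (m - (n - 1)) / FS * 1000.0
    return tau, cxy[m]

# Trials would then be discarded when the returned peak value is below
# 0.9 (tracking) or 0.7 (tracing).
```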

Chapter 4

Results
4.1 Calibration
Figure 4.1 shows the calibration results of a typical tracking trial for the new calibration method with parametrized self-organizing maps (left) and for the conventional method of fitting a third-order polynomial to a calibration trial and calculating depth by estimating the intersection point of the two visual axes (right). The blue, red and dotted black lines represent gaze position, stimulus and finger position, respectively.

Calibration by the conventional geometric regression method consists of first fitting a number of predefined points in a calibration trial with a third-order polynomial for each eye separately. The gaze position is defined by the intersection point of the two visual axes. However, lines in three dimensions generally do not intersect; to overcome this problem, gaze is estimated as the midpoint of the shortest line segment connecting the two visual axes. Because the vergence angles are very small at relatively great distances from the subject, accuracy in the depth direction is very poor at those distances. This is clearly seen in the z(x) plot (figure 4.1b, bottom-left panel): in this example the geometric regression method fails to accurately estimate depth beyond approximately 40 cm from the subject's eyes.

The calibration method involving parametrized self-organizing maps used in this study clearly has a great advantage in the depth direction over the geometric regression method. Figure 4.1a shows that the gaze position is much more accurate than with the geometric calibration method (figure 4.1b), especially in the depth direction. While the accuracy of the gaze position at greater depth is still worse than at shorter distances because of the small vergence angles, the effect is much smaller than in the geometric regression method. Gaze almost perfectly follows the stimulus in all directions.

Figure 4.1: Results of calibration using a parametrized self-organizing map (a, left) and a conventional geometric regression approach (b, right). The blue, red and dotted black lines represent gaze position, stimulus and finger position, respectively. While the geometric approach does a reasonably good job of estimating gaze position in the frontal plane, it fails to do the same in the depth direction. Note that points relatively close to the screen (far from the subject in the depth direction) are estimated as too close to the subject, resulting in a squashed gaze position estimate in the depth direction. The calibration done with the PSOM shows slightly better results in the frontal plane and is much more accurate than the geometric approach in the depth direction: the gaze position very closely follows the stimulus (and finger position) without loss of accuracy at greater distances from the subject.

4.2 Tracking
Figure 4.2a shows a three-dimensional plot of gaze and finger position for the oblique Limaçon shape in the tracking condition for a single subject. The blue line represents the gaze position and the black line the finger position. Gaze produces a smooth line during smooth-pursuit tracking and superimposes very well on the finger position, except for short deviations in depth during saccades, which are well known in the literature (Chaturvedi and Van Gisbergen, 2000).

Figures 4.3a-c show the finger and gaze position versus time for tracking in the x, y and z directions, respectively, for the same trial and subject. The gaze position is generally a smooth line with very few saccades and occasionally a small peak indicating a blink. Furthermore, the gaze position and the finger position almost perfectly superimpose in the x and y directions. In the depth direction (z direction, figure 4.3c) the noise is much larger than in the frontal plane (figures 4.3a and 4.3b) and the signal is not as smooth. The peaks correspond to small saccades in the frontal plane. Occasionally larger negative peaks (not visible here) are due to (unfiltered) blinks and can be as large as 40 cm in the z direction. Also, steps are visible, which occur when the two eyes make saccades in opposite directions.

Figure 4.2: Finger position and gaze position in three dimensions for the oblique Limaçon shape while tracking (a) and tracing (b).

4.3 Tracing
Figure 4.2b shows a three-dimensional plot of gaze and finger position for the oblique Limaçon shape in the tracing condition for a single subject. During tracing the eyes make saccades along the completely visible shape, and as a result the gaze position does not superimpose as well on the finger position as during the tracking condition. In the depth direction the variance is very large because of changes in vergence during saccadic eye movements (Chaturvedi and Van Gisbergen, 2000).

Figures 4.3d-f show finger and gaze position versus time for tracing in the x, y and z directions, respectively, for the same trial and subject. Unlike in the tracking condition, the gaze position is not a smooth line and does not superimpose very well on the finger position. Saccades are clearly seen as steps. Furthermore, it is clear that gaze position significantly leads finger position (see figures 4.3d-e). As in figure 4.3c, it can be seen in figure 4.3f that the variance in the depth direction is very large. The jumps in the depth direction are much smaller than in the frontal plane, and peaks can be seen which correspond to large saccades in the frontal plane.

Figure 4.3: Finger and gaze position versus time for tracking (panels a-c: x, y and z axes) and tracing (panels d-f: x, y and z axes). The time axis is plotted in bins of 1/120 s (8.3 ms) each.

4.4 Gaze-finger lead time


The lag between gaze position and finger position can be quantified by calculating the cross-covariance function between the gaze position and the finger position. The cross-covariance function has a maximum close to the origin, but slightly shifted; this shift represents the lag time.

Figure 4.4 shows the gaze-finger lag times in ms for all trials. Trials with an absolute lead time greater than 500 ms were ignored. Moreover, trials where the finger position was asymmetric by more than 200 ms (see figure 4.5) were also removed. In the tracking condition the lags of the finger with respect to the gaze position in the frontal plane (x-y plane) are not significantly different from 0. However, in depth (z-axis) the finger position leads the gaze position. This indicates that the target does not have to be foveated for the finger to accurately follow the stimulus. All ignored trials are shown in grey in figure 4.4.

Table 4.1 shows the mean values and standard deviations of the lag for the x, y and z directions. The mean values were calculated by averaging over all trials and all subjects.

For the tracing condition, table 4.1 shows that in the frontal plane the gaze position leads the finger position by about 270 ms. During tracing the eyes make saccades which fixate on future finger positions until the finger has reached that position, helping to lead the finger. After that, the eyes fixate on a point further along the path. In the tracking condition this is not possible, because the future target position is not known. In the depth direction the gaze also leads the finger; however, vergence is much slower, so the lead time of the gaze in depth is much smaller than in the frontal plane.
Figure 4.4: The gaze-finger lead times (y-axis) for all trials (x-axis). The left column is for tracking, the right column for tracing; the rows show the x, y and z directions. Trials shown in grey were ignored in the determination of the gaze-finger lead times. Note the much larger deviation in the z direction. A negative value means that gaze leads the finger. Trials where the stimuli were presented only in the frontal plane have a lead time of zero in the z direction, which explains the 'missing trials' in the bottom panels.

            X              Y              Z
Tracking    −18 ± 30 ms    −49 ± 57 ms    +83 ± 137 ms
            N = 31         N = 31         N = 10
Tracing     −274 ± 82 ms   −264 ± 92 ms   −104 ± 230 ms
            N = 52         N = 52         N = 35

Table 4.1: Mean lag between gaze and finger position, and the number of trials used to calculate it. Negative values indicate that the gaze leads. These values were obtained by averaging over all trials and all subjects.

Figure 4.5: Asymmetry in one of the trials (z position of the finger versus time). Note that the distance between peak A and peak B is 4.6 s, whereas the distance between peak B and peak C is 5.2 s, while both should be approximately 5 s. This asymmetry of 600 ms makes it impossible to calculate a reliable lead time from the cross-covariance, because the lead of peak B shifts the maximum of the cross-covariance function to the left and thus produces a too large gaze-finger lead time.

Chapter 5

Discussion and conclusion


To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, the binocular eye movements and finger movements of subjects were measured in a series of trials. In particular we investigated how the coordination between eye and finger movement differs between directions (in the frontal plane and in depth) during tracking and tracing conditions. This was accomplished by determining the lead time of gaze position relative to finger position in three dimensions. When tracking a moving target in three dimensions, finger position follows gaze position almost directly, with very little or no lag in the frontal plane. In depth the gaze position lags behind the finger position by about 80 ms. For tracing a completely visible path, gaze always leads finger position by a significant time: in the frontal plane gaze leads finger position by about 270 ms, and in depth by about 100 ms.

5.1 Calibration
Calibration of the scleral coils was accomplished by training a parametrized self-organizing map. This relatively new approach was chosen because it estimates the gaze position more robustly and accurately than a conventional method such as geometric regression calibration or a regular feed-forward neural network. Calibration with a PSOM showed a significant improvement in calibration performance, especially in the depth direction, where conventional methods have difficulty estimating gaze position accurately at relatively great distances from the eyes because the vergence angles become very small at those depths. The calibration with the PSOM proved to be much more robust and accurate in the depth direction, with much less deterioration of accuracy at relatively great depths. In this study a calibration trial of 27 points arranged in a cube was used. If more points are used, for example a cube of 4 × 4 × 4 points for 64 points in total, more information is given to the interpolation function used in the PSOM calculations, which should significantly increase the accuracy of the calibration.

5.2 Tracking
The results show that during the tracking condition the gaze-finger lead time in the frontal plane is very small: −18 ± 30 ms and −49 ± 57 ms for the horizontal and vertical directions, respectively. Given the large standard deviations, this effectively means that there is no significant lag between gaze and finger.

The time between generating a motor command in the motor cortex and the actual eye or hand movement is about 10 ms and 120 ms, respectively. For the gaze position and the finger position to move to the same position simultaneously, the motor commands cannot be executed at the same time. The results imply that motor commands to the hand precede the motor commands to the eye by about 110 ms.

For tracking in the depth direction the gaze lags behind(!) the finger position by about 80 ms. While the standard deviation is very large due to the relatively small number of successful tracking trials with a depth component, it is clear that finger position leads gaze position. It is well known that vergence is much slower than version, while finger movement is equally fast in all directions. Thus the results for the depth direction support the results in the frontal plane, which imply that motor commands to the hand precede motor commands to the eye by about 110 ms.

5.3 Tracing
During tracing of a completely visible path, the results show that gaze position leads finger position by about 270 ms in the frontal plane. This large lead time is due to saccades which jump ahead on the path to anticipate the hand movement. After a saccade the gaze position stays fixed at the same position until the hand nears the gaze position, after which the gaze again jumps ahead to anticipate the hand position.

In depth, gaze position leads hand position by only about 100 ms. This supports the fact that vergence is much slower than version. The difference in lead time between the frontal plane and depth is about 170 ms, comparable to tracking, for which this value is about 120 ms.

5.4 Conclusion
In this study we investigated finger and gaze movements in three-dimensional space. Binocular eye movements and finger movements of subjects were measured in a series of trials in which subjects were asked to track a moving target or trace a completely visible path in three dimensions. The results show that the motor commands from the motor cortex to the eye and to the finger are not issued simultaneously: the motor command to the hand precedes that to the eye by about 110 ms. This implies that the brain anticipates the hand movement. Thus it is not necessary to have the target foveated to plan accurate hand movements towards the target, which leaves time to compensate for the 120 ms delay between the motor command from the motor cortex to the arm and the actual arm movement.
Bibliography
V. Chaturvedi and J.A.M. Van Gisbergen. Stimulation in the rostral pole of monkey superior colliculus: effects on vergence eye movements. Experimental Brain Research, 132(1):72-78, 2000.

H. Collewijn, C.J. Erkelens, and R.M. Steinman. Voluntary binocular gaze-shifts in the plane of regard: dynamics of version and vergence. Vision Research, 35(23-24):3335-3358, 1995.

C.J. Erkelens. Fusional limits for a large random-dot stereogram. Vision Research, 28(2):345-353, 1988.

K. Essig, M. Pomplun, and H. Ritter. A neural network for 3D gaze recording with binocular eye trackers. International Journal of Parallel, Emergent and Distributed Systems, 21(2):79-95, 2006.

M.A. Frens, A.J. Van Opstal, and R.F. Van der Willigen. Spatial and temporal factors determine auditory-visual interactions in human saccadic eye movements. Perception & Psychophysics, 57(6):802-816, 1995.

C.C.A.M. Gielen, T.M.H. Dijkstra, I.J. Roozen, and J. Welten. Coordination of gaze and hand movements for tracking and tracing in 3D. Cortex, 45(3):340-355, 2009.

T. Kohonen. The self-organizing map. Neurocomputing, 21(1-3):1-6, 1998.

K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164-168, 1944.

D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431-441, 1963.

A. Savitzky and M.J.E. Golay. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8):1627-1639, 1964.

J. Walter and H. Ritter. Rapid learning with parametrized self-organizing maps. Neurocomputing, 12(2-3):131-153, 1996.
