
Gennady Andrienko · Natalia Andrienko

Peter Bak · Daniel Keim · Stefan Wrobel

Visual Analytics of Movement

Gennady Andrienko, Natalia Andrienko, Stefan Wrobel
Fraunhofer IAIS, Sankt Augustin, Germany
and
University of Bonn, Bonn, Germany

Peter Bak
IBM Research, Haifa, Israel

Daniel Keim
University of Konstanz, Constance, Germany

ISBN 978-3-642-37582-8 ISBN 978-3-642-37583-5  (eBook)


DOI 10.1007/978-3-642-37583-5
Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2013936969

© Springer-Verlag Berlin Heidelberg 2013


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts
in connection with reviews or scholarly analysis or material supplied specifically for the purpose of
being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright
Law of the Publisher’s location, in its current version, and permission for use must always be obtained
from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance
Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


To our families
Preface

In every sense of the word, movement is a central and fundamental aspect of life.
Indeed, being able to move still figures prominently in most definitions of ani-
mal life in biology. Movement has also been a central element in the evolution
of species and their conquering the planet and in the development of our society.
In those parts of human history that we know about, movement of individuals
and civilizations has helped shape our societies and culture. Arguably, however,
the most revolutionary changes brought about by movement to the lives we know
today have been happening in the past 150 years: with a particular kind of move-
ment that we call traffic. Beginning with the first railway connections, followed by
the invention of the automobile and the first airplanes, moving people and goods
from one place to another has become easier and faster not by one, but by several
orders of magnitude. Consequently, the capacity of an economy for movement,
often referred to as mobility, is now considered to be a crucial enabler for indus-
trial development and prosperity.
Not surprisingly, many of the important planning decisions in society and busi-
ness depend on proper knowledge about and a correct understanding of movement.
Should a region or a country invest billions of euros into a new airport or into
a new train station? Where can wind parks for power generation be placed with-
out affecting ship traffic and the movement of animals? How should urban quar-
ters or shopping malls be structured so that the needs of pedestrians can perfectly
be met? How should national parks be laid out to best protect animals and their
movements? Where should new stores and logistics centers be set up to best reach
customers and minimize cost? How do diseases spread, which may put lives in
danger? How can a stadium or large building be evacuated quickly and safely?
The above are just a few examples of the scientific, societal, and commer-
cial questions that relate to movement, and none of them can be answered with-
out deep and well-structured knowledge of the movement patterns of people and
objects. Yet, surprisingly, until very recently, many of those decisions were made
based on common sense knowledge only, often relying on general rules of thumb
and prior beliefs about how people and objects would be moving. Empirical stud-
ies were extremely expensive, and the tools for their analysis complex, resulting in
limited availability of precise knowledge and as a result, in many costly decisions
that did not bring the intended benefit or even worsened an existing situation.

Fortunately, recent technological advances have completely changed the game
of knowing about location and finding out about movement. The technical foundations
were created quite a while ago, with the Global Positioning System (GPS), cellular
mobile telephony, and radio-frequency identification (RFID) tags. With GPS,
the receiver can localize itself and record its position with high precision based
on the system's satellite signals. A cell-based mobile phone can be localized in its
individual cell or even more precisely by measuring its radio signals. RFID tags are
so cheap that they can be placed on objects, which can then be recognized when
passing stationary receivers. In addition, wireless networks today can be used to
localize objects very precisely, and small sensors are available for special purposes
such as tracking animals.
The true revolution, however, has been the extremely widespread adoption
of such technology brought about by mobile phones. With several
billion mobile phones in use, most people today possess the technology
necessary to localize themselves and record their movements if they so wish.
RFID and other technologies are so widely deployed now that unsurprisingly,
location and mobility data are considered to be the fastest-growing type of data
today. According to a recent study by the McKinsey Global Institute (Manyika
et al. 2011), the amount of mobility data available in 2009 was estimated to be one
petabyte, and it is safe to assume that today, we are rather looking at exabytes or
maybe zettabytes of mobility data.
While this deluge of data promises to contain the needed information to arrive
at empirically well-founded models and decisions about mobility and movement,
current practice shows that working with the available data often does not lead to
insight, but rather to confusion and frustration. Since we are usually not
interested in the historical whereabouts of a single individual or object when
talking about movement and mobility, the large data volumes by themselves only make
us fail to see the forest for the trees. Moreover, simple classical means of visually
inspecting movement data fail catastrophically when used with extremely large
data volumes without further changes, since even the highest resolution displays
cannot show millions or even billions of movements at the same time without
completely cluttering the display. Algorithmic approaches to processing move-
ment data are thus sorely needed to reduce the data volume by aggregation and
selection, and to bring out the important properties. At the same time, due to the
very nature of such algorithmic methods, they can be used to ensure the privacy
of individual movement, since detail about individual movement is not needed in
the condensed model and thus stripped away. Such algorithmic methods, however,
have turned out to be difficult to control for analysts if used in isolation.
What is needed, therefore, are new methods of visualization and new methods
of algorithmic data analysis that are combined in such a way that they tightly inte-
grate and complement each other to allow end-users and analysts alike to work
with extremely large volumes of movement data in as simple a way as they would
have with simpler models of the past. And this is exactly where this book comes
in. The book is concerned with the science, technology, and the software of doing
visual analytics for movement data, i.e., using visual and algorithmic approaches
in an integrated and interactive fashion. The science of visual analytics has been
developing rapidly over the past years, and the paradigm of tightly intertwining
visualization and algorithmic analysis has proven a breakthrough for many data
analysis tasks. As this book shows, visual analytics techniques today are ready to
even tackle the enormous challenges brought about by movement data, and there
is technology and software available for use right at this moment.
This book is about the exciting possibilities created by visual analytics for anyone
interested in understanding movement, analyzing movement, or simply making
decisions that are influenced by the way people, animals, and objects move. We
start out with an introduction that illustrates the different kinds of data that are
available to describe movement, from single trajectories of single objects to mul-
tiple trajectories of many objects, and then proceed to a conceptual framework,
which provides the basis for a fundamental understanding of movement data. The
book then moves on to more practical and technical aspects, focusing on how
exactly to transform movement data to make it more useful, and on the infrastruc-
ture necessary for performing visual analytics in practice. We then illustrate that
visual analytics of movement data can bring exciting insights into the behavior of
moving persons and objects, but can also lead to an understanding of the events
that happen when things move. Indeed, visual analytics techniques can be used
to even turn around the analytical questions in order to derive characteristics of
the underlying space or characteristics of time from movement data. Throughout
the book, we use application examples from various domains to show what can be
done in practice, and always illustrate the examples with graphical depictions of
the interactive displays and the analysis results.
In summary, we hope that the book will make useful and entertaining reading
for anyone interested in movement and the possibilities of visual analytics in
this field. Researchers will find the necessary scientific precision, software tech-
nologists will find the necessary information on algorithms and systems, and
practitioners will find readily accessible examples with detailed illustrations for
practical purposes. Enjoy!

Reference

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., Hung, A. (2011). Big
data: The next frontier for innovation, competition, and productivity. McKinsey Global
Institute.
Acknowledgments

The material of this book results from collaborative research over a long period.
The work has been mostly done within a series of research projects financially
supported by the European Commission, DFG (German Research
Foundation) and BMBF (German Federal Ministry of Education and Research):
• European FET-Open projects GeoPKDD (Geographic Privacy-aware Knowledge
Discovery and Delivery, 2005–2009, http://www.geopkdd.eu/), DATA SIM (DATA
science for SIMulating the era of electric vehicles, 2011–2014,
http://www.datasim-fp7.eu/), and LIFT (Using Local Inference in Massively
Distributed Systems, 2010–2013, http://www.lift-eu.org/);
• DFG Priority Research Program on Visual Analytics (SPP 1335, 2008–2014,
http://www.visualanalytics.de/);
• BMBF project VASA (Visual Analytics in Security Applications, 2011–2014,
http://www.va-sa.net);
• European coordination actions VisMaster (Visual Analytics–Mastering the
Information Age, 2008–2011, http://www.vismaster.eu/) and MODAP (Mobility,
Data Mining and Privacy, 2010–2013, http://www.modap.org/), and COST
Action MOVE (Knowledge Discovery from Moving Objects, 2010–2014,
http://www.move-cost.info).
The project GeoPKDD deserves special acknowledgment as an initiator of our
intensive systematic research focused on the phenomenon of movement. Together
with the partners, we developed an understanding of what movement is and how
it can be analyzed. The project provided a platform for fruitful collaboration and
synergies between visual analytics, database science, and data mining. The col-
laboration with our GeoPKDD partners is continuing, bringing new interesting
results.
The book would also have been impossible without the support of the German DFG
Priority Research Program on Visual Analytics. In particular, it allowed the book
authors to work together within the project ViAMoD (Visual Spatiotemporal
Pattern Analysis of Movement and Event Data). The idea of the book was born,
developed, and realized during this project. The Priority Research Program also
stimulated cooperation between projects, which resulted in getting new ideas and
developing new methods together with participants of other projects.

We would also like to especially acknowledge the important role of the
European COST Action MOVE, which allowed us not only to establish fruitful
contacts and collaborations with many European researchers working on analysis
of movement but also to meet experts from several application domains (maritime
traffic, animal ecology, human geography, city planning), learn about real-world
tasks and problems related to movement analysis, and get access to interesting and
challenging real-world datasets.
We are grateful to all our project partners for collaboration, exchange of ideas,
critical discussions, and for their contributions, which extended our knowledge
and research scope, stimulated conceptual thinking, and inspired ideas of new
methods. We thank D. Pedreschi, F. Giannotti, S. Rinzivillo, A. Monreale, M.
Nanni and C. Renso (CNR Pisa and University of Pisa, Italy), Y. Theodoridis,
N. Pelekis and I. Kopanakis (University of Piraeus, Greece), A. Raffaetà,
L. Leonardi, and C. Silvestri (University of Venice, Italy), S. Spaccapietra,
C. Parent and J. Macedo (EPFL, Switzerland), M. Wachowicz, D. Orellana and A.
Ligtenberg (WUR, The Netherlands), Y. Saygin (Sabancı University, Turkey), J.
Koehlhammer (Fraunhofer IGD, Germany), M.-L. Damiani (University of Milan,
Italy), B. Kuipers, V. Bogorny, D. Yanssens and L. Knapen (Hasselt University,
Belgium), S. van der Spek (TU Delft, The Netherlands), R. Weibel, P. Laube and
R. Purves (University of Zürich, Switzerland).
We have also collaborated with many people beyond the funded research pro-
jects. The collaboration was very fruitful for generating, shaping, and developing
new ideas and in many cases involved joint development of new analytical meth-
ods or procedures, finding synergies between different approaches, or trying previ-
ously developed methodologies on new challenging data and analysis tasks. We
thank our informal collaborators H. Schumann and C. Tominski (University of
Rostock, Germany), T. von Landesberger, T. Schreck, and S. Bremm (University
of Darmstadt, Germany), W. Kuhn (University of Münster, Germany), N. Willems,
R. Scheepens, and J. van Wijk (TU Eindhoven, The Netherlands), D. Weiskopf,
M. Bursch, D. Thom, and T. Ertl (University of Stuttgart, Germany), P. Jankowski
(University of California in San Diego, USA), C. Hurter (ENAC Toulouse,
France), R. Güting and M. Sakr (University of Hagen, Germany), Z. Smoreda,
T. Couronne, C. Ziemlicki, and A.-M. Olteanu (Orange Labs R&D, France),
P. Henzi, L. Barrett and M. Dostie (University of Lethbridge, Canada), R. Ahas
(University of Tartu, Estonia), M. Heurich (Bavarian Forest National Park,
Germany), K. Ooms (Ghent University, Belgium).
Discussions and joint works with our institute and university colleagues helped
us a lot in our research. The authors are especially thankful to M. May, G. Fuchs,
K. Vrotsou, T. Liebig, H. Stange, C. Kopp, H. Voss, U. Bartling, A. Oçakli, S.
Scheider, K.-H. Sylla, V. Hernandez-Ernst, D. Hecker, M. Mock, M. Mladenov,
C. Pölitz, C. Navarra, I. Peca, H. Zhi (Fraunhofer IAIS and University of Bonn,
Germany), S. Kisilevich, F. Mansmann, D. Spretke, H. Janetzko (University
of Konstanz, Germany). A special “thank you” to A. Yaeli and H. Ship (IBM
Research Lab Haifa) for their inspiration, mentorship, and challenging questions
and comments in fruitful discussions.
Commissions on GeoVisualization, Geospatial Analysis and Modelling,
Cognitive Visualization, and Use and User Issues of the International
Cartographic Association had a strong influence on the development of our ideas.
Among all the members, we are especially grateful to M.-J. Kraak, C. Blok,
C. van Elzakker and U. Turdukulov (ITC, The Netherlands), J. Dykes, J. Wood,
A. Slingsby (City University, UK), S. Fabrikant and A. Çöltekin (University of
Zürich, Switzerland), J. Schiewe (HCU, Germany), D. Dransch and M. Sips (GFZ,
Germany), U. Demšar (University of Maynooth, Ireland), M. Jern (Linköping
University, Sweden), B. Jiang (University of Gävle, Sweden), G. Gartner (TU
Vienna, Austria).
Our work was greatly inspired by the world research leaders in visual analytics,
geographic analysis, exploratory cartography, and interactive visualization:
J. Thomas (PNNL, USA), W. Tobler (UCSB, USA), A. MacEachren (Penn State
University, USA), and B. Shneiderman (University of Maryland, USA).
We cordially thank the co-authors of the methods described in the
book for their active and creative work on inventing, developing, implement-
ing, and further developing the visualization and analysis methods. Salvatore
Rinzivillo (CNR, Pisa, Italy) was the main force in developing the methods for
clustering of trajectories and spatial events. The work of Christian Tominski
(University of Rostock, Germany) stands behind the Trajectory Wall. We owe
the Dynamic Categorical Data View to Tatiana von Landesberger and Sebastian
Bremm (TU Darmstadt, Germany). David Spretke (University of Konstanz,
Germany) was the primary author of the Droplet Maps. The authors wish to thank
Eli Packer and Harold Ship (IBM Research Lab Haifa) for their work on the
stop analysis and flower visualization; their valuable insights and knowledge are
highly appreciated. The authors would also like to acknowledge the work of Sivan
Harary, Mattias Marder, Harold Ship, and Avi Yaeli (IBM Research Lab Haifa) on
the development of the encounter detection algorithm.
We are very indebted to the anonymous reviewers of our papers. Their con-
structive critiques and suggestive comments greatly helped us to refine and elabo-
rate our ideas and methods and to find good ways to present these to the audience.
We also personally thank T. Slocum (University of Kansas, USA) and G. Fuchs
(Fraunhofer IAIS and University of Bonn, Germany) for commenting on selected
parts of the book.
Most of the illustrations in the book have been produced using the research
software V-Analytics (developed at Fraunhofer IAIS, http://geoanalytics.net/V-Analytics)
and VisMap (developed at the University of Konstanz and IBM Research Lab
Haifa). We encourage readers to try the software and experiment
with publicly available data sets or their own.
Contents

1 Introduction
1.1 A Single Trajectory
1.2 Multiple Trajectories of a Single Object
1.3 Simultaneous Movements of Many Objects
1.4 What Should Have Been Achieved by These Examples
1.5 Visual Analytics
1.6 Structure of the Book
References

2 Conceptual Framework
2.1 Foundations
2.2 Fundamental Sets: Space, Time, and Objects
2.2.1 Space
2.2.2 Time
2.2.3 Objects
2.3 Characteristics of Objects, Locations, and Times
2.4 Basic Types of Spatio-temporal Data
2.5 Event-Based View of Movement
2.6 Multi-Perspective View of Movement
2.7 Spatio-temporal Context
2.8 Relations
2.8.1 Relations of Objects
2.8.2 Relations of Locations and Times
2.9 Movement Data and Context Data
2.9.1 Forms and Sources of Movement Data
2.9.2 Properties of Movement Data
2.9.3 Context Data
2.10 Example Data Sets Used in the Book
2.10.1 Personal Driving
2.10.2 Cars in Milan
2.10.3 Vessels in the North Sea
2.10.4 Public Transport in Helsinki
2.10.5 A Group Walk of Workshop Participants
2.10.6 Trajectories of Flickr and Twitter Users
2.10.7 VAST Challenge 2011
2.10.8 Tracks of Wild Animals in a National Park
2.10.9 Movements of Laboratory Mice
2.10.10 Movements of Visitors of Car Races
2.11 Types of Movement Behaviours
2.12 Types of Movement Analysis Tasks
2.13 Recap
References

3 Transformations of Movement Data
3.1 Interpolation and Re-sampling
3.2 Division of Movement Tracks and Trajectories
3.3 Transformations of Temporal and Spatial References
3.4 Derivation of New Thematic Attributes
3.5 Extraction of Spatial Events
3.5.1 Extraction of Movement Events from Trajectories
3.5.2 Detection of Stop Events
3.5.3 Extraction of Spatial Events from Other Data Types
3.6 Spatial and Temporal Generalization
3.7 Trajectory Abstraction (Simplification)
3.8 Spatio-Temporal Aggregation
3.9 Transformations Between Data Types
3.10 Recap
References

4 Visual Analytics Infrastructure
4.1 Interactive Visualizations
4.2 Interactive Filtering
4.2.1 Spatial, Temporal, and Attribute Filtering
4.2.2 Filtering of Object Classes and Individual Objects
4.2.3 Filtering of Trajectory Segments
4.2.4 Filtering of Related Object Sets
4.3 Dynamic Aggregation
4.4 Recap
References

5 Visual Analytics Focusing on Movers
5.1 Characteristics
5.1.1 Spatial Summarization of Trajectories
5.1.2 Clustering of Trajectories
5.1.3 Visualization of Positional Attributes
5.1.4 Analysis of Multiple Positional Attributes
5.2 Relations
5.2.1 Encounters Between Moving Objects
5.2.2 Relations in a Group of Movers
5.2.3 Relations of Movers to the Environment
5.3 Recap
References

6 Visual Analytics Focusing on Spatial Events
6.1 Extraction of Composite Spatial Events by Clustering
6.1.1 A Distance Function for Spatial Events
6.1.2 Selection of Thresholds
6.1.3 Scalable Clustering of Events
6.1.4 An Example of Scalable Clustering of Spatial Events
6.2 Characteristics
6.2.1 Growth Ring Maps
6.2.2 Flower Diagrams
6.2.3 Textual Characteristics of Composite Events
6.3 Relations
6.3.1 Spatio-Temporal Relations Between Events
6.3.2 Relations Between Events, Trajectories, and Context
6.4 Recap
References

7 Visual Analytics Focusing on Space
7.1 Obtaining Places of Interest from Movement Data
7.1.1 Space Tessellation
7.1.2 Grouping of Close Locations
7.1.3 Event-Based Place Extraction
7.1.4 Extraction of Personal Places
7.2 Characteristics
7.2.1 Visualization of Time Series
7.2.2 Transformations of Time Series
7.2.3 Clustering of Time Series
7.2.4 Time Series Modelling
7.2.5 Event Extraction from Time Series
7.2.6 Interpretation of Personal Places
7.3 Relations
7.3.1 Analysis of Binary Links Between Places
7.3.2 Relations Between Link Attributes
7.3.3 Relations Between Several Places
7.3.4 Discovery of Frequent Sequences
7.4 Recap
References

8 Visual Analytics Focusing on Time
8.1 Characteristics
8.1.1 Clustering of Times by Similarity of Spatial Situations
8.1.2 Event Extraction from Spatial Situations
8.2 Relations
8.3 Recap
References

9 Discussion and Outlook
9.1 Multi-Perspective View of Movement and Task Typology
9.2 Properties of Movement Data
9.2.1 Temporal Properties
9.2.2 Spatial Properties
9.2.3 Mover Set and Mover Identity Properties
9.2.4 Data Collection Properties
9.3 General Procedures of Movement Analysis
9.4 Movement in Context
9.4.1 Visual Tools for Observation of Relations
9.4.2 Computational Enhancement to Observation of Relations
9.4.3 Extraction of Relation Occurrences
9.4.4 Support of Analytical Reasoning
9.5 Movement Behaviours
9.6 Personal Privacy
9.7 Future Perspectives
9.8 Suggested Exercises
9.9 Conclusion
References

Glossary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Chapter 1
Introduction

Abstract This chapter provides an informal introduction to the main concepts
related to the analysis of movement. The concepts are introduced through illustrated
examples, which also demonstrate some techniques that may be used for visual
exploration and analysis of movement data. The examples show how the capabilities of
the computer and the human can be combined to extract knowledge from movement
data. This sets the stage for introducing the concept of visual analytics. The
chapter also explains the objectives and the structure of the book.

Let us begin by considering a simple example of movement data obtained from a person
who installed a GPS device in her car to record the geographical positions of the
car as it moves. Figure 1.1 demonstrates what the position records made by the
device look like. The main components of the records are the geographical coordi-
nates (X denotes the longitudes and Y the latitudes) and the times when the posi-
tions were measured; we shall call them “timestamps”. This is the most typical
structure of position records. In a general case, the coordinates are not necessarily
geographical and the timestamps do not necessarily consist of Gregorian calendar
dates and times of the day. Generally, movement data include positions of some
moving objects in a certain space: the geographical space, the entire space of the
universe, the internal space of a building, the space of a football field, etc. The
positions may be represented by coordinates in a suitable spatial reference sys-
tem. The timestamps of the position records may be expressed in any system of
temporal measurement. These may be, in particular, relative times, such as counts
of seconds that passed from the beginning of the observation. The precision of
timestamps may also vary: nanoseconds, hours, days, years, centuries, etc. The
sequence of position records representing the movement of one object is called the
movement track.
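The structure of a position record described above can be sketched in code. The following Python fragment is only an illustration: the class name `PositionRecord`, its field names, the helper `to_relative_seconds`, and the coordinate values are assumptions made for this sketch and are not taken from the dataset.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PositionRecord:
    """One measured position of a moving object (hypothetical structure)."""
    x: float      # longitude, or any other first coordinate
    y: float      # latitude, or any other second coordinate
    t: datetime   # timestamp of the measurement


def to_relative_seconds(track):
    """Express the timestamps as seconds elapsed since the first record."""
    start = track[0].t
    return [(rec.t - start).total_seconds() for rec in track]


# A made-up movement track of three records, ordered by time.
track = [
    PositionRecord(7.10, 50.70, datetime(2007, 4, 24, 9, 47, 55)),
    PositionRecord(7.11, 50.70, datetime(2007, 4, 24, 9, 48, 5)),
    PositionRecord(7.12, 50.71, datetime(2007, 4, 24, 9, 48, 20)),
]
relative = to_relative_seconds(track)
```

The list `relative` holds the same timestamps expressed as relative times, that is, as counts of seconds from the beginning of the observation.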
Movement data represent paths of moving objects through space over time.
These paths are usually continuous, that is, a moving object occupies a certain
spatial position at any time moment. However, for technical reasons, movement

G. Andrienko et al., Visual Analytics of Movement,
DOI: 10.1007/978-3-642-37583-5_1, © Springer-Verlag Berlin Heidelberg 2013

Fig. 1.1  An example of
movement data: GPS-
measured positions of a car

data are discrete. They reflect the spatial positions only at some time moments.
There is an inevitable uncertainty concerning the positions of the moving objects
in the times between the timestamps. The longer the intervals between the times-
tamps are, the higher the uncertainty.
The path of a moving object made during the whole time of its existence or
movement observation is usually divided into meaningful parts, called trajectories.
For example, a trajectory may represent a single trip of a person or person’s move-
ment during one day. A trajectory of a migratory animal may represent its move-
ment during one migration season. These examples demonstrate that movement of
an object may be divided into trajectories in many different ways. The choice of a
suitable division depends on the nature of the movement and goals of the analysis.

1.1 A Single Trajectory

Let us return to the GPS track of the personal car. The car owner used the posi-
tioning device for almost one year, although not every day, and collected 112,890
position records. We shall take one of the days of the observation period and
consider the records made during this day. Thus, on 24 April 2007, the device
recorded 458 positions; a few of them are shown in Fig. 1.1. Can we extract mean-
ingful information from these records?
Of course, not much can be gained by just looking at the recorded numbers.
However, since these are geographical positions, they can be represented on

Fig. 1.2  The trajectory of the car represented by a line on a map. Here and throughout the book,
we use a cartographical background from OpenStreetMap (http://www.openstreetmap.org/)

a map, which will provide us with the spatial context and aid in interpreting the
data. Figure 1.2 demonstrates a map in which the trajectory of the car on 24 April
2007 is represented by a line. The first position of the trajectory is marked by a
small hollow square and the last position by a filled square; both squares are in
the north, close to each other. Note that the line is a continuous representation of
the discrete data. It has been constructed by connecting consecutive positions by
straight line segments. Since the time intervals between the position records are
quite short, this does not introduce much error. We can see that the reconstructed
car trajectory fits the streets shown on the map quite well.
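Connecting consecutive positions by straight line segments amounts to linear interpolation between the recorded positions. A minimal sketch of this idea is given below; the function name `interpolate_position` and the sample values are assumptions of the sketch, and the times are assumed to be strictly increasing.

```python
from bisect import bisect_right


def interpolate_position(times, xs, ys, t):
    """Estimate the position at time t by linear interpolation between
    the two recorded positions that enclose t (times strictly increasing)."""
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the observed time span")
    i = bisect_right(times, t) - 1      # index of the last record with time <= t
    if i == len(times) - 1:             # t coincides with the last record
        return xs[i], ys[i]
    frac = (t - times[i]) / (times[i + 1] - times[i])
    return (xs[i] + frac * (xs[i + 1] - xs[i]),
            ys[i] + frac * (ys[i + 1] - ys[i]))


# Made-up records: times in seconds, positions in metres.
times = [0.0, 10.0, 30.0]
xs = [0.0, 100.0, 100.0]
ys = [0.0, 0.0, 200.0]
midpoint = interpolate_position(times, xs, ys, 20.0)
```

The longer the interval between two records, the less reliable such an estimate becomes, since the object may have deviated from the straight line in between.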
The map shows us that the trajectory corresponds to a round trip. The start and
end positions, marked by the hollow and filled squares, respectively, are approximately
in the same place in the north of the area. We can identify the geographical
location of the trip and the streets that were used. However, we cannot learn much
more about this trip using only the map. The main problem is that the map does
not show us the temporal component of the data. As a result, we cannot determine

whether the car moved clockwise or counter-clockwise and whether it stopped on
the way. Hence, we need a display that represents not only the space but also the
time, such as the space–time cube in Fig. 1.3. This is a perspective view of a three-
dimensional representation where the two horizontal dimensions represent the
space and the vertical dimension, the time. The temporal axis is oriented from the
bottom to the top; the represented time interval is from 09:47:55 till 19:42:29 on
24 April 2007, as can be seen at the bottom of the cube.
The space–time cube gives us additional information. We see that the car
moved clockwise and that there was a long stop, which is indicated by a long ver-
tical line segment. Generally, a vertical line segment in a space–time cube means
that the spatial position remained the same during a time interval.
To explore parts of the trajectory in more detail, we can apply temporal focus-
ing, which limits the time interval of the data that are visible in the displays. Thus,
Fig.  1.4 contains two screenshots of the space–time cube display (top) and two
screenshots of the map display (bottom). The screenshots on the left show the part
of the trajectory made during the first 12 min of the trip. The screenshots on the
right represent the last 50 min of the trip. The cube has been rotated so that it is
viewed from the east, that is, the south is on the left and the north on the right. On
the left image of the space–time cube, we can detect short stops (vertical line seg-
ments), which, probably, occurred at street crossings. On the right image, we see a
quite long stop.
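In terms of data processing, temporal focusing is a filter that keeps only the records falling within a chosen time interval. The sketch below illustrates this under simple assumptions: records are tuples whose first element is the timestamp, and the function name `temporal_focus` and the sample records are invented for the illustration.

```python
from datetime import datetime, timedelta


def temporal_focus(records, start, end):
    """Keep only the records whose timestamps fall within [start, end]."""
    return [rec for rec in records if start <= rec[0] <= end]


# Made-up records (timestamp, x, y), one every 5 minutes.
t0 = datetime(2007, 4, 24, 9, 47, 55)
records = [(t0 + timedelta(minutes=5 * k), float(k), 0.0) for k in range(6)]

# Focus on the first 12 minutes of the trip.
first_part = temporal_focus(records, t0, t0 + timedelta(minutes=12))
```

Only the filtered records would then be passed to the map and space–time cube displays.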

Fig. 1.3  The trajectory of the car is represented by a line in a space–time cube



Fig. 1.4  Temporal focusing in the space–time cube display (top) and map display (bottom).
Left part: the first 12 min of the trip. Right part: the last 50 min of the trip

Although the perspective view of the space–time cube can be interactively
zoomed, rotated, and moved, it is not very convenient for locating trajectory segments
in space and in time. Thus, it is not easy to see at what times and in what places the
stops occurred and how long they lasted. There is no ideal display showing space and
time together. Therefore, it may be reasonable to use two displays: the map, which is
good at conveying the spatial information, and some other display that would be good
at conveying the temporal and time-dependent information, such as the time graph
in Fig. 1.5. The horizontal axis of the display represents the temporal range of the
data. The vertical axis can represent the values of any time-dependent numeric attrib-
ute. This may be an attribute computed from the trajectory, such as movement speed,
direction, travelled distance, or distance to a particular location. The label of the vertical
axis (i.e. the attribute name) is shown in the top left corner of the display.
The time graph in Fig. 1.5 represents the cumulative path length (travelled distance)
from the starting point of the trajectory. The stops appear in this display as horizontal
lines. The labels along the time axis allow approximate temporal positioning. Thus,
for the longest stop, we can see that the car stopped at about 10 in the morning and
resumed the movement at about 19 o’clock. When we move the mouse cursor over the
graph area, the time corresponding to the mouse position is shown above the graph.
Hence, by mouse-pointing on the left and right ends of the horizontal line, we can

Fig. 1.5  A time graph representing the cumulative path travelled by the car

Fig. 1.6  A time graph representing the variation in the instant speed of the car during the first
12 min of the trip

ascertain the start and end times of the stop more precisely: 09:58 and 18:52. The stop
visible in the time-focused space–time cube in Fig. 1.4 (top right) appears as a hori-
zontal line segment at the right end of the graph. We can find out that the stop occurred
from 18:59 to 19:36. Besides, we can learn that the total length of the trajectory is
13.84 km and the longest stop occurred at 6.44 km from the beginning of the trip.
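The cumulative path length shown in such a time graph can be derived from the position records by summing the distances between consecutive positions. The sketch below assumes geographical coordinates in degrees and uses the haversine formula with a mean Earth radius; the function names and the sample points are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cumulative_path_length(points):
    """Travelled distance from the start for each (lon, lat) point."""
    if not points:
        return []
    total = 0.0
    result = [0.0]
    for (lon1, lat1), (lon2, lat2) in zip(points, points[1:]):
        total += haversine_km(lon1, lat1, lon2, lat2)
        result.append(total)
    return result


# Made-up positions; the repeated point imitates a stop.
points = [(7.00, 50.00), (7.00, 50.01), (7.00, 50.01), (7.00, 50.02)]
path = cumulative_path_length(points)
```

While the object stands still, the cumulative value does not change, which is why stops appear as horizontal lines in the graph.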
Figure 1.6 contains the time graph representing the variation in the instant speed
of the car over time. We can apply temporal focusing to explore parts of the trajec-
tory in more detail. In Fig. 1.6, we have focused on the first 12 min of the trip. We
see that there were two short time intervals of constant very low speed manifested

by horizontal line segments at the bottom of the graph. They correspond to the two
vertical line segments in the time-focused space–time cube in Fig. 1.4 (top left),
which mean that the car was not really moving during these time intervals. One
could expect that the corresponding speed is zero; however, the graph shows positive
(although very low) speed values. The reason is that position measurements are never
absolutely accurate, and neither is the instant speed computed from them. Hence, in
analysing movement data, one should not expect that stops will always be manifested
by zero speed values but rather should take a reasonable threshold, that is, a minimum
value such that all speed values below it are considered as absence of movement.
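The effect of measurement errors on the computed speed, and the use of a threshold, can be illustrated by a small sketch. It assumes planar coordinates in metres and times in seconds; the threshold of 2 km/h, the function names, and the sample values are arbitrary choices made for this illustration only.

```python
import math


def instant_speeds_kmh(times_s, xs_m, ys_m):
    """Speed between consecutive records (positions in metres, times in seconds)."""
    speeds = []
    for i in range(1, len(times_s)):
        dist = math.hypot(xs_m[i] - xs_m[i - 1], ys_m[i] - ys_m[i - 1])
        dt = times_s[i] - times_s[i - 1]
        speeds.append(dist / dt * 3.6)   # m/s -> km/h
    return speeds


def is_stopped(speeds, threshold_kmh=2.0):
    """Treat every speed below the (assumed) threshold as absence of movement."""
    return [v < threshold_kmh for v in speeds]


# Made-up track: the car stands still between 10 s and 30 s, but the
# measured positions jitter by a fraction of a metre.
times_s = [0, 10, 20, 30, 40]
xs_m = [0.0, 150.0, 150.4, 149.9, 300.0]
ys_m = [0.0, 0.0, 0.3, 0.1, 0.0]
speeds = instant_speeds_kmh(times_s, xs_m, ys_m)
stopped = is_stopped(speeds)
```

The two middle speed values are positive but far below the threshold, so the corresponding intervals are classified as a stop.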
The time graph alone is insufficient for exploring spatio-temporal data since it
does not represent the spatial aspect. Thus, we can easily find out when the stops
occurred in time, but we do not know where they occurred in space. We need a
link between the time graph and the map, like the one demonstrated in Fig. 1.7.

Fig. 1.7  Investigating the speed variation in time and space using dynamically linked map and
time graph

When we put the mouse cursor on some point of the line in the time graph, the
corresponding spatial position is marked on the map by a cross formed by the
intersection of two lines, horizontal and vertical. The screenshots on the top and
in the middle of Fig. 1.7 show how we determine the spatial positions of the two
short stops at the beginning of the trip (only the tip of the mouse cursor is visible
at the bottom of each image). On the bottom left, we have put the mouse cursor on
the point in the time graph corresponding to the highest speed value attained during
the first 12 min of the trip (the text above the time graph says that the value was
86.41 km/h and that it occurred at 09:55:22). On the bottom right, the cross cursor
shows us the respective spatial position. The link between the two displays works
also in the opposite direction: when we put the mouse cursor on some point of the
trajectory represented on the map, the corresponding temporal position is shown in
the time graph by a yellow vertical line.
Unfortunately, a single trajectory does not give us much information. We can
guess that the person drove from home to work in the morning, stayed in the
working place until evening, and then drove home by another route in order to
visit some place on the way, probably, a shop. However, the single trajectory does
not give us enough data to check our guesses.

1.2 Multiple Trajectories of a Single Object

Let us now consider the whole dataset that we received from the car owner. This
is a single very long sequence of position records. It is not feasible to explore the
data in fine detail as we did with the one-day trajectory. If we try to represent all
the data in visual displays in the same way as we did for the small subset before,
we discover that the displays are not very useful.
For example, Fig. 1.8 presents a fragment of the map display (top) and the
space–time cube (bottom). The overlapping lines and the visual clutter do not
allow any useful findings. The time graph looks even worse: the whole time span
of the data is 27,021,154 s (i.e. 450,354 min or 7,506 h), and there are not enough
pixels on the screen to represent the temporal variation in movement attrib-
utes with a reasonable resolution. The movement during a whole day has to be
squeezed into just two or three pixels along the horizontal dimension represent-
ing time. Of course, temporal focusing, as in Fig. 1.6, is applicable; however, we
would have to consider, for example, 7,506 hourly intervals, which would be very
time-consuming and tiresome. This shows the limitations of purely visual and
interactive techniques. Also, note that this dataset is not really big. Much larger
amounts of movement data are more usual. Very often the datasets are so large that
they cannot even be fully loaded in the computer’s main memory.
To explore large datasets, we need to make greater use of the power of
computers. Interactive visualization needs to be combined with computational
processing and/or database operations. One of the common approaches to dealing
with large datasets is computational aggregation.

Fig. 1.8  A fragment of the map display (top) and the space–time cube display (bottom) repre-
senting the whole dataset with the positions of the car

For example, in Fig. 1.9, the car movement data have been spatially aggregated
into flows representing the intensity of the movement. For this purpose, the ter-
ritory has been divided into compartments. For each pair of compartments, the
number of times the car moved from the first to the second compartment has been
counted. The results of the aggregation are represented in the map by half-arrow
symbols with the widths proportional to the counts. The symbols point in the
direction of the movement. This is done in a generalized way, that is, the orienta-
tion of a symbol does not necessarily coincide with the actual heading of the car
in the respective place but is an aggregate of multiple headings of the car and rep-
resents the major movement direction. The half-arrow rather than full-arrow sym-
bols are used in order to show flows in two opposite directions.
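The counting step of this aggregation can be sketched as follows. For simplicity, the sketch assumes that the compartments are cells of a regular square grid, which need not be the division used for Fig. 1.9; the function names and the sample points are likewise assumptions of the illustration.

```python
from collections import Counter


def cell_of(x, y, cell_size):
    """Index of the square grid cell containing the point (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def aggregate_flows(points, cell_size):
    """Count the moves between pairs of different cells along a point sequence."""
    flows = Counter()
    cells = [cell_of(x, y, cell_size) for x, y in points]
    for origin, destination in zip(cells, cells[1:]):
        if origin != destination:        # moves inside one cell are not flows
            flows[(origin, destination)] += 1
    return flows


# Made-up positions: the object goes right, comes back, and goes right again.
points = [(0.5, 0.5), (1.5, 0.5), (0.5, 0.5), (0.6, 0.5), (1.4, 0.5)]
flows = aggregate_flows(points, cell_size=1.0)
```

Each key of `flows` is an ordered pair of cells, so the two opposite directions between the same cells are counted separately, as required for the half-arrow symbols.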

Fig. 1.9  A flow map representing the movements of the car in an aggregated form

As compared to the map in Fig. 1.8, the flow map in Fig. 1.9 gives us more infor-
mation. We see where the car owner moved frequently and where the car owner moved
occasionally. The part of the territory where the car owner moved most frequently
is shown in more detail in Fig. 1.10. The thickest symbol represents 172 moves (i.e.
times when the car owner drove through the respective place). The corresponding sym-
bol oriented in the opposite direction represents 149 moves. In some places, we see
that the car owner moved significantly more often in one direction than in the other.
When we explored a one-day trajectory of the car owner in Sect. 1.1, we
guessed about the purposes of the trip and the meanings of the places of the start/
end and the stops, but we were not very confident in our guesses. Analysing the
movement over a longer time period may allow us to come to more definite con-
clusions. First of all, we would like to find out the significant places of the car
owner, that is, the places of her home, work, regularly visited shops, and, possibly,
places of other frequent activities. A place significant for a person can be recog-
nized from the number and duration of stops. Home and work places are places
where a person usually stops often and stays for quite a long time. Hence, to find
these places, we need to retrieve the positions of long stops, say, 3 h or longer.
In our data, a stop is manifested by a temporal gap between consecutive position
records since the recording was only done when the car actually moved. This may
be different in other data: stops may appear as sequences of records with very low
speed values, or consecutive spatial positions may form a dense spatial cluster
(remember that position measurements are never perfectly accurate; several
measurements taken at the same point in space typically do not coincide).
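For data in which stops appear as temporal gaps, the retrieval of stops can be sketched as below. The record layout (timestamp, x, y), the function name `extract_stops`, and the coordinates are assumptions of the sketch; the times imitate the two stops found in Sect. 1.1.

```python
from datetime import datetime, timedelta


def extract_stops(records, min_duration):
    """Find gaps between consecutive records lasting at least min_duration.

    records: list of (timestamp, x, y) tuples ordered by time.
    Returns a list of (x, y, start, end) tuples, one per stop.
    """
    stops = []
    for (t1, x1, y1), (t2, _, _) in zip(records, records[1:]):
        if t2 - t1 >= min_duration:
            # The stop is placed at the last position recorded before the gap.
            stops.append((x1, y1, t1, t2))
    return stops


records = [
    (datetime(2007, 4, 24, 9, 57), 1.0, 1.0),
    (datetime(2007, 4, 24, 9, 58), 2.0, 1.0),    # recording pauses here
    (datetime(2007, 4, 24, 18, 52), 2.0, 1.0),   # movement resumes
    (datetime(2007, 4, 24, 18, 59), 3.0, 2.0),   # a shorter pause follows
    (datetime(2007, 4, 24, 19, 36), 3.0, 2.0),
]
long_stops = extract_stops(records, timedelta(hours=3))
all_stops = extract_stops(records, timedelta(minutes=30))
```

For data of the other kinds mentioned above, the condition on the temporal gap would have to be replaced by a condition on the speed or on the spatial spread of consecutive positions.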

Fig. 1.10  A fragment of the map display showing in more detail the part of the territory where
the car owner moved most frequently

Using a database operation, we retrieve the stops for 3 h or more from the car
movement data. Each stop has a certain position in space and a certain position
in time. We use the term spatial event to refer to any discrete physical or abstract
object that has a certain position in space and time. Stops are just one example of
spatial events that can be extracted from movement data. It is possible to extract
many other kinds of spatial events, such as high-speed events, acceleration events,
significant turn events, events of passing a street crossing, and so on.

The long stop events we have retrieved from the car movement data are shown
as dots on a map in Fig. 1.11. When several stops occurred at the same place, the
dots on the map overlap. It is hard to see how many dots are in a place. To dis-
tinguish the places of frequent stops from those of occasional stops, we apply a
clustering tool, which groups the stops according to the spatial distances between
them. It detects two spatially dense clusters of stop points. The corresponding dots
are shown in red and blue. The stops that do not belong to the clusters (i.e. the car
only occasionally stopped in those places) are shown in dark grey. On the right of
Fig. 1.11, a space–time cube shows the spatial and temporal positions of the stops.
The cube is slightly rotated, so that we are viewing it from the southeast. The red
and blue clusters appear as columns formed by many dot symbols. More precisely, the
red cluster contains 220 stops, and the blue cluster contains 135 stops; 10 stops are
outside the clusters.
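The text above does not say which clustering algorithm the tool uses. As a stand-in, the following sketch shows a minimal density-based clustering in the style of DBSCAN, which groups points by spatial distance and leaves isolated points outside the clusters. The function name, the parameters `eps` and `min_pts`, and the sample points are assumptions of this illustration.

```python
import math


def density_cluster(points, eps, min_pts):
    """Minimal DBSCAN-style clustering; one label per point, -1 = no cluster."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)        # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisionally outside any cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:          # border point reached from a dense point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:     # j lies in a dense area: keep expanding
                queue.extend(more)
    return labels


# Made-up stop positions: two dense groups and one isolated stop.
stops = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = density_cluster(stops, eps=1.5, min_pts=3)
```

Counting the labels gives the cluster sizes, analogous to the numbers of stops reported for the red and blue clusters.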
We can be more or less confident that the places of the clusters are the home
and work places of the person. To find out which of them is home and which is
work, we should look at the times when the person stopped there. Figure 1.12 con-
tains two two-dimensional frequency histograms showing the temporal distribu-
tion of the stops in the red and blue clusters by the days of the week and hours of
the day. The columns of the histograms correspond to the hours of the day, from
0 to 23, and the rows correspond to the days of the week, from 1 (Monday) to 7

Fig. 1.11  Left a fragment of the map display with the spatial positions of the stops for 3 or more
hours. Right a fragment of the space–time cube display with the spatio-temporal positions of the
stops

Fig. 1.12  The temporal distribution of the stops from the red (top) and blue (bottom) clusters by
the days of the week and hours of the day

(Sunday). Note that the vertical axis, corresponding to the days of the week, is
oriented upwards. The square symbols in the cells represent the frequencies of the
respective combinations of day and hour, so that the filled areas inside the squares
are proportional to the frequencies. The text below the histogram says what fre-
quency value corresponds to the maximal filled area, that is, the full area of each
square.
The upper histogram, which represents the red cluster, tells us that the stops
occurred on all days of the week. On the working days (from 1 to 5), the stops
mostly occurred in the evening; the maximal frequencies are at about 19 o’clock.
On the weekend (days 6 and 7), the stops are more spread over a day; the high-
est frequencies are on Saturday at 12 and 13 o’clock. The times of the stops in
the blue cluster, represented in the lower histogram, are quite different: the stops
occurred only on the working days and mostly in the morning; the maximal fre-
quencies are attained at about 9 and 10 o’clock. From these statistics, we can quite
confidently conclude that the red cluster represents the home place of the person
and the blue cluster, the work place.
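The frequencies shown in such a two-dimensional histogram are counts of stops for each combination of the day of the week and the hour of the day. A minimal sketch of the counting is given below; the function name and the sample stop times are invented for the illustration.

```python
from collections import Counter
from datetime import datetime


def weekday_hour_histogram(stop_times):
    """Count stops for each (day of week, hour) pair; Monday = 1, Sunday = 7."""
    return Counter((t.isoweekday(), t.hour) for t in stop_times)


# Made-up start times of stops.
stop_times = [
    datetime(2007, 4, 23, 19, 5),    # Monday evening
    datetime(2007, 4, 24, 19, 36),   # Tuesday evening
    datetime(2007, 4, 30, 19, 20),   # Monday evening, a week later
    datetime(2007, 4, 28, 12, 40),   # Saturday noon
]
histogram = weekday_hour_histogram(stop_times)
```

Computing one such histogram per cluster allows the temporal patterns of the clusters to be compared, as is done for the red and blue clusters above.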
In a similar way, we retrieve the stops for at least 30 min. Naturally, they
include the stops for 3 h or longer, which we have considered before. The map

Fig. 1.13  Positions of stops for 30 min or longer in a map (left) and space–time cube (right)

and space–time cube in Fig. 1.13 represent the stops clustered spatially by means
of the same method. Besides the home and work clusters, which are shown in red
and blue, there are two other large clusters, green with 51 stops and purple with
46 stops. The two-dimensional histograms in Fig. 1.14 show the distribution of
the stops in these clusters by the days of the week and hours of the day. We see
that the stops in the green cluster occurred most often in the middle of the day on
Saturday and in the evenings of the working days. The stops in the purple clus-
ter occurred most often in the middle of the day on Saturday. For other days and
times, the stops were occasional: the filled areas represent one or two stops, except
for the square at 18 o’clock on Thursday (day 4), which represents four stops.
The times of the stops in the green and purple clusters suggest that these may
be the places of the person’s shopping. To check this hypothesis, we zoom in on
the places of these clusters in the map (Fig. 1.15) and find out that, indeed, the
clusters are located in shopping areas.
For the yellow cluster, consisting of 11 stops, which are quite irregular in time,
the map in Fig. 1.13 indicates that it is located in a forest. This may mean that
the stops are related to recreational activities of the car owner. All but one stop
occurred in the months from May to September in the morning hours (from 8 to
11 o’clock). The remaining stop occurred in December at noon time. On a satellite
image from Google Maps, we recognize a tennis ground near the location of the
cluster. Perhaps, the person sometimes plays tennis (in warm months of the year)
or goes for a walk in the forest.

Fig. 1.14  The temporal distribution of the stops from the green (top) and purple (bottom) clus-
ters by the days of the week and hours of the day

Fig. 1.15  The green and purple clusters of stops are located in shopping areas

Hence, by analysing the car movement data, we have discovered and inter-
preted the significant places of the car owner. Now, we are interested in the routes
of the movement. However, there is a problem: the dataset that we have is a single
movement track. For our analysis, we need it to be divided into trajectories repre-
senting different trips. One possible solution to this problem is to divide the track

by stops: an occurrence of a long stop is treated as the end of the previous trip, and
the resumption of the movement after the stop is treated as the beginning of the
next trip. We need to select a suitable minimum duration of a stop. Selecting dif-
ferent values will divide the track differently. Thus, the single trajectory we have
considered in Sect. 1.1 would be divided into two pieces if we choose the mini-
mum stop duration of 3 h and into three pieces if we choose 30 min.
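The division of a track by stops can be sketched as follows. The function name `split_by_stops` and the record layout are assumptions of the sketch; the timestamps imitate the one-day trajectory from Sect. 1.1, with its stops from 09:58 to 18:52 and from 18:59 to 19:36.

```python
from datetime import datetime, timedelta


def split_by_stops(records, min_stop):
    """Cut a time-ordered track wherever the gap between consecutive
    records is at least min_stop; each record starts with its timestamp."""
    trajectories = []
    current = []
    for rec in records:
        if current and rec[0] - current[-1][0] >= min_stop:
            trajectories.append(current)   # the stop ends the previous trip
            current = []
        current.append(rec)
    if current:
        trajectories.append(current)
    return trajectories


# Only the timestamps matter here, so the coordinates are left out.
day = [
    (datetime(2007, 4, 24, 9, 47, 55),),
    (datetime(2007, 4, 24, 9, 58),),
    (datetime(2007, 4, 24, 18, 52),),
    (datetime(2007, 4, 24, 18, 59),),
    (datetime(2007, 4, 24, 19, 36),),
    (datetime(2007, 4, 24, 19, 42, 29),),
]
pieces_3h = split_by_stops(day, timedelta(hours=3))
pieces_30min = split_by_stops(day, timedelta(minutes=30))
```

With the threshold of 3 h the day is divided into two pieces, and with 30 min into three, in agreement with the description above.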
In this example, we choose the minimum stop duration of 3 h. Hence, if the car
owner made a stop for shopping on the way from work to home, we consider this
as one trajectory rather than as two trajectories. Using this approach, we obtain 365
trajectories. To find repeatedly used routes, we use the same approach as we did for
finding the places of frequent stops: we apply clustering. However, this time we apply
clustering to the trajectories rather than to stop events. The clustering groups together
trajectories following similar routes. We obtain nine groups of similar trajectories var-
ying in size from 4 to 105 and a set of 121 trajectories that do not belong to clusters
(this means that their routes are not similar enough to the routes of other trajectories).
In terms of clustering, objects that are not assigned to any cluster are called “noise”.
In Fig. 1.16, the clusters of trajectories are shown on a map; the “noise” is hidden
by unselecting the respective checkbox in the legend on the right of the map. The clus-
ters are represented by different colours of the trajectory lines. Since overlapping of the
lines makes the clusters hard to distinguish, we have to look at each cluster separately.

Fig. 1.16  Clusters of car trajectories by route similarity are represented on a map by differently
coloured lines

In Fig. 1.17, each cluster is shown separately in a summarized form of flow
map, similar to Figs. 1.9 and 1.10. Knowing the person’s significant places, we can
easily interpret the routes. Cluster 2 (green) consists of trips from home to work.
The route represented by this cluster was followed 105 times. Clusters 1 (red),
3 (blue), and 5 (purple) are trips from work to home following three different routes.

Fig. 1.17  Clusters of car trajectories by route similarity are represented separately in a
summarized form (as flow maps)

The first route is opposite to that of cluster 2, and the latter two routes pass the
two shopping areas we have discovered before. The first route was followed much
more often than the routes through the shops. Cluster 7 (brown) includes five trips
from home to work through one of the shopping areas. Clusters 4 (yellow) and
8 (violet) consist of trips from home to these shopping areas and back, and cluster
6 (orange) consists of round trips passing both shopping areas. The trajectories of
cluster 9 are similar to those of cluster 2. The difference is that they visit the place
in the forest near the workplace where the tennis ground is located.
Now, we would like to examine and compare the temporal characteristics of the
clusters of trajectories using a space–time cube. However, Fig. 1.8 shows
us that a space–time cube representing a long time interval may not be very effective.
To improve the view and at the same time gain additional information about

Fig. 1.18  Clusters of car trajectories by route similarity are represented in a space–time cube.
The time references in the trajectories have been transformed to times of the same day. Hence,
the trajectories are vertically positioned in the cube according to the times of the day when they
occurred

temporal characteristics of the trajectories, we can transform the temporal references
in the trajectories. One possibility is to shift the trajectories in time to a single
day. This means that the dates in the temporal references are replaced by one and
the same date, while the times of the day are preserved. The result can be seen
in Fig. 1.18. The trajectories are positioned in the cube according to the times of
the day in which they took place. We remind the reader that the temporal axis of
the display is oriented upwards. As one could expect, the trips from home to
work (green, brown, and light-blue clusters) occurred mostly in the morning
and the trips from work to home (red, blue, and purple clusters), mostly
in the evening. The trips from home to the shopping areas (yellow, violet, and
orange) occurred mostly in the middle of the day.
Another useful transformation of the time references is demonstrated in
Fig. 1.19. The original dates have been transformed to relative positions in a weekly
cycle starting from Monday and ending with Sunday. Hence, the trajectories are

Fig. 1.19  Clusters of car trajectories by route similarity are represented in a space–time cube.
The time references in the trajectories have been transformed to relative positions in a weekly
cycle from Monday to Sunday. Hence, the trajectories are vertically positioned according to the
days of the week when they took place

vertically positioned in the cube according to the days of the week, with Monday
at the bottom and Sunday at the top. It can be seen that the trajectories linking
home with work occurred on the working days from Monday to Friday and the tra-
jectories from home to shopping areas occurred on the weekend.
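The two transformations of time references to a daily and a weekly cycle can be sketched as below. The reference date, which is an arbitrary Monday, and the function names are assumptions made for this illustration.

```python
from datetime import datetime, timedelta

REFERENCE_DAY = datetime(2000, 1, 3)   # an arbitrary Monday chosen for this sketch


def to_single_day(t):
    """Replace the date by the reference date, keeping the time of the day."""
    return datetime.combine(REFERENCE_DAY.date(), t.time())


def to_weekly_cycle(t):
    """Map a timestamp to the same weekday and time within the reference week."""
    return to_single_day(t) + timedelta(days=t.weekday())


t = datetime(2007, 4, 24, 9, 47, 55)   # a Tuesday
same_day = to_single_day(t)
same_week = to_weekly_cycle(t)
```

Applying one of these functions to every timestamp of every trajectory yields the vertical arrangement by time of day or by day of the week.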
In Fig. 1.20, we have transformed the time references to relative times with
respect to the trajectories, that is, the starting times of the trajectories have been
set to one and the same time moment, and the remaining time references have
been adjusted so that the lengths of the time intervals between them are preserved.
The trajectories appear in the space–time cube as if they start simultaneously. This
transformation allows us to compare the durations of the trajectories, including
the stops, and the durations of the stops. Now, we can clearly see that the red and
green clusters consist mainly of fast direct trips from home to work and vice versa
without intermediate stops (there are only a couple of trajectories in the green
cluster that slightly deviate from the main route and have stops). The blue, purple,
and brown routes between home and work were usually used for visiting shops

Fig. 1.20  Clusters of car trajectories by route similarity are represented in a space–time cube.
The time references in the trajectories have been transformed to relative positions with respect
to the starting moments of the trajectories. Hence, the trajectories appear as if they started
simultaneously

since the trajectories have stops in the shopping areas. Quite naturally, the round
trips from home to shopping areas (yellow, violet, and orange clusters) also have
stops in the shopping areas. The trajectories of the orange cluster, which visit both
shopping areas, have longer durations than the trajectories that visit only one of
the areas. The cube also clearly shows that the trajectories of the light-blue clus-
ter had quite long stops in the forest. This supports our hypothesis that the person
might do sports there.
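The transformation of time references to relative times described above can be illustrated with a minimal sketch. It is not the implementation used for the figures: the function name `to_relative_times`, the record layout (timestamp, x, y), and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def to_relative_times(trajectory):
    """Shift a trajectory so that it starts at time zero.

    `trajectory` is a time-ordered list of (timestamp, x, y) tuples.
    The intervals between consecutive records are preserved.
    """
    if not trajectory:
        return []
    start = trajectory[0][0]
    return [(t - start, x, y) for t, x, y in trajectory]


# Two made-up trips that started at different times of the day.
morning = [
    (datetime(2000, 1, 3, 8, 0), 0.0, 0.0),
    (datetime(2000, 1, 3, 8, 10), 2.0, 1.0),
    (datetime(2000, 1, 3, 8, 25), 5.0, 3.0),
]
evening = [
    (datetime(2000, 1, 3, 18, 30), 5.0, 3.0),
    (datetime(2000, 1, 3, 19, 15), 0.0, 0.0),
]

relative_morning = to_relative_times(morning)
relative_evening = to_relative_times(evening)

# After the shift, the last relative time of a trip equals its duration.
morning_duration = relative_morning[-1][0]
evening_duration = relative_evening[-1][0]
```

Since both trips now start at the same relative moment, their durations can be compared directly, which is what the space–time cube with relative times shows visually.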
The example dataset we have analysed is just a sequence of time-referenced
positions of a car. However, we have managed to learn a lot about the person who
drove the car. We now know her home and work place, the places where she usu-
ally shops, the times when she does this, and how long it takes. We know the usual
routes of the person and the reasons for choosing among them. We know the usual
times of driving to work and back home. This knowledge has been obtained by
combining computational processing of the data with interactive visual interfaces,
which allows us to relate the data to the spatial and temporal contexts and involve
our previous knowledge and common-sense reasoning.

1.3 Simultaneous Movements of Many Objects

So far, we have considered movements of a single object (car). Let us take another
example dataset with positions of many cars. The data consist of GPS tracks of
17,241 cars in Milan (Italy) collected during one week from Sunday to Saturday.
The dataset consists of more than 2 million records each including a car identifier,
timestamp (date and time of the day), geographical coordinates, and movement
speed. Dividing the movement tracks of the cars by the minimum stop duration of
30 min, as described in Sect. 1.2, produces about 176,000 trajectories. This dataset
is much bigger than the one we considered previously. The whole dataset is too
large for the kind of analysis we did before. We cannot consider all car trajectories
individually. The visual displays turn out to be ineffective even for small subsets
of the trajectories. For example, Fig. 1.21 shows fewer than 10 % of the car trajectories. The
tools for display interaction, for example, zooming on a map or manipulation of
the view in a space–time cube, work with significant delays impeding the analysis.
The available clustering tools cannot be straightforwardly applied to this amount
of data because clustering works in main memory of the computer, whereas the
data do not fit there. Therefore, we cannot group all car trajectories by similarity in
order to consider and compare the groups as we did in the previous example.
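The division of a movement track into trajectories by a minimum stop duration can be sketched as follows. This is a simplified illustration under a stated assumption: a stop is taken to show up as a temporal gap of at least 30 min between consecutive position records of the same car. The procedure of Sect. 1.2 may differ in detail, and the function name `split_track` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def split_track(records, min_stop=timedelta(minutes=30)):
    """Split time-ordered (timestamp, x, y) records into trajectories.

    A new trajectory starts whenever the time elapsed since the previous
    record is at least `min_stop` (assumption: such a gap means a stop).
    """
    trajectories = []
    current = []
    for record in records:
        if current and record[0] - current[-1][0] >= min_stop:
            trajectories.append(current)
            current = []
        current.append(record)
    if current:
        trajectories.append(current)
    return trajectories


# Made-up track of one car: a 50-minute gap separates two trips.
track = [
    (datetime(2000, 1, 3, 8, 0), 0.0, 0.0),
    (datetime(2000, 1, 3, 8, 5), 1.0, 0.5),
    (datetime(2000, 1, 3, 8, 10), 2.0, 1.0),
    (datetime(2000, 1, 3, 9, 0), 2.0, 1.0),
    (datetime(2000, 1, 3, 9, 5), 3.0, 2.0),
]
trips = split_track(track)
```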
To analyse large datasets, it is necessary to use special analysis techniques rely-
ing on database processing. As we have already mentioned, one possible approach
to dealing with large amounts of data is aggregation. We have applied spatial
aggregation to the movements of the single personal car (see Figs. 1.9 and 1.10).
In this case, there are multiple cars that moved simultaneously. One can expect,
however, that their collective movements could be different at different times.
In order to investigate the differences, we apply spatio-temporal aggregation.
Fig. 1.21  About 14,100 trajectories of cars in Milan made on Monday, 2 April 2007. Left the
trajectories are drawn in a fully opaque mode. Right the trajectories are drawn with 5 % opacity

We divide the space (i.e. the territory of Milan) into compartments and the time
span of the data into intervals. For this example, we have chosen hourly intervals;
hence, the whole time span of one week has been divided into 168 hourly inter-
vals. Then, we use database operations to compute statistics for the compartments
and intervals. Thus, we can ask:
• how many different cars visited each compartment in each interval;
• how many moves (transitions) occurred between two neighbouring compart-
ments in each direction in each interval.
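The first of these questions can be sketched with an in-memory SQLite database from the Python standard library. This is an illustration only: the table layout, the square grid with a hypothetical cell size of 1,000 units, and the record format (car identifier, seconds since the start of the data, x, y) are assumptions, not the database schema actually used for the Milan data.

```python
import sqlite3


def presence_counts(records, cell_size=1000.0):
    """Count different cars per (cell_x, cell_y, hour index).

    `records` is an iterable of (car_id, seconds_since_start, x, y).
    Compartments are square grid cells of side `cell_size` (assumption).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE positions "
        "(car_id TEXT, cell_x INTEGER, cell_y INTEGER, hour INTEGER)"
    )
    rows = [
        (car_id, int(x // cell_size), int(y // cell_size), int(t // 3600))
        for car_id, t, x, y in records
    ]
    conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)", rows)
    # The aggregation itself is done by the database, not in Python.
    cursor = conn.execute(
        "SELECT cell_x, cell_y, hour, COUNT(DISTINCT car_id) "
        "FROM positions GROUP BY cell_x, cell_y, hour"
    )
    counts = {(cx, cy, hour): n for cx, cy, hour, n in cursor}
    conn.close()
    return counts


# Made-up records: car "a" is seen twice in the same cell and hour
# and must be counted only once there.
sample = [
    ("a", 0, 100.0, 100.0),
    ("a", 600, 200.0, 300.0),
    ("b", 1200, 900.0, 50.0),
    ("a", 4000, 1500.0, 100.0),
]
counts = presence_counts(sample)
```

Only the aggregated counts, one number per compartment and interval, need to be loaded into main memory; the individual position records stay in the database.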
It is also possible to compute other aggregate statistics such as the average (or
the minimum, maximum, median, etc.) speed or average time spent in each com-
partment. Aggregates computed in the database can be loaded in main memory
and visualized. In particular, one can use animated maps: one step of the anima-
tion corresponds to one time interval in the aggregated data. Thus, Fig. 1.22 pre-
sents four screenshots of an animated map showing the counts of different cars
that visited the spatial compartments in different time intervals. The screenshots
correspond to the intervals 03–04, 04–05, 05–06, and 22–23 h on Monday.
The counts are represented by circles with proportional areas. We can observe
how the presence of cars, which reflects the intensity of the city traffic, increases
in the morning hours from 03–04 to 05–06 h. In the evening hours, the intensity
of the traffic decreases. We have included in Fig. 1.22 only one screenshot from
the evening hours. We have selected the interval 22–23 h, in which the counts of
the visits were close to those in the early morning interval 03–04 h. However, the
overall spatial distributions of the cars are different. In the evening, there were
notably more cars in the centre of the city than in the early morning.
Fig. 1.22  Movement of multiple cars in Milan: counts of the presence of cars by spatial com-
partments and hourly intervals

Figure  1.23 presents screenshots from an animated flow map showing aggre-
gated moves between neighbouring compartments. The screenshots have been
taken for the same time intervals as those in Fig. 1.22. As with the previous map,
we can observe a substantial increase in the movement intensity from the interval
03–04 to 05–06 h. The intensity increases first on the belt roads surrounding the
city and later in the centre. In the evening, the intensity of the movements in the
centre is higher than in the early morning. Similar observations can be made for
other working days.

Fig. 1.23  Movement of multiple cars in Milan: counts of the moves of cars between spatial
compartments by hourly intervals
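The counts of moves shown in such a flow map could be derived along the following lines. The sketch is illustrative and makes several assumptions: compartments are square grid cells, a move is attributed to the hourly interval in which the car arrives in the new cell, and the positions are sampled densely enough that consecutive records of a car fall into the same or neighbouring cells. The name `count_moves` and the data are hypothetical.

```python
from collections import Counter


def count_moves(records, cell_size=1000.0):
    """Count transitions between grid cells per hourly interval.

    `records` is an iterable of (car_id, seconds_since_start, x, y) in any
    order. The result maps (from_cell, to_cell, hour) to the number of moves.
    """
    visits_by_car = {}
    for car_id, t, x, y in records:
        cell = (int(x // cell_size), int(y // cell_size))
        visits_by_car.setdefault(car_id, []).append((t, cell))

    moves = Counter()
    for visits in visits_by_car.values():
        visits.sort()  # order each car's records by time
        for (_, prev_cell), (t, cell) in zip(visits, visits[1:]):
            if cell != prev_cell:
                moves[(prev_cell, cell, int(t // 3600))] += 1
    return moves


# Made-up records of two cars.
sample = [
    ("a", 0, 100.0, 100.0),
    ("a", 600, 1200.0, 100.0),
    ("a", 4000, 300.0, 100.0),
    ("b", 100, 500.0, 500.0),
    ("b", 700, 1500.0, 500.0),
]
moves = count_moves(sample)
```

The two directions between a pair of cells are counted separately, as in the flow map.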
The dynamics of the movement on the weekend is different. We shall not
include more screenshots of animated maps showing the movements on Sunday
and Saturday since they take considerable page space. There is another method
to visualize spatio-temporal aggregates: to draw in each spatial compartment a
diagram representing the temporal variation in the aggregate values in this com-
partment. In our case, we have 168 hourly intervals; hence, each diagram should
represent 168 different values. We use diagrams in which the values are repre-
sented by colouring of small rectangles (pixels); see Fig. 1.24. We use a diverg-
ing colour scale blue–yellow–red, where shades of blue are used for low values
and shades of red for high values. The pixels are arranged in 24 columns cor-
responding to the 24 hourly intervals of a day and seven rows corresponding to
7 days from Sunday to Saturday. The row for Sunday is on the top and the row for
Saturday on the bottom of the diagrams. The columns are arranged from left to
right; the leftmost column represents the interval 00–01 h and the rightmost col-
umn, the interval 23–24 h. Hence, each diagram tells us how the car presence var-
ied in the respective place over days of the week and times of the day.
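The arrangement of the 168 hourly values of one compartment into the rows and columns of such a diagram can be sketched as follows. The sketch assumes that the values are ordered by time starting from Sunday, 00–01 h; the function name `hourly_grid` and the placeholder values are hypothetical, and the colour encoding is left out.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def hourly_grid(values, hours_per_day=24):
    """Arrange a time-ordered list of hourly values into rows of one day each.

    Row 0 is the first day (here Sunday, drawn at the top); column 0 is the
    interval 00-01 h and the last column is the interval 23-24 h.
    """
    if len(values) % hours_per_day != 0:
        raise ValueError("the number of values must be a multiple of 24")
    return [
        values[start:start + hours_per_day]
        for start in range(0, len(values), hours_per_day)
    ]


# Placeholder data: the value of each pixel is simply its hour index 0..167.
values = list(range(7 * 24))
grid = hourly_grid(values)

# Value for Monday, 08-09 h: second row, ninth column.
monday_morning = grid[DAYS.index("Mon")][8]
```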
Figure  1.24 includes two map fragments. The upper fragment represents the
eastern part of the northern belt road (A4). The lower fragment is taken from the
city centre. It can be noticed that all diagrams have blue colours at the left and
right edges, which reflects low traffic intensity in the nights. It is also noticeable
that the top and bottom rows of pixels, which correspond to Sunday and Saturday,
differ from the remaining five rows corresponding to the working days. On the
belt road, the morning period of low traffic intensity is longer on the weekend than
on the working days. However, the intensities in the afternoons and evenings of
Sunday and Saturday are close to those on the working days. The dynamics in the
centre differ from those on the belt road. The presence of cars remains quite low
during the entire day on Sunday and Saturday, and it is notably lower than on the
working days. In addition, in the mornings of the working days, the presence of
cars starts to increase later than on the belt road.

Fig. 1.24  Movement of multiple cars in Milan: two map fragments with diagrams showing the
variation in the presence of cars in the spatial compartments by hourly intervals. The columns of the
diagrams correspond to 24 h of a day, and the rows correspond to 7 days from Sunday to Saturday
After we have acquired an overall picture of car traffic in Milan, we would like
to learn how certain places in the city are connected. In particular, we are inter-
ested in how people get from the suburbs to the city centre. We outline the city
centre and the major crossings on the belt roads as shown in Fig. 1.25 and again
use database operations to compute the total numbers of moves among the areas of
interest we have defined. We also compute the numbers of moves by hourly inter-
vals. Figure 1.25 presents the total counts of moves. We see that there were many
more cars that moved on the belt roads without going to the city centre than cars
that moved to and from the centre to the belt roads. More specifically, the highest
number of moves between two crossings on a belt road is 3,245 (from crossing N
to crossing NW3), while the highest number of moves between the centre and one
of the crossings is 1,794 (from crossing NW3 to the centre). The flows between
the centre and crossings NW3 and E are more intensive than between the centre
and the other crossings.
Figure  1.26 presents three selected hourly intervals to provide an idea of the
temporal variation in the aggregated movements among the areas of interest.

Fig. 1.25  Flows among selected areas of interest in Milan, including the city centre and major
crossings on the belt roads around the city
Fig. 1.26  Flows among selected areas of interest in Milan by hourly intervals

Fig. 1.27  Moves among selected areas of interest in Milan by hourly intervals represented in the
form of origin–destination matrix

Another way to represent information about movements among places is to utilize
an origin–destination matrix, as shown in Fig. 1.27. The rows and columns of the
matrix correspond to the places of interest, in our case, the city centre and major
crossings on the belt roads. A cell shows the amount of movement from the place
corresponding to the row to the place corresponding to the column. The numbers
can be visually encoded, in our case, by filled squares with areas proportional to
the values. The dark-grey bars in the leftmost column (containing the place labels)
represent the total amounts of movement from the respective places. The dark-
grey bars in the column headers represent the total amounts of movement to the
respective places. The three screenshots of the matrix display correspond to the
same time intervals as represented by the maps in Fig. 1.26.
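The structure of an origin–destination matrix, including the totals shown by the dark-grey bars, can be sketched as follows. The place labels are borrowed from the example, but the list of moves is made up for illustration and the function name `od_matrix` is hypothetical.

```python
from collections import Counter


def od_matrix(moves, places):
    """Build an origin-destination matrix from (origin, destination) pairs.

    Returns the matrix (rows = origins, columns = destinations), the total
    outgoing movement per place, and the total incoming movement per place.
    """
    counts = Counter(moves)
    matrix = [[counts[(o, d)] for d in places] for o in places]
    outgoing = [sum(row) for row in matrix]           # row totals
    incoming = [sum(col) for col in zip(*matrix)]     # column totals
    return matrix, outgoing, incoming


places = ["centre", "N", "E"]
moves = [("N", "centre"), ("N", "centre"), ("centre", "E"), ("E", "N")]
matrix, outgoing, incoming = od_matrix(moves, places)
```

Computing one such matrix per hourly interval gives the sequence of matrices that corresponds to the three screenshots.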
The map and the matrices tell us that in the morning, there is more movement to the
centre than from the centre, except for the link centre—E (east). In the interval 05–06 h,
there are more movements from the centre to the east than in the opposite direction.
Perhaps many cars go to Linate airport, which is located in the east. In the afternoon,
the flows from the centre increase, especially the flow to the crossing NW3.
Hence, by aggregating the data and exploring the aggregates with the help of
interactive visual displays, we could learn a lot about the car traffic in Milan. We
have learned how the spatial distribution of the cars and the intensity of movements
vary over time. We have investigated the variations in the presence of cars
in different places by hours of the day and days of the week and discovered differences
between the centre and the belt roads. We have studied connections and
flows between selected areas of interest. Although we do not know the territory
of Milan, maps have provided us with the spatial context and allowed us to use
our general knowledge of geographical space, which includes such concepts as
city centre, belt roads, and crossings. We have also used our general knowledge
of time, in particular temporal cycles (daily and weekly), and differences between
day and night, working days and weekends, and so on.

1.4 What Should Have Been Achieved by These Examples

The examples allowed us to introduce informally the major concepts we shall be
dealing with throughout the book:
• position records and movement tracks;
• trajectories;
• dynamic (time-dependent) attributes of movement, such as speed;
• properties of trajectories: start and end positions in time and space, route, stops
on the way, and speed variation;
• spatial events, such as stops;
• flows (summarized movements) between places;
• spatial situations: spatial distribution of multiple moving objects at different
times and aggregate characteristics of their movement, such as intensity of flows
among places;
• local dynamics (temporal variations) of presence and movements in places;
• spatial context of the movement, which was conveyed by the maps;
• temporal context of the movement, in particular, daily and weekly cycles.
We have also demonstrated a number of transformations of movement data:
• division of movement tracks into trajectories representing different trips;
• extraction of events, such as stops;
• spatial and spatio-temporal aggregation;
• transformations of time references.
We have touched upon the use of clustering in analysis of movement-related
data. Clustering of events allowed us to find significant places, and clustering of
trajectories uncovered habitual routes.
In our example analyses, we have used a variety of interactive visualization
techniques. The most common techniques for visualizing trajectories and events
are the map and space–time cube. These can be complemented by time graphs and
other temporal displays, which are more effective in representing time. Diverse
displays can be dynamically linked, which means that interactive operations per-
formed by the user on one of the displays are somehow reflected in the others. For
example, in Fig. 1.7, the map display marks the spatial position corresponding to
the temporal position of the mouse cursor within a time graph. We have shown
which techniques can be used to visualize aggregated movement data; in particu-
lar, we have introduced flow maps and origin–destination matrices showing sum-
marized movements among places.
Besides introducing major concepts and demonstrating some of the analytical
techniques used for movement data, the role of the examples was to show how the
capabilities of the computer and human can be combined for extracting knowledge
from data. Movement data are usually semantically poor as they basically consist
of coordinates and timestamps. This was the case in our examples. However, by
analysing the datasets, we have learned much about the life and habits of the car
owner in the first example and about the city traffic in Milan in the second exam-
ple. The computer helped us to generate data abstractions, to find similar occur-
rences and repeated patterns, to extract what we deemed potentially interesting,
and to transform the data for considering them from multiple perspectives. The
computer also did an extremely important thing: it represented the data and their
derivatives on visual displays and allowed us to interact with the displays. This
enabled us to use our human-specific capabilities to perceive patterns and grasp
their meaning, to establish associations (link data and patterns with the con-
text, link different perspectives to an integral mental picture, link new informa-
tion to previous knowledge, etc.), to generate hypotheses, to reason, and to make
conclusions.
Such human–computer analytical processes in which computers not only
process data but also enable humans to involve their unique capabilities to per-
ceive, associate, hypothesize, reason, and comprehend are a major topic of visual
analytics.

1.5 Visual Analytics

Visual analytics is a relatively new term; it has been in use only since 2005
when the book “Illuminating the Path” was published (Thomas and Cook 2005).
However, the kinds of ideas, research, and approaches that are now termed
visual analytics emerged much earlier. The main idea of visual analytics is to
develop knowledge, methods, technologies, and practice that exploit and com-
bine the strengths of human and electronic data processing (Keim et al. 2008,
2010). Visualization is the means through which humans and computers cooperate
using their distinct capabilities for the most effective results. This idea
had permeated many research efforts in the areas of information visualization,
GIScience, geovisualization, and data mining long before 2005 (Andrienko et al.
2010).
Since 2005, an attempt has been made to establish visual analytics as a specific
scientific discipline in order to consolidate the relevant research that has been con-
ducted within different disciplines and to stimulate its further progress. The dis-
tinctive features of visual analytics research are as follows:
• emphasis on data analysis, problem solving, and/or decision making;
• leveraging computational processing by applying automated techniques for data
processing, knowledge discovery algorithms, etc.;
• active involvement of a human in the analytical process through interactive vis-
ual interfaces;
• support of the information provenance, that is, how each piece of information
and knowledge has been obtained;
• support for the communication of analytical results to relevant recipients.
As a science, visual analytics develops its theoretical foundations. Since vis-
ual analytics is largely about transforming data to information and knowledge, the
theoretical part of visual analytics describes the possible types of data, as well as
the types of things or phenomena that can be represented by the data, and deter-
mines the types of information and knowledge that can be extracted from the data.
The theory of visual analytics also grounds the possible approaches to extracting
knowledge and information from the data. In these approaches, it defines the dis-
tribution of the workload between the computer and the human analyst so as to
relieve the human from routine operations but utilize the human capabilities of
abstractive perception and creative analytical thinking.
Space and time are considered as key topics in visual analytics research (Keim
et al. 2010; Andrienko et al. 2010). Data with spatial and temporal components
(including movement data) are inherently complex as a result of the complexities
of space and time, in particular, their heterogeneity, the abundance and diversity of
objects populating them, events and processes occurring in them, and the variety
and multitude of spatial, temporal, and spatio-temporal properties and relations.
Spatial and temporal data need to be analysed with a proper consideration of the
spatial and temporal context, which includes all these complexities. It is hardly
possible to formalize all aspects of the context and feed them to computers for
fully automatic processing. Therefore, exploration and analysis of spatial and
temporal data rely on the human analyst’s tacit knowledge of space and time and
space-/time-related experiences. These are incorporated in the analysis through the
use of appropriate visual representations and interactive facilities.
The specifics and complexities of space and time and the directions for the vis-
ual analytics research related to space and time are considered in the dedicated
chapter of the book by Keim et al. (2010) and in the paper by Andrienko et al.
(2010). In our book, we shall consider the specifics and complexities of movement
data and visual analytics approaches to analysing the data and extracting various
kinds of knowledge.
1.6 Structure of the Book

Chapter 2 presents the conceptual framework for the analysis of movement. It
describes the types of information contained in movement data and defines the possible
types of tasks in analysing movement. To enable extraction of various types of
information, movement data may need to be converted to different forms. Chapter 3
deals with the possible transformations, which can adapt available movement data to
the analysis goals or to specific requirements of the methods that the analyst wants
to apply, extract relevant parts of the data, or reduce irrelevant details.
Chapter 4 describes basic visualization and interaction techniques that enable
viewing and exploration of movement data and other types of spatio-temporal data
and facilitate data transformations and joint analysis of different data types. These
techniques provide general infrastructure for applying specific visual analytics
methods and procedures and for method combination.
Chapters 5, 6, 7, 8 are dedicated to the analytical methods and procedures that
can be used for analysing movement data. Besides the state-of-the-art methods
that have been previously published by the book authors and other researchers,
there are a number of new methods that have not been published before. The
methods are presented in a systematic way, being grouped according to the possi-
ble foci in movement analysis: movers (Chap. 5), spatial events (Chap. 6), places
(Chap. 7), and times (Chap. 8). Most of the methods combine visual and compu-
tational techniques. The latter are typically not our original inventions but state-
of-the-art techniques from statistics, data mining, and database processing. We
have integrated them with interactive visual interfaces to support synergistic work
of the computer and human. The work of each method is explained by richly illus-
trated examples, for which we have used a number of interesting and challenging
datasets. The datasets are introduced in Chap. 2.
We conclude in Chap. 9 by showing the connections between the pieces pre-
sented in the previous chapters and presenting a general methodological frame-
work for analysing movement behaviours in all their aspects.

References

Andrienko, G., Andrienko, N., Demšar, U., Dransch, D., Dykes, J., Fabrikant, S., et al. (2010).
Space, time, and visual analytics. International Journal of Geographical Information Science,
24(10), 1577–1600.
Keim, D., Andrienko, G., Fekete, J.-D., Görg, C., Kohlhammer, J., & Melancon, G. (2008). Visual
analytics: Definition, process, and challenges. In A. Kerren, J. T. Stasko, J.-D. Fekete, & C. North
(Eds.), Information visualization: Human-centered issues and perspectives (LNCS state-of-the-art
survey, Vol. 4950, pp. 154–175). Berlin: Springer.
Keim, D., Kohlhammer, J., Ellis, G., & Mansmann, F. (Eds.). (2010). Mastering the informa-
tion age. Solving problems with visual analytics. Eurographics Association, Goslar, Germany.
Electronic version: http://diglib.eg.org.
Thomas, J. J., & Cook, K. A. (2005). Illuminating the path: The research and development
agenda for visual analytics. New York: IEEE Computer Society Press.
Chapter 2
Conceptual Framework

Abstract  We introduce a conceptual framework intended to describe, in a systematic
and comprehensive way, the types of information contained in movement data and the
respective types of analytical tasks. The framework is based on the consideration of
three fundamental sets: space, time, and objects. In the set of objects, we separately
consider the two types of spatio-temporal objects that play the most important role in
the phenomenon of movement: moving objects (movers, for short) and spatial events.
Elements of the fundamental sets can be characterized in terms of the elements of
the other sets. Based on these characteristics, we introduce a multi-perspective view
of movement, including the mover perspective, space perspective, time perspective, and
spatial event perspective. We suggest a typology of movement analysis tasks, where
classes of tasks are defined according to the possible foci, which correspond to the
different perspectives of movement. Tasks are also distinguished according to the level
of analysis, which may be elementary (addressing specific elements of the sets) or
synoptic (addressing the sets or their subsets).

2.1 Foundations

Our conceptual framework aims at describing the possible types of information
that can be extracted from movement data and defining the respective types of
analytical tasks. It is based on the previous research on defining possible types of
analysis tasks, or questions, according to the structure of data, particularly spatio-
temporal data.
Peuquet (1994, 2002) distinguishes three components in spatio-temporal data:
space (where), time (when), and objects (what). Accordingly, Peuquet defines
three basic kinds of questions:
• when + where → what: Describe the objects or set of objects that are present at
a given location or set of locations at a given time or set of times.
• when + what → where: Describe the location or set of locations occupied by a
given object or set of objects at a given time or set of times.

G. Andrienko et al., Visual Analytics of Movement, 33


DOI: 10.1007/978-3-642-37583-5_2, © Springer-Verlag Berlin Heidelberg 2013
• where + what → when: Describe the time or set of times at which a given object or
set of objects occupied a given location or set of locations.
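The three kinds of questions can be read as three lookups over the same collection of (what, where, when) observations, each fixing two components and asking for the third. The sketch below only illustrates this reading; the `Observation` type, the function names, and the sample data are hypothetical and are not part of Peuquet's formulation.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    what: str    # object identifier
    where: str   # location identifier
    when: int    # time step


def what_at(observations, where, when):
    """when + where -> what: objects present at a location at a time."""
    return {o.what for o in observations if o.where == where and o.when == when}


def where_of(observations, what, when):
    """when + what -> where: locations occupied by an object at a time."""
    return {o.where for o in observations if o.what == what and o.when == when}


def when_at(observations, what, where):
    """where + what -> when: times at which an object occupied a location."""
    return {o.when for o in observations if o.what == what and o.where == where}


observations = [
    Observation("car1", "home", 1),
    Observation("car1", "work", 2),
    Observation("car2", "work", 2),
    Observation("car1", "home", 3),
]
```

For example, `what_at(observations, "work", 2)` returns both cars, and `when_at(observations, "car1", "home")` returns the time steps 1 and 3.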

Blok (2000) and Andrienko et al. (2003) define analysis tasks for spatio-temporal
data based on the types of changes occurring over time:
• Existential changes, that is, appearance and disappearance.
• Changes of spatial properties: location, shape, size, and/or orientation.
• Changes of thematic properties expressed through values of attributes: quali-
tative changes and changes of ordinal or numeric characteristics (increase and
decrease).
Bertin (1983) introduces the notions of “question types” and “reading levels”.
The notion of question types refers to components (variables) present in data:
“There are as many types of questions as components in the information” (Bertin
1983, p. 10). For each question type, there are three reading levels, elementary,
intermediate, and overall. The reading level indicates whether a question refers
to a single data element, to a group of elements, or to the whole phenomenon
characterized by all elements together. Andrienko et al. (2003) argue that there is
no fundamental difference between the intermediate and overall levels and sug-
gest joining these into a single notion. In accord with this idea, Andrienko and
Andrienko (2006) distinguish elementary and synoptic analysis tasks.
Hägerstrand (1970) introduced time geography, a conceptual framework for
analysing movements and activities of human individuals (see also Kraak 2003;
Miller 2005). Each individual follows a trajectory through space and time, called
space–time path. These paths are influenced by constraints: capability constraints
(for instance mode of transport and need for sleep), coupling constraints (for
instance being at work or at the sports club), and authority constraints (for instance
accessibility of buildings or parks in space and time). A prominent feature of time
geography is the view of space and time as inseparable. Hägerstrand’s basic idea
was to consider space–time paths in a three-dimensional space where horizontal
axes represent geographic space and the vertical axis represents time. This repre-
sentation is known as “space–time cube” (Kraak 2003); examples can be seen in the
introductory chapter. Another important concept of time geography is the notion of
“space–time prism”, which means the volume in space and time a person can reach
in a particular time interval when starting from and returning to the same location (for instance,
where a person can get from his workplace during a lunch break). Miller (2005) sug-
gests a measurement theory for time geography, which includes formal definitions
of the main concepts and fundamental relations between them. This provides foun-
dations for building computational tools for time geographic querying and analysis.
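The notion of a space–time prism can be illustrated by a simple membership test. The sketch rests on strong simplifying assumptions that are ours and not part of the formal theory cited above: movement is unconstrained in the plane (straight-line distances), the maximum speed is constant, and the person leaves the anchor location at time 0 and must be back by the end of the time budget. The function name `in_prism` is hypothetical.

```python
import math


def in_prism(point, t, anchor, budget, max_speed):
    """Check whether `point` can be occupied at time `t`.

    The person leaves `anchor` at time 0 and must return to it by `budget`.
    The point must be reachable from the anchor by time t, and the anchor
    must be reachable from the point in the remaining time.
    """
    if not 0 <= t <= budget:
        return False
    distance = math.dist(point, anchor)
    return distance <= max_speed * t and distance <= max_speed * (budget - t)


# Lunch break of 1 h, walking speed of at most 5 km/h, workplace at (0, 0).
workplace = (0.0, 0.0)
halfway = in_prism((2.0, 0.0), 0.5, workplace, 1.0, 5.0)    # 2 km away at 30 min
too_far = in_prism((3.0, 0.0), 0.5, workplace, 1.0, 5.0)    # 3 km away at 30 min
too_late = in_prism((2.0, 0.0), 0.9, workplace, 1.0, 5.0)   # 2 km away at 54 min
```

Under these assumptions the farthest reachable points lie at a distance of half the budget times the maximum speed, here 2.5 km, and they can be occupied only at the middle of the interval.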
Spaccapietra et al. (2008) propose a conceptual model of movement in which
trajectories are represented as sequences of stops (i.e. stays at particular places)
and moves. Stops are important parts of trajectories associated with domain-specific
semantics, while moves are transitions between consecutive stops.
Orellana and Renso (2010) represent movement as a collection of interactions of
the moving objects with the environment in which the movement takes place.
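A representation of a trajectory as a sequence of stops and moves can be sketched as follows. The sketch does not reproduce the cited model: it uses a purely geometric definition of a stop (staying within a given radius of the first point of the stop for at least a given duration), whereas in the cited model stops are tied to domain-specific semantics. The function name and both thresholds are hypothetical.

```python
import math


def stops_and_moves(points, radius, min_duration):
    """Segment time-ordered (t, x, y) points into stop and move episodes.

    A stop is a maximal run of points that stay within `radius` of the run's
    first point and span at least `min_duration`. Points between stops are
    collected into move episodes.
    """
    episodes = []
    move = []
    i = 0
    while i < len(points):
        j = i
        # Extend the candidate stop while the next point stays near point i.
        while (j + 1 < len(points)
               and math.dist(points[j + 1][1:], points[i][1:]) <= radius):
            j += 1
        if points[j][0] - points[i][0] >= min_duration:
            if move:
                episodes.append(("move", move))
                move = []
            episodes.append(("stop", points[i:j + 1]))
            i = j + 1
        else:
            move.append(points[i])
            i += 1
    if move:
        episodes.append(("move", move))
    return episodes


# Made-up trajectory: times in seconds, coordinates in kilometres.
trajectory = [
    (0, 0.0, 0.0), (900, 0.0, 0.0), (1800, 0.0, 0.0),
    (2100, 5.0, 0.0),
    (2400, 10.0, 0.0), (2700, 10.0, 0.0), (4500, 10.0, 0.0),
]
episodes = stops_and_moves(trajectory, radius=1.0, min_duration=1800)
kinds = [kind for kind, _ in episodes]
```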
Our framework for analysis of movement data builds on these approaches and
elaborates the concepts presented in papers by Andrienko et al. (2008, 2011a, b).

2.2 Fundamental Sets: Space, Time, and Objects

Consistent with the ideas of Peuquet (1994, 2002), we consider three fundamental
sets pertinent to movement: space S (set of locations), time T (set of instants or
intervals), and set of objects O. Elements of each set have their properties, which
can be represented by values of attributes. Among others, there may be attributes
whose values are elements of T, S, or O, or more complex constructs involving
elements of T, S, or O. Attributes that do not involve time or space will be called
“thematic”, according to the terminology adopted in the geographical literature.
For example, there may be thematic attributes with numeric or nominal values.

2.2.1 Space

Space is a set consisting of locations, or places. An important property of space
is the existence of distances between its elements. At the same time, space has no
natural origin and no natural ordering between the elements. Therefore, for dis-
tinguishing positions in space, one needs to introduce in it some reference sys-
tem, for example, a system of coordinates. While this may be done, in principle,
quite arbitrarily, there are some established reference systems such as geographi-
cal coordinates.
Depending on the practical needs, one can treat space as two-dimensional (i.e.
each position is defined by a pair of coordinates) or as three-dimensional (each
position is defined by a triple of coordinates). In specific cases, space can be
viewed as one-dimensional. For example, when movement along a standard route
is analysed, one can define positions as the distances from the beginning of the
route, that is, a single coordinate is sufficient.
Theoretically, one can also deal with spaces having more than three dimen-
sions. Such spaces are abstract rather than physical; however, movements of enti-
ties in abstract spaces may also be subject to analysis. Thus, Laube et al. (2005)
explore the “movement” (evolution) of the districts of Switzerland in the abstract
space of politics and ideology involving three dimensions: left versus right, liberal
versus conservative, and ecological versus technocratic.
The physical space is continuous, which means that it consists of an infinite
number of locations and, moreover, for any two different locations, there are
locations “in between”, that is, at smaller distances to each of the two locations
than the distance between the two locations. However, it may also be useful to
treat space as a discrete or even finite set of locations. For example, in studying
the movement of tourists over a country or a city, one can “reduce” space to the
set of points of interest visited by the tourists. Space discretization may even be
indispensable, in particular, when positions of entities cannot be measured precisely
and are specified in terms of areas such as cells of a mobile phone network,
city districts, or countries.
The above-cited examples show that space may be structured, in particu-
lar, divided into areas. The division may be hierarchical; for instance, a country
is divided into provinces, the provinces into municipalities, and the municipali-
ties into districts. Areas can also be derived from a geometric decomposition (e.g.
1 km² cells), with no semantics associated with the decomposition. A street (road)
network is another common way of structuring physical space.
Like coordinate systems, space structuring also provides a reference system
that may be used for distinguishing positions, for instance, by referring to streets
or road segments and relative positions on them, which may be specified in the
form of house numbers or distances from the ends of the segments. The possible
ways of specifying positions in space can be summarized as follows:
• coordinate-based referencing: positions are specified as tuples of numbers rep-
resenting linear or angular distances to certain chosen axes or angles;
• division-based referencing: referring to compartments of an accepted geometric
or semantic-based division of the space, possibly, hierarchical;
• linear referencing: referring to relative positions along linear objects such as
streets, roads, rivers, pipelines, for example, street names plus house numbers or
road codes plus distances from one of the ends.
Since positions of entities often cannot be determined accurately, they may be
represented in data with uncertainty, for example, as areas instead of points.
In our conceptual model, locations (elements of S) may have any geometries:
points, lines, areas, or volumes in three-dimensional space.
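The three ways of specifying positions listed above can be sketched as simple record types. The following Python fragment is only an illustration under our own assumptions: the class names, field names, and example values are hypothetical and are not part of the conceptual model.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class CoordinatePosition:
    """Coordinate-based referencing: a tuple of one, two, or three numbers."""
    coords: Tuple[float, ...]


@dataclass(frozen=True)
class DivisionPosition:
    """Division-based referencing: a path through a possibly hierarchical division."""
    path: Tuple[str, ...]  # e.g. country, province, municipality


@dataclass(frozen=True)
class LinearPosition:
    """Linear referencing: a linear object plus a distance from one of its ends."""
    line_id: str     # hypothetical identifier, e.g. a road code
    offset_m: float  # distance from the chosen end, in metres


Position = Union[CoordinatePosition, DivisionPosition, LinearPosition]


def describe(p: Position) -> str:
    """Return a short human-readable description of a position."""
    if isinstance(p, CoordinatePosition):
        return f"{len(p.coords)}-dimensional point {p.coords}"
    if isinstance(p, DivisionPosition):
        return "area " + " / ".join(p.path)
    return f"{p.offset_m} m along {p.line_id}"


examples = [
    CoordinatePosition((7.2, 50.75)),
    DivisionPosition(("Country", "Province", "Municipality")),
    LinearPosition("Road-1", 1250.0),
]
```

Positions with uncertainty (areas instead of points) would need a geometry field rather than a coordinate tuple; this is left out of the sketch.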

2.2.2 Time

Mathematically, time is a continuous set with a linear ordering and distances
between the elements, where the elements are moments, or positions in time.
Analogously to positions in space, some reference system is needed for the
specification of moments in data. In most cases, temporal referencing is done
on the basis of the standard Gregorian calendar and the standard division of a
day into hours, hours into minutes, and so on. The time of the day may be speci-
fied according to the time zone of the place where the data are collected or as
Greenwich Mean Time (GMT). There are cases, however, when data refer to
relative time moments, for example, the time elapsed from the beginning of a
process or observation, or abstract time stamps specified as numbers 1, 2, and so
on. Unlike the physical time, abstract times are not necessarily continuous. The
physical time may be discretized, that is, considered as a set of non-overlapping
intervals.
The physical time is not only a linear sequence of moments but includes inherent
cycles resulting from the earth’s daily rotation and annual revolution. These natu-
ral cycles are reflected in the standard method of time referencing: the dates are
repeated in each year and the times in each day. Besides these natural cycles, there
are also cycles related to people’s activities, for example, the weekly cycle. Various
domain- and problem-specific cycles exist as well, for example, the revolution peri-
ods of the planets in astronomy or the cycles of the movement of buses or local
trains on standard routes. Temporal cycles may be nested; in particular, the daily
cycle is nested within the annual cycle. Hence, time can be viewed as a hierarchy of
nested cycles. Several alternative hierarchies may exist, for example, year/month/
day-in-month and year/week-in-year/day-in-week.
A comprehensive discussion of the phenomenon of time, its properties, struc-
ture, and ways of looking at time and modelling time can be found in the book by
Aigner et al. (2011).
In our conceptual model, time T is a continuous or discrete linearly ordered set
consisting of time instants or time intervals, jointly called time units. The temporal
cycles are expressed as attributes of the time units; for each temporal cycle, there
is an attribute whose values are the positions of the time units within the cycle.
Examples of such attributes are “month”, “time of the day”, “day of the week”,
“day of the year”, “week of the year”, “quarter of the year”, etc.
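As an illustration, positions of a time instant within several temporal cycles can be derived from a standard calendar reference. The sketch below uses Python's `datetime` module; the function name and the choice of ISO 8601 week numbering (weeks start on Monday, Monday = 1) are our assumptions, and other conventions are equally possible.

```python
from datetime import datetime


def cycle_attributes(t: datetime) -> dict:
    """Positions of a time instant within several temporal cycles."""
    iso_year, iso_week, iso_weekday = t.isocalendar()
    return {
        "month": t.month,                           # 1..12 within the year
        "quarter_of_year": (t.month - 1) // 3 + 1,  # 1..4 within the year
        "day_of_year": t.timetuple().tm_yday,       # 1..366 within the year
        "week_of_year": iso_week,                   # ISO 8601 week number
        "day_of_week": iso_weekday,                 # ISO: Monday = 1, Sunday = 7
        "time_of_day": t.time(),                    # position in the daily cycle
    }


attrs = cycle_attributes(datetime(2013, 3, 15, 8, 30))
```

The pairs (year, month, day in month) and (ISO year, week in year, day in week) obtained in this way correspond to the two alternative hierarchies mentioned above.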

2.2.3 Objects

The set of objects includes various physical and abstract entities. Objects can be
classified according to their spatial and temporal properties. A spatial object is an
object having a particular position in space in any time moment of its existence.
A  temporal object, also called event, is an object with limited time of existence
with respect to the time period under observation, or, in other words, an object
having a particular position in time. Spatial events are objects having particu-
lar positions in space and time. A moving object, also called mover, is a kind of
spatial object capable of changing its spatial position over time. Moving events are
events that change their spatial positions over time. Spatial events and movers can
be jointly called spatio-temporal objects. Table 2.1 contains the definitions of the
types of objects and examples. The Venn diagram in Fig. 2.1 illustrates graphically
the is–a relations between the types of objects.
Moving objects and spatial events are the types of objects playing the most
important role in our conceptual framework and in the whole book. We shall use
special notations M and E for denoting sets of movers and spatial events, respec-
tively. The notation O will be mostly used for denoting objects in general.
Movement is the change of the spatial position(s) of one or more objects (mov-
ers) over time. Changes of the spatial position of one mover can be represented by
a mapping (function) τ: T → S. For a chosen time interval [t1, t2], where t1 < t2,
the function τ: T  →  S defines a sequence of spatial positions, which is called
trajectory of the moving object. Movement of multiple objects can be represented
as a function μ: M × T → S.

Table 2.1  Types of objects according to their spatial and temporal properties

| Concept | Upper concepts | Properties | Examples |
|---|---|---|---|
| Spatial object | Object | Has a certain position in space (a location or a set of locations, not necessarily continuous) | Building, home, road, city centre, car, pedestrian, iceberg, deer, lynx, rainfall, trajectory, stop, turn, a lynx chasing a deer, a car exceeding speed limit |
| Event (temporal object) | Object | Appears and/or disappears during the time period under analysis, that is, has a certain position in time (a time unit or a sequence of time units) | Rainfall, trajectory, stop, turn, a lynx chasing a deer, a car exceeding speed limit, sunset, night, winter |
| Spatial event (spatio-temporal object) | Spatial object, event | Has certain positions in space and in time | Rainfall, trajectory, stop, turn, a lynx chasing a deer, a car exceeding speed limit |
| Static spatial object | Spatial object | The spatial position is constant; exists during the whole time period under analysis | Building, home, road, city centre |
| Mover (moving object) | Spatial object | The spatial position changes over time | Car, pedestrian, iceberg, deer, lynx, a lynx chasing a deer, a car exceeding speed limit |
| Moving event | Mover, event | Exists during a sequence of time units (that is, not instant); the spatial position changes over time | A lynx chasing a deer, a car exceeding speed limit |

Fig. 2.1  Types of objects and is–a relations between them (Venn diagram of the sets Objects, Events, Spatial objects, Spatial events, Movers, and Moving events)
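The mappings τ: T → S and μ: M × T → S introduced in this section can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: time is taken as a discrete set of integer time units, locations are planar coordinate pairs, and the mover identifiers and function names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Location = Tuple[float, float]    # assumption: planar coordinates
Trajectory = Dict[int, Location]  # tau: T -> S with integer time units

# mu: M x T -> S, stored as one mapping tau per mover
movement: Dict[Hashable, Trajectory] = {
    "car_1": {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 1.0)},
    "car_2": {1: (5.0, 5.0), 2: (5.0, 4.0)},
}


def mu(m: Hashable, t: int) -> Location:
    """Spatial position of mover m in time unit t (KeyError if unknown)."""
    return movement[m][t]


def trajectory(m: Hashable, t1: int, t2: int) -> List[Tuple[int, Location]]:
    """Time-ordered positions of mover m on the interval [t1, t2]."""
    tau = movement[m]
    return [(t, tau[t]) for t in sorted(tau) if t1 <= t <= t2]
```

For continuous time, the lookup in `mu` would have to be replaced by some form of interpolation between recorded positions, which this sketch does not attempt.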

2.3 Characteristics of Objects, Locations, and Times

Here, we consider how elements of each of the three basic sets can be charac-
terized in terms of the other two sets. Spatial events are characterized by their
spatio-temporal positions, that is, by pairs (t, s), where t ∈ T, s ∈ S. Movers are
characterized by their trajectories, where a trajectory consists of pairs (t, s),
t ∈ T, s ∈ S.
Presence dynamics is a dynamic (time-dependent) attribute characterizing a
location in terms of the objects that are present in it in different time units. This
can be represented as a function T → P(O), where P(O) is the power set of the set
of objects O (i.e. the set of all subsets of O), and O may include movers and spa-
tial events: O = M ∪ E. Spatial situation is an attribute characterizing a time unit
in terms of the spatial positions of the objects existing in this time unit. This can
be represented by a function O → S, which matches each object to a location in
space. An equivalent representation is S → P(O): the locations are matched with
sets of objects appearing in them.
Figure 2.2 schematically represents the characteristics of elements of the three
sets in terms of elements of the other sets. This scheme has a clear relation to the
triad model “what, when, where” by Peuquet (1994).
Objects, locations, and times may also have thematic attributes, that is, attrib-
utes not involving locations or times. Thematic attributes of objects and locations
may be static (i.e. values do not change over time) or dynamic. A dynamic attrib-
ute can be represented by a mapping T → A from the set of time units T to some
domain (set) A containing possible values of the attribute. Multiple dynamic attrib-
utes can be represented by a mapping T → A1 × ⋯ × An. We shall use the nota-
tion T → A as a short form of T → A1 × ⋯ × An, that is, allow A to stand for
A1 × ⋯ × An.
Hence, movers are characterized by trajectories representing the mapping
T  →  S and, possibly, dynamic thematic attributes representing the mapping
T → A. Instead of dealing with the mappings T → S and T → A separately, one
can consider their join T → S × A consisting of triples (t, s, a). It is convenient to
extend the definition of trajectory to include also possible dynamic attributes of
movers. From now on, we assume that trajectories may include dynamic thematic
attributes. The notation T → S × A will represent such a trajectory.
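The join of the mappings T → S and T → A into T → S × A can be illustrated as follows. The sketch assumes discrete time units and keeps only the time units present in both mappings; this policy and all names are our assumptions.

```python
def join_trajectory(positions: dict, attributes: dict) -> list:
    """Join T -> S and T -> A into time-ordered triples (t, s, a).

    Only time units present in both mappings are kept; other policies,
    such as interpolation, are possible but not shown here.
    """
    common = sorted(positions.keys() & attributes.keys())
    return [(t, positions[t], attributes[t]) for t in common]


positions = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (3.0, 4.0)}  # T -> S
speeds = {0: 0.0, 1: 5.0, 2: 0.0, 3: 1.0}                  # T -> A (speed)
triples = join_trajectory(positions, speeds)
```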
In an analogous way, we extend the definitions of presence dynamics characterizing
a location and spatial situation characterizing a time unit. Presence dynamics
is defined as the mapping T → P(O) × A1 × ⋯ × An, or, shortly, T → P(O) × A.
Spatial situation is the mapping S → P(O) × A1 × ⋯ × An, or, shortly, S → P(O) × A.

Fig. 2.2  Characteristics of objects, places, and times (diagram linking Objects, Locations, and Times through the characteristics spatial situation, presence dynamics, spatio-temporal position (events), and trajectory (movers))


A number of dynamic thematic attributes of moving objects may be derived from
their trajectories. Thus, Fig. 1.7 portrays the temporal variation of the speed, which
was computed from a trajectory. More generally, a number of instant, interval, and
cumulative attributes characterizing the movement can be derived from trajectories.
Instant movement attributes include instant speed, direction, acceleration (change of
speed), and turn (change of direction). Interval movement attributes are computed
for time intervals of a chosen constant length before, after, or around a given time
moment. They include travelled distance, displacement, average speed, sinuosity, tor-
tuosity, as well as statistical summaries of the instant characteristics, such as mean,
median, minimum, maximum. Cumulative movement attributes are computed for the
interval from the start of the trajectory to a given time moment or for the remaining
interval to the end of the trajectory. Cumulative measures include all interval meas-
ures and the temporal distances to the starts and ends of the trajectories.
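To make the derivation concrete, the following sketch computes a few instant and cumulative attributes from a trajectory. It assumes planar coordinates, strictly increasing time stamps, and speed approximated per segment between consecutive points; the function and key names are hypothetical.

```python
import math


def derived_attributes(traj):
    """Derive movement attributes from a trajectory.

    traj: time-ordered list of (t, (x, y)) with strictly increasing t and
    at least one point. Returns one record per segment between
    consecutive points, stamped with the time of the segment end.
    """
    records = []
    travelled = 0.0
    t0, (x0, y0) = traj[0]
    for (ta, (xa, ya)), (tb, (xb, yb)) in zip(traj, traj[1:]):
        step = math.hypot(xb - xa, yb - ya)
        travelled += step
        records.append({
            "t": tb,
            # speed over the segment, used as an estimate of instant speed
            "speed": step / (tb - ta),
            # direction as the angle from the x-axis, in degrees
            "direction": math.degrees(math.atan2(yb - ya, xb - xa)),
            # cumulative: path length from the start of the trajectory
            "travelled_distance": travelled,
            # cumulative: straight-line distance from the start
            "displacement": math.hypot(xb - x0, yb - y0),
            # temporal distance to the start of the trajectory
            "elapsed": tb - t0,
        })
    return records


traj = [(0, (0.0, 0.0)), (2, (3.0, 4.0)), (4, (3.0, 0.0))]
attrs = derived_attributes(traj)
```

Interval attributes can be obtained in the same way by restricting the computation to the points falling in a window of a chosen length around each time moment.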
Various dynamic thematic attributes can also be derived from presence dynam-
ics in locations: counts of the objects, statistics of their thematic attributes (aver-
age, minimum, maximum, mode, etc.), statistics of the times spent by the objects
in the locations, etc. From spatial situations, it is possible to derive thematic attrib-
utes of time units, such as the total count of existing objects, object density, aver-
age speed, prevailing movement direction, and others.
Additionally to the attributes defined in terms of elements of S, T, and O
(Fig. 2.2) and the derivatives of these attributes, objects, locations, and time units
may have their independent characteristics. Examples of dynamic attributes of
movers that may have a great impact on the movement include:

• transportation means (e.g. movements by car, by bike, by public bus, by train, or
on foot significantly differ from each other);
• purpose of the movement (e.g. a person may go to work, do shopping, walk for
pleasure or for exercise; an animal may search for food, pursue a prey, or try to
escape from a predator);
• activity performed during the movement (e.g. talking on a mobile phone or
looking at a map).

Movement can also be affected by the age of the movers, their occupation, fam-
ily status, and other characteristics that change relatively slowly. If no changes
occurred during a time period under analysis, the respective attributes can be con-
sidered static.
A trajectory of a moving object has a certain position in space, which is the set of
locations visited by the mover. When the mover is regarded as a point (i.e. the shape and
size are ignored), the spatial position of the trajectory is a line in S. Since a trajectory
has a position in space, it is a spatial object. A trajectory has also a specific position in
time: it is the time interval [t1, t2] on which the trajectory is defined. Hence, a trajectory,
generally, is a spatio-temporal object, that is, a spatial event, by the definition given in
Sect. 2.2. As a type of spatio-temporal object, a trajectory has its own attributes, namely:
• route, consisting of the position and the geometric shape of the trajectory in the
space;
• travelled distance, that is, the length of the trajectory in space;
• duration of the trajectory in time;
• movement vector (i.e. from the initial to the final position), or major direction;
• statistics of the speed (mean, median, maximal, etc.);
• dynamics (behaviour) of the speed;
• dynamics (behaviour) of the direction.
Trajectories can be compared according to these attributes. Trajectories that are
similar in terms of selected attributes can be grouped by means of clustering. In
our example analysis in Sect. 1.2, we used clustering of trajectories by similarity
of their routes.

2.4 Basic Types of Spatio-temporal Data

Spatio-temporal data describe changes occurring in space over time. As noted in
Sect. 2.1, there are three types of changes: existential changes (appearance and
disappearance of objects), changes of spatial properties, in particular, spatial posi-
tions of moving objects, and changes of thematic properties of locations and spa-
tial objects. Accordingly, there are three basic types of spatio-temporal data.
Data describing object appearance and disappearance refer to objects with
limited time of existence, that is, events. Therefore, this type of data can also be
called event data. For an event, the data specify the time interval of its existence.
For a spatial event, event data also specify its spatial position. Hence, spatial event
data materialize the function E → S × T, which maps a set of objects of the type
spatial events to locations in space and intervals in time. Spatial event data with
thematic attributes can be represented by the formula E → S × T × A, where A
stands for one or more attributes.
For simplicity, it can be assumed that there are no reviving events, that is, an
event cannot appear again after it has disappeared. If revival of events happens in
some application, it can be handled in two alternative ways. One way is to con-
sider each new appearance of an event as a new event. The other way is to intro-
duce an attribute indicating the state of an event, either active or inactive, and
represent a temporary disappearance of an event as a change from the active state
to inactive and a revival as the opposite change.
Data describing changes of spatial properties may refer to moving objects,
which change their spatial positions, and also to other kinds of spatial objects that
change their shapes, sizes, and/or orientation. In the context of this book, we focus
on movement, that is, changes of spatial positions. Therefore, we deal with move-
ment data describing changes of spatial positions of movers. Movement data mate-
rialize the function μ: M × T → S. Movement data can also be viewed as a set of
trajectories of movers: M → (T → S). The formula M → (T → S × A) represents
a set of trajectories including dynamic thematic attributes. An equivalent representation
is M × T → S × A.
Data describing changes of thematic properties (values of attributes) may refer
to locations in space or to spatial objects (static objects, events, and movers). For
a given object or location, the data specify the values of its thematic attributes in
different time units. This may be represented by the formula O  →  (T  →  A) or
S  →  (T  →  A), where A may stand for multiple attributes. Temporal sequences
of attribute values materializing the mapping T  →  A are commonly called time
series. Time series describing spatial locations or static spatial objects are called
spatially referenced time series or, simply, spatial time series. We stress that spa-
tial time series refer to fixed locations in space, either directly or by referring to
spatial objects having fixed positions. The formula S  → (T  →  A) covers both
cases.
The relative order of S and T in the formula S  → (T  →  A) is not important
since both S and T are independent components determining the values of the
dependent component A. The representations S  → (T  →  A), T  → (S  →  A) and
S × T → A are equivalent in the sense that they describe the same data structure.
At the same time, they represent different views of spatial time series. The formula
S → (T → A) represents the view of the data as a set of local time series T → A in
different locations of S. The formula T → (S → A) corresponds to the view of the
data as a set of spatial distributions S → A in different time units of T. The term
“spatial distribution” denotes an arrangement of values of one or more attributes
across the space in a given time unit. The formula S × T → A corresponds to a
view when the analyst is interested only in attribute values in specific locations
and times, for example, in locations visited by moving objects at the times of the
visits.
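The equivalence of the three representations can be illustrated by converting the same data between them. The sketch below stores S × T → A as a dictionary keyed by (location, time unit) pairs; the location names and attribute values are made up for the example.

```python
from collections import defaultdict

# S x T -> A: attribute value for each (location, time unit) pair
flat = {
    ("cell_a", 1): 10, ("cell_a", 2): 12,
    ("cell_b", 1): 7, ("cell_b", 2): 9,
}


def local_time_series(flat):
    """S -> (T -> A): one time series per location."""
    out = defaultdict(dict)
    for (s, t), a in flat.items():
        out[s][t] = a
    return dict(out)


def spatial_distributions(flat):
    """T -> (S -> A): one spatial distribution per time unit."""
    out = defaultdict(dict)
    for (s, t), a in flat.items():
        out[t][s] = a
    return dict(out)


def flatten(nested):
    """Inverse of local_time_series: S -> (T -> A) back to S x T -> A."""
    return {(s, t): a for s, series in nested.items() for t, a in series.items()}
```

Since `flatten(local_time_series(flat))` returns the original dictionary, no information is lost in the conversion; only the view of the data changes.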
Let us consider presence dynamics T  →  P(O)  ×  A and spatial situations
S → P(O)  × A. Obviously, their components T → A and S → A may be repre-
sented by data in the form of local time series and spatial distributions, respec-
tively. P(O) can be considered as the domain of a specific dynamic attribute
the possible values of which are sets of objects. In data, sets of objects can be
represented by lists of object identifiers. Hence, a mapping T  →  P(O) can be
materialized in a time series of attribute values and a mapping S  →  P(O) in a
spatial distribution of attribute values, where the values are lists of object iden-
tifiers. Therefore, we can say that presence dynamics in their general form
T → P(O) × A can be described by time series associated with locations and spa-
tial situations in their general form S → P(O) × A can be described by spatial dis-
tributions associated with time units.
In some analysis tasks, it may be irrelevant which specific objects are present
in each location in different time units, and only the total number of the objects
or the number of times they appear in a location may be of interest. In such cases,
object presence in a location may be represented in an aggregated form, as the
count of different objects that appeared in the location or the count of the appear-
ances (visits). Spatial time series can represent aggregated presence dynamics and
spatial distributions can represent aggregated spatial situations.
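A possible way of aggregating object presence is sketched below. It assumes that presence is recorded as tuples (o, t, s) and that each tuple counts as one visit; the names and data are hypothetical.

```python
from collections import defaultdict

# (o, t, s): object o is present in location s in time unit t
visits = [
    ("m1", 1, "A"), ("m2", 1, "A"), ("m1", 2, "B"),
    ("m1", 3, "A"), ("m2", 3, "B"),
]


def aggregate(visits):
    """Per location: count of different objects and count of visits."""
    objects, counts = defaultdict(set), defaultdict(int)
    for o, t, s in visits:
        objects[s].add(o)
        counts[s] += 1
    return {s: {"objects": len(objects[s]), "visits": counts[s]} for s in counts}


def count_series(visits):
    """S -> (T -> A): per location, the count of different objects per time unit."""
    present = defaultdict(lambda: defaultdict(set))
    for o, t, s in visits:
        present[s][t].add(o)
    return {s: {t: len(objs) for t, objs in series.items()}
            for s, series in present.items()}
```

Here `count_series` yields a spatial time series representing aggregated presence dynamics, whereas `aggregate` summarizes the whole time period in two numbers per location.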
Table 2.2  Characteristics of spatial objects, locations, and times and respective data types

| Type of entities | Characteristics | Data types |
|---|---|---|
| Spatial events E | Spatio-temporal positions and thematic attributes T × S × A | Spatial event data E → T × S × A |
| Movers M | Trajectories T → S × A | Position data (M × T) → (S × A); trajectory data M → (T → S × A) |
| Spatial locations S | Presence dynamics T → P(O) × A | Local time series S → (T → A) |
| Time units T | Spatial situations S → P(O) × A | Sequence of spatial distributions T → (S → A) |

Table 2.2 summarizes the characteristics of spatial objects, locations, and times,
and the types of data that can represent them.

2.5 Event-Based View of Movement

In Sect. 2.3, we said that a trajectory is a spatial event, that is, an object having
particular positions in space and time. A trajectory consists of pairs (t, s), t  ∈  T,
s ∈ S, or, more generally, tuples (t, s, a), where a may stand for values of one or
more dynamic thematic attributes. Each tuple has a particular position s in space
and a particular position t in time; in our classification, it is a spatial event. Hence,
a trajectory is a complex spatial event consisting of a sequence of elementary spa-
tial events (t, s, a).
A sequence of temporally consecutive events may be regarded as one composite
event, which, in turn, may be part of a yet larger composite event. One of the possible
reasons for uniting consecutive events (t1, s1, a1), (t2, s2, a2), ⋯, (tk, sk, ak) into one
composite event may be constancy or approximate constancy of s (s1 ≈ s2 ≈ ⋯ ≈ sk)
and/or a (a1 ≈ a2 ≈ ⋯ ≈ ak). Thus, in the example analysis in Sect. 1.2, we have
extracted stop events, that is, segments of trajectories in which the spatial position
was constant. Examples for constant values of a are trajectory segments with constant
speed, movement direction, or transportation mode.
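Uniting consecutive elementary events by approximate constancy of s can be sketched as a simple threshold-based procedure. This is not necessarily the method used in Sect. 1.2; the function name, the parameters `max_radius` and `min_duration`, and the rule of measuring distances from the first point of a run are our assumptions.

```python
import math


def extract_stops(traj, max_radius, min_duration):
    """Extract stop events from a trajectory.

    traj: time-ordered list of (t, (x, y)).
    A stop is a run of consecutive points that all lie within max_radius
    of the first point of the run and that lasts at least min_duration.
    Returns composite events as tuples (t_start, t_end, anchor).
    """
    stops = []
    i, n = 0, len(traj)
    while i < n:
        t_start, anchor = traj[i]
        j = i
        # extend the run while the next point stays close to the anchor
        while j + 1 < n and math.dist(traj[j + 1][1], anchor) <= max_radius:
            j += 1
        t_end = traj[j][0]
        if j > i and t_end - t_start >= min_duration:
            stops.append((t_start, t_end, anchor))
            i = j + 1  # continue after the stop
        else:
            i += 1     # no stop starts here; try the next point
    return stops


traj = [
    (0, (0.0, 0.0)), (1, (5.0, 0.0)), (2, (5.1, 0.0)),
    (3, (5.0, 0.1)), (4, (9.0, 0.0)), (5, (13.0, 0.0)),
]
stops = extract_stops(traj, max_radius=0.5, min_duration=2)
```

The same scheme applies to constancy of a thematic attribute a: the distance test is replaced by a test on the attribute values, for example, equality of the transportation mode.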
Composite events demarcated based on constancy of attribute values corre-
spond to the concept of “movement episode”, or “behavioural episode”, which
is defined in the research literature as a discrete time period for which the spa-
tio-temporal behaviour of a mover was relatively homogeneous or as a trajectory
fragment in which movement characteristics were relatively stable (Mountain
and Raper 2001; Dykes and Mountain 2003; Laube et al. 2007; Wood and Galton
2010). The concept of movement episode is subsumed by our concept of compos-
ite event, in which stability of movement characteristics is not an essential part of
the meaning. There may be also other reasons for joining elementary events, for
example, relations to the movement context, which will be discussed later.
We shall use the term movement events to refer to elementary and composite
spatial events involved in the movement. We shall use the notations (t, s) and
(t, s, a) both for elementary and for composite movement events. This means
that t may stand either for an element of T (t ∈ T) or for a continuous subset of
T (t  ⊂  T), that is, a sequence of consecutive time units. In both cases, s is
the set of spatial locations (s  ⊂  S) and a is the set of attribute values (a  ⊂  A,
A = A1 × ⋯ × An) corresponding to t by the mapping T → S × A. The depend-
ence of s and a on t may be emphasized by transforming (t, s, a) to (t, s(t), a(t)). To
denote that a movement event (t, s, a) belongs to a moving object m, the notation
(m, t, s, a) may be used.
This reasoning demonstrates that spatial events are intrinsic in movement.
Movement can be viewed as a composition of spatial events. Figure 2.3 illus-
trates this event-based view of movement graphically. This view extends the con-
ceptual model of movement as a combination of stops and moves suggested by
Spaccapietra et al. (2008). In our model, stops and moves are particular types
of spatial events among other possible types. Wood and Galton (2010) suggest
another conceptual model where movement consists of two types of elementary
events: “chunks” of homogeneous process (i.e. when no qualitative changes occur)
and transitions, when one process starts, stops, or is replaced by another. Events
and processes can be considered at different levels of granularity. For example,
walking of a person from A to B may be considered as a single event (“chunk” of
a homogeneous process of walking) at the coarsest granularity level and as a
succession of overlapping leg-movement events at the finest level, with many
intermediate levels possible. In our model, composite events can be considered as units or
as collections of smaller (elementary or composite) events linked by certain spatial
and temporal relations, in particular, spatial and temporal neighbourhood and
temporal ordering.

Fig. 2.3  Movement as a composition of spatial events (diagram relating Mover, Spatial object, Event, and Spatial event to the mappings T → S, T → A, and their join T → S × A and to the elementary events (t, s), (t, a), and (t, s, a) through is-a, has-attribute, has-value, part-of, and has-projection relations)
Spatial events make up not only trajectories of movers but also presence
dynamics in locations and spatial situations in time units. An appearance of an
object o in a location s in time unit t is a spatial event with the spatial position s
and temporal position t. This event can be called location visit and represented
by a tuple (o, t, s) or (o, t, s, a), where a stands for values of dynamic thematic
attributes of the object at the time of appearing in the location s. Both presence
dynamics and spatial situations can be viewed as combinations of visits. A pres-
ence dynamics in a given location s can be viewed as a set of all events (o, t, s, a)
with the same s. A spatial situation in a given time unit t can be viewed as a set of
all events (o, t, s, a) with the same t. This is analogous to the view of a trajectory
of a given object o as the set of events (o, t, s, a) with the same o (in this case,
notation m could be used instead of o). Absence of any object in a location s in
time unit t is also a spatial event, which may be represented as (Ø, t, s, a) and
called a null visit.
Location visits are elementary events, which can be joined into various composite
events. Examples of composite events are visits to a location by a particular
group of objects or by a group of objects of at least a given size. Location visits
and composite events composed of them can be jointly called presence events. A
general form of a presence event is (P(O), t, s, a), where P(O) denotes subsets of
objects, including subsets with one or no members. When only moving objects are
taken into account, the expression can be rewritten as (P(M), t, s, a).
As will be described later in the book, spatial events with various semantics can
be extracted from all types of spatio-temporal data: trajectories, local time series,
spatial distributions, and spatial events (in the latter case, composite events are
created from elementary events). Extracting spatial events followed by analysing
their spatio-temporal distribution and other characteristics is therefore a general
approach that can be used in analysis of various spatio-temporal data.
Hence, spatial events are very important and therefore given much attention in
our conceptual framework and in the book as a whole.

2.6 Multi-Perspective View of Movement

Let us consider again a set of movers M and the movement function μ: M × T → S.
Trajectories of the moving objects M → (T → S), presence dynamics of the mov-
ing objects in locations S  → (T  →  P(M)), and spatial situations in time units
T → (S → P(M)) are, in fact, equivalent transformations of the movement function.
Fig. 2.4  Four perspectives of the phenomenon of movement: movers (trajectories, M → (T → S)), spatial events (spatio-temporal positions, E → (T × S)), space (presence dynamics in locations, S → (T → P(M ∪ E))), and time (spatial situations in time units, T → (S → P(M ∪ E)))

They represent three complementary perspectives on movement. We can call them
mover-oriented perspective, space-oriented perspective, and time-oriented perspective,
depending on the set standing in the first place in a formula.
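That the three perspectives are equivalent transformations of the movement function can be checked on a toy example. The sketch below represents μ: M × T → S as a list of records (m, t, s) and derives each view from it; the record layout and function names are our assumptions.

```python
from collections import defaultdict

# Movement function mu: M x T -> S, given as records (m, t, s)
records = [
    ("m1", 1, "A"), ("m1", 2, "B"),
    ("m2", 1, "A"), ("m2", 2, "A"),
]


def trajectories(records):
    """Mover-oriented perspective: M -> (T -> S)."""
    out = defaultdict(dict)
    for m, t, s in records:
        out[m][t] = s
    return dict(out)


def presence_dynamics(records):
    """Space-oriented perspective: S -> (T -> P(M))."""
    out = defaultdict(lambda: defaultdict(set))
    for m, t, s in records:
        out[s][t].add(m)
    return {s: dict(series) for s, series in out.items()}


def spatial_situations(records):
    """Time-oriented perspective: T -> (S -> P(M))."""
    out = defaultdict(lambda: defaultdict(set))
    for m, t, s in records:
        out[t][s].add(m)
    return {t: dict(dist) for t, dist in out.items()}


def records_from_presence(presence):
    """Recover the (m, t, s) records from the space-oriented view."""
    return sorted((m, t, s) for s, series in presence.items()
                  for t, movers in series.items() for m in movers)
```

Recovering the original records from the space-oriented view shows that the transformation loses no information; the same holds for the other two views.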
Furthermore, as argued in the previous section, trajectories, presence dynamics,
and spatial situations can be viewed as combinations of spatial events. This gives
one more perspective of movement: event-oriented perspective. This perspective
can be represented by the generic notation E  → (T  ×  S), where E stands both
for movement events (m, t, s, a) occurring in trajectories and for presence events
(P(M), t, s, a) occurring in presence dynamics and spatial situations. Both classes
of spatial events involve moving objects, elements of M. The space- and time-
oriented perspectives can be extended by including in them spatial events in
addition to movers. The formal representations are S  → (T  →  P(M  ∪  E)) and
T → (S → P(M ∪ E)), respectively.
The four perspectives of the phenomenon of movement are schematically
represented by the drawing in Fig. 2.4. Each perspective, besides the main components
M, E, T, and S, may include also domains of thematic attributes A.
According to these four perspectives, movement may be represented in
different types of data: trajectory data materializing the mapping M → (T → S),
spatial event data materializing the mapping E → (T × S), local time series mate-
rializing the mapping S → (T → P(M ∪ E)), and spatial distributions materializ-
ing the mapping T → (S → P(M ∪ E)). We shall consider these data types, which
have been discussed in Sect. 2.4 and listed in Table 2.2, as different possible forms
of movement data.
2.7 Spatio-temporal Context

In the formal model, space S and time T are abstractions that represent the real,
physical space and time. The physical space and physical time are not uniform
but heterogeneous, which means that their elements have differing properties. In
the geographical space, water differs from land, mountain range from valley, forest
from meadow, seashore from inland, city centre from suburbs, and so on. It can be
said that every location has some degree of uniqueness relative to the other locations.
The same applies to other physical spaces such as inner spaces of buildings
or the space of human body. In physical time, day differs from night, winter from
summer, working days from weekends, etc. Properties of locations and times may
have a great impact on movements and, hence, should be taken into account in
analysis.
Relevant characteristics of spatial locations include the following:
• altitude, slope, aspect, and other characteristics of the terrain;
• accessibility with regard to various constraints (obstacles, availability of roads,
etc.);
• character and properties of the surface: land or water, concrete or soil, forest or
field, etc.;
• objects present in a location: buildings, trees, monuments, etc.;
• function or way of use, for example, housing, shopping, industry, agriculture, or
transportation;
• activity-based semantics, for example, home, work, shopping, leisure, and so on.
When locations are defined as space compartments (i.e. areas in two-dimen-
sional space or volumes in three-dimensional space) or network elements rather
than points, the relevant characteristics also include the following:
• spatial extent and shape;
• capacity, that is, the number of entities the location can simultaneously contain;
• homogeneity or heterogeneity of the properties (listed above) over the
compartment;
• connections to other locations, especially in a network.
Properties of locations may change over time. For example, a location may
be accessible on weekdays and inaccessible on weekends; a town square may be
used as a marketplace in the morning hours; a road segment may be blocked or its
capacity may be reduced due to an accident or construction works.
Characteristics of time units and, hence, their effects on movements greatly
depend on their positions within the temporal cycles. For example, the movements
of people on weekdays notably differ from the movements on weekends; moreo-
ver, the movements on Fridays differ from those on Mondays and the movements
on Saturdays from those on Sundays. It is very important to anticipate which
temporal cycles may be relevant to the movements under study and to take these
cycles properly into account in the analysis. For this purpose, it is necessary that
the cycles are reflected in temporal references of the data items. Typically, this is
done through specifying the cycle number and the position from the beginning of
the cycle. In fact, the standard references to dates and times of the day are built
according to this principle. However, besides the standard references to the yearly
and daily cycles, references to other (potentially) relevant cycles, for example, the
weekly cycle of people’s activities or the cycles of the movement of satellites, may
be necessary or useful. Hence, an analyst may need to transform the standard ref-
erences into references in terms of alternative time hierarchies, as we did in our
example analysis in Chap. 1.
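The principle of referring to a time by a cycle number and a position within the cycle can be sketched in Python as follows. This is only an illustration, not the procedure used in Chap. 1; the function name `cycle_reference` and the choice of the ISO week as the weekly cycle are assumptions made for the example.

```python
from datetime import datetime

def cycle_reference(ts: datetime) -> dict:
    """Express a timestamp as (cycle number, position within the cycle) pairs."""
    iso_year, iso_week, iso_weekday = ts.isocalendar()
    return {
        # yearly cycle: cycle number = year, position = day of the year (1-based)
        "year": (ts.year, ts.timetuple().tm_yday),
        # weekly cycle: cycle number = (ISO year, ISO week), position = weekday (Monday = 1)
        "week": ((iso_year, iso_week), iso_weekday),
        # daily cycle: cycle number = ordinal day count, position = seconds since midnight
        "day": (ts.toordinal(), ts.hour * 3600 + ts.minute * 60 + ts.second),
    }

ref = cycle_reference(datetime(2006, 12, 4, 8, 30))
print(ref["week"])  # ((2006, 49), 1): a Monday in ISO week 49
```

Other cycles, such as the revolution period of a satellite, could be handled in the same way by dividing the elapsed time by the cycle length.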
There may be not only regular differences between time units caused by their
positions in the temporal cycles but also irregular differences. For example, the
regular weekly cycle may be disrupted by intrusions of public holidays. Not only
the intrusions themselves but also the preceding and/or following times may be
very different from the “normal” time; think, for example, of the days before and
after Christmas. Such irregular changes should also be taken into account in the
analysis of time-dependent phenomena.
The regularity of changes may itself vary, in particular, owing to interactions
between larger and smaller temporal cycles. Thus, the yearly variation of the dura-
tion of daylight has an impact on the properties of times of a day, which, in turn,
influence movements of people and animals. As a result, movements at the same
time of the day in summer and in winter may substantially differ.
The diverse properties of locations and times constitute a part of the context
in which spatial and temporal objects exist and move. Another part of the con-
text, with regard to each object, consists of the other objects that exist or occur in
the same space and/or time, for example, ways and obstacles, weather, events, sur-
rounding movers, and the properties of these objects. All these together make the
spatio-temporal context.
According to Tomaszewski and MacEachren (2010), there are three aspects of
context: spatial (geographical), temporal (historical), and conceptual. Conceptual
context consists of relevant generic concepts and their relations, general principles
and rules, causalities, regularities, etc. Conceptual context is often available in the
form of background knowledge of a human analyst. Visual analytics must enable
analysts to utilize such knowledge.
In our examples in Chap. 1, the maps showed us the spatial context of the
movements. Information about the temporal context, including the temporal cycles
and the properties of different times within the cycles, was not represented explic-
itly, which is a typical case; however, we could use our background knowledge.
The use of the conceptual context can also be seen in our example analyses. In
interpreting the individual movement, we relied on our general knowledge that a
person usually has a home, may go to work, visit shops, and so on. In analysing
the data about the cars in Milan, we used our notions of city centre and suburb,
belt road and crossing. Since we analysed movement of people, we could assume
that they move purposefully rather than chaotically.
It can be noted that the spatial and temporal contexts are more local (i.e. limited
in space and in time to the neighbourhood of the object’s position), specific (to
particular positions in space and time), and dynamic (i.e. changing over time) than
the conceptual context, which tends to be global, generic, and steady. In this book,
we focus mainly on the spatial and temporal contexts. Since many kinds of objects
and phenomena are both spatial and temporal, we do not separate the spatial and
temporal aspects but join them into a single concept of spatio-temporal context.
For the sake of brevity, we shall sometimes use the single word “context” to refer
to the spatio-temporal context.
Formally, for a selected object o, the spatio-temporal context C consists of
the space, time, and other objects positioned in the space and/or time with their
respective attributes: C(o) = (S ∪ T ∪ O)\{o}. Spatio-temporal context exists not
only for objects (movers and spatial events) but also for the members of the
other two fundamental sets S and T. For a given location s ∈ S, the context consists
of the other locations, time, and objects: C(s) = (S ∪ T ∪ O)\{s}. For a given
time unit t ∈ T, the context consists of the other time units, space, and objects:
C(t) = (S ∪ T ∪ O)\{t}.
According to the event-based view (Sect. 2.5), movement is a composition of
spatial events, such as movement events and presence events. Generally, any spa-
tial event occurs in a spatio-temporal context, which includes other events. For any
movement event e, e = (m, t, s, a), all other movement events, including the earlier
and later movement events of the same mover m, make a part of the spatio-tempo-
ral context of the event e.
We shall use the term context element to denote an element of the set C. We
consider three types of context elements:
• spatial context elements (SCE): static spatial objects; arbitrary locations; spe-
cific locations in a trajectory such as the start position, end position, middle, and
medoid (the closest position to all other positions);
• temporal context elements (TCE): events; arbitrary time moments; specific time
moments in a trajectory such as start time, end time, half-way time;
• spatio-temporal context elements (STCE): moving objects; arbitrary spatial
events; movement events in a trajectory.
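Of the specific trajectory locations listed above, the medoid is the least obvious to compute. A minimal sketch follows, assuming planar coordinates and reading "the closest position to all other positions" as the position with the smallest sum of distances to the others; the function name `medoid` is hypothetical.

```python
from math import hypot

def medoid(points):
    """Return the point with the smallest sum of Euclidean distances to all points."""
    return min(
        points,
        key=lambda p: sum(hypot(p[0] - q[0], p[1] - q[1]) for q in points),
    )

positions = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(medoid(positions))  # (2, 0)
```

Unlike the mean position, the medoid is always one of the recorded positions, so it can serve as a spatial context element lying on the trajectory itself.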

2.8 Relations

2.8.1 Relations of Objects

As said before, spatial and temporal objects exist in a spatio-temporal context,
which includes locations, times, and other spatial and temporal objects. Objects
are linked to elements of the context by various spatial and temporal relations.
Spatial relations link spatial objects through their spatial positions to elements and
subsets of S. Other spatial objects also have positions in S; hence, spatial relations
link spatial objects to other spatial objects. Temporal relations link a temporal
object (event) through its temporal position to elements and subsets of T. Other
events also have positions in T; hence, temporal relations link events to other
events. A trajectory of a mover consists of spatial events and is therefore linked
to the context by spatial and temporal relations and their combinations (spatio-temporal
relations). Figure 2.5 schematically represents the relations between a
spatio-temporal object and elements of its spatio-temporal context. Figure 2.6
focuses on relations between objects.

Fig. 2.5  Relations between a spatio-temporal object (such as event or mover) and elements of the spatio-temporal context

Fig. 2.6  Relations between spatio-temporal objects

Since we deal with a dynamic world, in which objects appear, move, and disap-
pear and properties of objects, locations, and times change, most relations exist
only during certain time intervals. Let o be an object and c an element of its spa-
tio-temporal context. Let o be related to c by a relation type R during a time inter-
val [t1, t2], which means that R(o, c) is true during [t1, t2]. A combination (R, o, c,
[t1, t2]) where R(o, c) is true during [t1, t2] is called an instance, or occurrence of
the relation type R. Since a relation occurrence has a particular limited position
in time, it is an event, according to our definition. This is a spatial event if at least
one of o or c has a particular position in space. This kind of event can be called
a relation event.
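When a relation R(o, c) is evaluated at a sequence of discrete time steps, its occurrences (R, o, c, [t1, t2]) can be obtained by collapsing consecutive true values into maximal intervals. The sketch below assumes that the truth values are already computed and that the relation holds continuously between two consecutive time steps at which it is true; the function name `relation_events` is hypothetical.

```python
def relation_events(times, holds):
    """Collapse per-time-step truth values of R(o, c) into maximal intervals (t1, t2)."""
    events = []
    start = None  # start time of the interval currently open, if any
    prev = None   # previous time step
    for t, h in zip(times, holds):
        if h and start is None:
            start = t                     # relation starts to hold
        elif not h and start is not None:
            events.append((start, prev))  # relation stopped holding after prev
            start = None
        prev = t
    if start is not None:
        events.append((start, prev))      # interval still open at the end of the data
    return events

print(relation_events([0, 1, 2, 3, 4, 5], [False, True, True, False, True, True]))
# [(1, 2), (4, 5)]
```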
The possible types of spatial and temporal relations are considered in the
literature on temporal and spatial reasoning (e.g. Allen 1983; Egenhofer 1991;
Frank 1992) and on geographic information systems (e.g. Jones 1997; Longley
et al. 1999). The basic types of temporal relations include binary topological,
ordering, and distance relations. The basic types of spatial relations include
binary topological, directional, and distance relations. Topological and ordering
relations are formally represented by predicates, that is, Boolean-valued
functions P  ×  Q  → {true, false}, where P and Q denote two sets of entities
the elements of which may be related or not; particularly, P and Q may be
the same. Distance relations can be represented by numeric-valued functions
P × Q → [0, ∞) expressing spatial or temporal distances in suitable units, for
example, metres or seconds. Directional spatial relations can be represented by
a numeric function representing the spatial direction, for example, in degrees.
Directional and distance relations can also be represented qualitatively, that
is, by predicates such as “near”, “far”, “north” (Frank 1992). In fact, any such
predicate stands, explicitly or implicitly, for a certain range of values of a
numeric function. Reciprocally, for any range (or, more generally, subset) of
values of a numeric function, one may introduce a predicate. Hence, we assume
that distance and direction relations can always be represented by a set of predi-
cates defined according to the specifics of the application domain and the goals
of the analysis.
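The correspondence between numeric functions and predicates can be illustrated by a short sketch. It assumes planar coordinates with the y axis pointing north; the 500-unit threshold, the 90-degree sector for "north", and all function names are hypothetical choices that would in practice depend on the application domain.

```python
from math import atan2, degrees, hypot

def distance(p, q):
    """Numeric distance relation between two planar positions."""
    return hypot(q[0] - p[0], q[1] - p[1])

def bearing(p, q):
    """Direction from p to q in degrees clockwise from north, in [0, 360)."""
    return degrees(atan2(q[0] - p[0], q[1] - p[1])) % 360.0

def range_predicate(func, low, high):
    """Predicate that is true when func(p, q) falls into [low, high)."""
    return lambda p, q: low <= func(p, q) < high

near = range_predicate(distance, 0.0, 500.0)
far = range_predicate(distance, 500.0, float("inf"))

def north(p, q):
    """'North' wraps around 0 degrees, so it is a union of two value ranges."""
    b = bearing(p, q)
    return b >= 315.0 or b < 45.0
```

Here `near` and `far` partition the value range of `distance`, which is the sense in which a set of predicates can replace the numeric function.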
From the basic types of relations, more complex types of relations are built
such as density (concentration, dispersion), arrangement (e.g. sequence in time
or alignment in space), and spatio-temporal relations, which are composed of
spatial and temporal relations. Spatio-temporal relations are particularly rel-
evant to moving objects. They may encapsulate changes of spatial relations of
movers to context elements (locations, static spatial objects, events, and other
movers) over time: approaching or going away, entering, exiting, or passing
by, following, keeping distance, concentrating or dissipating, etc. For spatio-
temporal relations among movers, some researchers use special terms “relative
motion patterns” (Laube et al. 2005), “movement patterns” (Dodge et al. 2008;
Laube 2009), or “interactions” (Orellana and Renso 2010). We consider the term
“movement pattern” or “motion pattern” as over-generic and the term “interac-
tion” as more specific than our “relation”. We shall use the term interaction to
denote such a relation between two or more spatial objects when the objects
are so close in space that they may have “mutual or reciprocal action or influ-
ence” (Merriam-Webster 1999) upon one another. Thus, relation “100 km apart”
is typically not considered as interaction while relation “100 m apart” can be
considered as interaction in application to ships in a sea and rather as absence of
interaction in application to people.
Relations of moving objects, including complex spatio-temporal relations, can
be described by referring to the movement events of the objects. Each movement
event e = (m, t, s, a) has its relations to the context, which may be represented
as a set of relation events (R, e, c, [t1, t2]). The interval [t1, t2] is related to the
life time t of the event in the following way. If t is an instant, t1 = t2 = t. If t
is a time interval, [t1, t2] either coincides with t or is a subinterval of t. If e is a
composite event, (R, e, c, [t1, t2]) means that each elementary event e′ = (m, t′,
s′, a′) with t′ ∈ [t1, t2] is linked to the context element c by the relation R, that is,
R(e′, c) = true during t′.
Table 2.3 gives a few examples of expressing relations of movers through relations
of their movement events.

Table 2.3  Expressing relations of movers through relations of movement events

Relations of movers
    Relations of movement events

Mover m is IN place p during time [t1, t2]
    Movement event e = (m, [t1, t2], s, a) is IN place p, that is, IN(s, p) = true

Mover m PASSES BY spatial object o during time [t1, t2]
    Movement event e = (m, [t1, t2], s, a) is NEAR object o and the previous and
    next events e^p = (m, [t1^p, t2^p], s^p, a^p), where t2^p < t1, and
    e^n = (m, [t1^n, t2^n], s^n, a^n), where t1^n > t2, are NOT NEAR object o
    (i.e. NEAR(s, o) = true and NEAR(s^p, o) = false and NEAR(s^n, o) = false)

Mover m1 FOLLOWS mover m2 during time [t1, t2]
    For the sequences of elementary movement events of the two movers
    e1^i = (m1, t^i, s1^i, a1^i) and e2^i = (m2, t^i, s2^i, a2^i) that occurred
    during time [t1, t2] (i.e. ∪ t^i = [t1, t2]), e1^i is DIRECTED TOWARDS e2^i
    (i.e. the movement direction from s1^i to s1^(i+1) EQUALS the direction from
    s1^i to s2^i) and e2^i is NOT DIRECTED TOWARDS e1^i

Movers m1 and m2 CONVERGE during time [t1, t2]
    For the sequences of elementary movement events of the two movers
    e1^i = (m1, t^i, s1^i, a1^i) and e2^i = (m2, t^i, s2^i, a2^i) that occurred
    during time [t1, t2] (i.e. ∪ t^i = [t1, t2]),
    SPATIAL_DISTANCE(e1^(i+1), e2^(i+1)) < SPATIAL_DISTANCE(e1^i, e2^i) and
    e1^last is NEAR e2^last, where e1^last and e2^last are the last events in
    the interval [t1, t2]

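Two of the relations in Table 2.3, PASSES BY and CONVERGE, can be sketched in code as follows. The sketch reduces each elementary movement event to its planar position, treats NEAR as a distance threshold, and assumes that the two tracks passed to `converge` are non-empty and aligned in time; these simplifications and all function names are assumptions made for the example.

```python
from math import hypot

def dist(a, b):
    """Euclidean distance between two planar positions."""
    return hypot(a[0] - b[0], a[1] - b[1])

def passes_by(positions, obj, near_radius):
    """Indices i where position i is NEAR obj while the previous and next positions are not."""
    near = [dist(p, obj) <= near_radius for p in positions]
    return [
        i for i in range(1, len(positions) - 1)
        if near[i] and not near[i - 1] and not near[i + 1]
    ]

def converge(track1, track2, near_radius):
    """True if the distance between the movers strictly decreases from step to step
    and their last positions are NEAR each other."""
    d = [dist(p, q) for p, q in zip(track1, track2)]
    return all(b < a for a, b in zip(d, d[1:])) and d[-1] <= near_radius

print(passes_by([(0, 0), (5, 1), (10, 0)], (5, 0), 2))  # [1]
```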
Relations of movers to the spatio-temporal context can also be represented by
dynamic attributes, such as the following ones:
• spatial distance:
– to a selected SCE;
– to the nearest or the nth nearest SCE;
– to the nearest or to the nth nearest STCE within a given temporal window;
• spatial direction:
– in relation to a selected SCE (i.e. the angle between the movement vector of a
mover and the vector directed to the current position of the SCE);
– in relation to the direction of a selected mover (i.e. the angle between the
movement vectors of two movers);
• temporal distance:
– to a selected TCE;
– to the nearest or to the nth nearest TCE;
– to the nearest or to the nth nearest STCE within a given spatial window;
• neighbourhood:
– count of SCE within a given spatial window;
– count of TCE within a given temporal window;
– count of STCE within given spatial and temporal windows.
We remind the reader that the abbreviations SCE, TCE, and STCE denote spa-
tial, temporal, and spatio-temporal context elements, respectively. A temporal window
is specified relative to the time moment t for which the attribute value is computed,
for example, the last 10 min (i.e. from t − 10 min to t), from t − 5 to t + 5 min, or
from tstart (the start time of the trajectory) to t − 5 min. A spatial window is specified
relative to the spatial position attained at the moment t, for example, within
500 m distance, more than 500 m distance, within 500 m to the north.
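Two of the dynamic attributes listed above can be computed as in the following sketch. It assumes planar coordinates, numeric timestamps, a circular spatial window, and a temporal window given by offsets before and after t; the function names and parameters are hypothetical.

```python
from math import hypot

def distance_to_nearest_sce(position, sces):
    """Spatial distance from the mover's current position to the nearest SCE."""
    x, y = position
    return min(hypot(sx - x, sy - y) for sx, sy in sces)

def count_stce(position, t, stces, radius, before, after):
    """Count STCEs, given as (x, y, time) triples, that fall within the spatial
    window (circle of the given radius) and the temporal window [t - before, t + after]."""
    x, y = position
    return sum(
        1 for ox, oy, ot in stces
        if hypot(ox - x, oy - y) <= radius and t - before <= ot <= t + after
    )

others = [(100, 0, 595), (100, 0, 700), (1000, 0, 600)]
# window: within 500 m, during the last 10 min (times in seconds)
print(count_stce((0, 0), 600, others, radius=500, before=600, after=0))  # 1
```

Evaluating such functions for every position record of a trajectory yields a time series of attribute values attached to the mover.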

2.8.2 Relations of Locations and Times

Locations (elements of space S) are linked by spatial relations to spatial objects.
Time units (elements of T) are linked by temporal relations to temporal objects
(events). Locations by themselves do not have positions in time and, hence, have no direct
relations to times. Likewise, times by themselves have no direct relations to spatial
locations. Figure 2.7 graphically summarizes the possible types of direct relations
between locations, times, and objects.
However, locations and times can be linked indirectly by means of spatio-
temporal objects (spatial events and movers). Thus, when an object appears in a
given location, it links this location to the time of the appearance. Reciprocally,
the object appearance time becomes linked to the location. On this basis, one can
define a relation linked-by-object (l, t) and its variants linked-by-any-object
and linked-by-X, where X stands for a specific object.
Obviously, locations are linked to other locations by spatial relations and times
are linked to other times by temporal relations. These relations are independent
of moving objects or events and are not of primary interest in the context of this
book. We are more interested in the kinds of relations that involve spatio-temporal
objects. Moving objects visit different locations and thereby link these locations.
The kinds of relations that emerge between locations due to object movement
include connectedness relations and temporal relations.

Fig. 2.7  Direct relations between the sets of locations, times, and objects

Connectedness relations are based on set relations between the sets of movers
visiting the locations, that is, include, overlap, coincide, and disjoint. For exam-
ple, two locations are connected if the sets of their visitors overlap and uncon-
nected if these sets are disjoint. Variants such as strongly connected and weakly
connected can be defined by setting a lower or upper limit on the number of com-
mon visitors. More generally, the number of common visitors can be considered as
an attribute of a connectedness relation expressing the strength of the connection.
Furthermore, we can define a relation of connectedness by a set (group) of
moving objects: locations l1, l2, … are connected by a set of movers M′ if each
location was visited by each of the movers m ∈ M′. The cardinality of the set M′
can be used as a measure of the strength of the location connectedness.
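Connectedness relations can be derived from visit data as in the sketch below, which assumes that visits are given as (mover, location) pairs. The function names are hypothetical, and the four relation labels follow the set relations named above.

```python
def visitors(visits):
    """Group (mover, location) pairs into a dict: location -> set of visiting movers."""
    by_loc = {}
    for mover, loc in visits:
        by_loc.setdefault(loc, set()).add(mover)
    return by_loc

def set_relation(a, b):
    """Classify the relation between two visitor sets."""
    if a == b:
        return "coincide"
    if not (a & b):
        return "disjoint"
    if a < b or b < a:
        return "include"
    return "overlap"

def connectedness_strength(by_loc, l1, l2):
    """Number of common visitors of two locations."""
    return len(by_loc.get(l1, set()) & by_loc.get(l2, set()))

by_loc = visitors([("a", "L1"), ("b", "L1"), ("b", "L2"), ("c", "L2")])
print(set_relation(by_loc["L1"], by_loc["L2"]))      # overlap
print(connectedness_strength(by_loc, "L1", "L2"))    # 1
```

A threshold on the returned strength would give the strongly and weakly connected variants.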
Temporal relations between locations are based on the temporal relations
between the visits of the locations by movers, more precisely, between the times
of the visits. In particular, locations may be linked by temporal ordering and tem-
poral distance relations. A trajectory of a single mover puts the visited locations in
a particular temporal order. Obviously, different trajectories may impose different
orderings on the same locations. However, stable ordering relations are also possi-
ble. For example, relation between (l1, l2, l3), that is, location l1 is visited between
location l2 and location l3, may be considered permanent if all objects moving
from l2 to l3 intermediately go through l1.
More generally, we can say that locations l1, l2, … are linked by temporal
relation (TR) imposed by a group (set) of moving objects M′ if for each object
m ∈ M′ the times t1, t2, … of the visits of the locations l1, l2, … are related by TR:
TR(t1, t2, …). For example, before (M′, l1, l2) means that each mover m ∈ M′ vis-
ited location l1 before visiting location l2.
Mover-imposed relations of linear temporal order between locations will be
called flow relations, and instances (occurrences) of such relations will be called
flows. The notation flow (M′, l1, l2, …) may be used to denote that each object
m ∈ M′ visited the locations l1, l2, … in the given temporal order. The cardinality
of the set of movers imposing a flow relation between locations may be considered
as an attribute of this relation, which may be called flow magnitude. Based on the
magnitudes, flows can be classified into strong and weak (by choosing a threshold
value). The introductory section contains examples of flow maps, which represent
flow relations between locations. For better readability of flow maps, weak flows
are often hidden.
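Flow magnitudes between pairs of locations can be counted as in the following sketch. As a simplification that is not part of the definition above, it counts only direct transitions, that is, pairs of locations visited consecutively by a mover; the function names and the input format are hypothetical.

```python
from collections import Counter

def flow_magnitudes(trajectories):
    """trajectories: dict mover -> time-ordered list of visited locations.
    Returns a Counter mapping (l1, l2) to the number of distinct movers
    that went directly from l1 to l2."""
    flows = Counter()
    for locs in trajectories.values():
        # set() ensures that each mover is counted once per pair of locations
        for l1, l2 in set(zip(locs, locs[1:])):
            if l1 != l2:
                flows[(l1, l2)] += 1
    return flows

def strong_flows(flows, threshold):
    """Keep only the flows whose magnitude reaches the threshold."""
    return {pair: n for pair, n in flows.items() if n >= threshold}

flows = flow_magnitudes({"a": ["A", "B", "C"], "b": ["A", "B"], "c": ["B", "A", "B"]})
print(strong_flows(flows, 2))  # {('A', 'B'): 3}
```

Filtering by a threshold corresponds to hiding weak flows on a flow map.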
Attributes of relations between locations, such as connectedness strength and
flow magnitude, may be dynamic, that is, change over time. Generally, we assume
that any relations may have static and dynamic attributes.
Relations between time units may be defined in terms of existence and spatial
positions of spatio-temporal objects. In particular, equivalence relations represent
constancy over time. Thus, two time units are equivalent with regard to a given set
of objects O′ if each object o ∈ O′ either exists or does not exist in both time units
and the spatial positions of the existing objects are the same. Absence of equiva-
lence means change. Change between time units can be characterized by attributes
such as the number of objects that appeared or disappeared, the number of objects
that moved, and the distance they moved (e.g. average or total distance).
Movers change their spatial positions over time and thereby can link time units
by spatial relations. The spatial distance relation between two time units t1 and
t2 with regard to a set of moving objects M′ may be defined as an aggregate (e.g.
average) of the distances between the spatial positions of the objects in time t1
and in time t2. The spatial direction relation may be defined as an aggregate (e.g.
average or mode) of the directions between the spatial positions of the objects in
time t1 and in time t2.
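The equivalence and change relations between two time units can be sketched as follows, assuming that each time unit is described by a dictionary mapping the objects existing in it to their planar positions. The function name `change_between` is hypothetical, and the mean is taken over the objects present in both time units.

```python
from math import hypot

def change_between(pos1, pos2):
    """pos1, pos2: dict object -> (x, y) for the objects existing in time units t1 and t2."""
    common = pos1.keys() & pos2.keys()
    dists = [
        hypot(pos2[o][0] - pos1[o][0], pos2[o][1] - pos1[o][1]) for o in common
    ]
    return {
        "equivalent": pos1 == pos2,                       # same objects, same positions
        "appeared": len(pos2.keys() - pos1.keys()),
        "disappeared": len(pos1.keys() - pos2.keys()),
        "moved": sum(1 for d in dists if d > 0),
        "mean_distance": sum(dists) / len(dists) if dists else 0.0,
    }

change = change_between({"a": (0, 0), "b": (1, 1)}, {"a": (3, 4), "c": (0, 0)})
print(change["mean_distance"])  # 5.0
```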
Furthermore, spatio-temporal trends in terms of positions of moving objects,
such as converging, concentrating, dissipating, aligning, shifting-to (particular
spatial direction), can be viewed as relations between time units. These complex
relations are more difficult to define formally; however, they can be relatively eas-
ily detected visually by observing an animated map or a space–time cube.

2.9 Movement Data and Context Data

2.9.1 Forms and Sources of Movement Data

As stated in Sect. 2.4, movement data represent the movement function μ:
M × T → S. The most typical format of movement data is a set of position records
having the structure <mover identifier, time unit, spatial position>. This structure can
also be represented by the formula M × T → S, which emphasizes that the objects
and time units may be, in principle, chosen arbitrarily whereas the spatial position is
a measured value depending on the chosen pair of object and time unit. The records
may additionally include values of thematic attributes, that is, the structure may be
M × T → S × A, where A stands for thematic attributes. A time-ordered sequence
of position records of one object is called the movement track of this object.
Movement data may also be available in the form <mover identifier, trajec-
tory>, where the trajectory specifies the mapping τ: T  →  S, for instance, by a
sequence of pairs <time unit, spatial position> (in principle, other representations
are possible, for example, a sequence of geometric primitives). This form may be
encoded as M → (T → S × A). It is equivalent to M × T → S × A.
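The equivalence of the two forms can be made concrete by converting between them. The sketch below assumes that position records are (mover, time, position) tuples and that a trajectory is a time-ordered list of (time, position) pairs; the function names are hypothetical.

```python
def to_trajectories(records):
    """M × T → S given as (mover, t, pos) records -> M → (T → S) as dict of sorted lists."""
    out = {}
    for mover, t, pos in records:
        out.setdefault(mover, []).append((t, pos))
    for track in out.values():
        track.sort(key=lambda tp: tp[0])  # time-ordered sequence = movement track
    return out

def to_records(trajectories):
    """M → (T → S) -> flat list of (mover, t, pos) position records."""
    return [(m, t, pos) for m, track in trajectories.items() for t, pos in track]

records = [("m1", 2, (1.0, 1.0)), ("m2", 1, (5.0, 5.0)), ("m1", 1, (0.0, 0.0))]
trajectories = to_trajectories(records)
print(trajectories["m1"])  # [(1, (0.0, 0.0)), (2, (1.0, 1.0))]
```

Thematic attributes could be carried along by extending the position element of each record to a (position, attributes) pair.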
The possible methods of position recording include (Andrienko et al. 2008) the
following:
• Time-based: positions of movers are recorded at regularly spaced time
moments.
• Change-based: a record is made when a mover’s position, speed, or movement
direction differs from the previous one.
• Location-based: a record is made when a mover enters or comes close to a specific
place, for example, where a sensor is installed.
• Event-based: positions and times are recorded when certain events occur, in par-
ticular, when movers perform certain activities such as mobile phone calling or
taking photos.
• Various combinations of these basic approaches. In particular, GPS tracking
devices may combine time-based and change-based recording: the positions are
measured at regular time intervals but recorded only when a significant change
of position, speed, or direction occurs.
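The combination of time-based measurement with change-based recording can be sketched as a filter over regularly measured samples. The sketch considers only changes of position (changes of speed and direction are omitted) and does not describe the logic of any particular device; the function name and the threshold parameter `min_move` are hypothetical.

```python
from math import hypot

def change_based_filter(samples, min_move):
    """samples: time-ordered (t, x, y) tuples measured at regular intervals.
    Keep the first sample and every later sample that lies at least
    min_move away from the last kept sample."""
    kept = []
    for t, x, y in samples:
        if not kept or hypot(x - kept[-1][1], y - kept[-1][2]) >= min_move:
            kept.append((t, x, y))
    return kept

samples = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 10, 0), (4, 10, 1)]
print(change_based_filter(samples, 5))  # [(0, 0, 0), (3, 10, 0)]
```

The output is temporally irregular even though the underlying measurements were regular, which matters for the data properties discussed in the next section.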
Movement data are not always originally available in the form of position
records. Other possible formats include the following:
• Data about visits of predefined places (presence dynamics), for example, time
series of activations of spatially distributed motion sensors.
• Series of spatial situations (snapshots), for example, video observations data.
Data in these formats often do not contain explicit information about movers
and their trajectories. Special methods are needed to reconstruct this information,
for example, the methods suggested by Ivanov et al. (2007) and Höferlin et al.
(2009).

2.9.2 Properties of Movement Data

In analysing movement data, it is important to take into account the following
properties:

• Temporal properties:
– temporal resolution: the lengths of the time intervals between the position
measurements;
– temporal regularity: whether the length of the time intervals between the
measurements is constant or variable;
– temporal coverage: whether the measurements were made during the whole
time span of the data or in a sample of time units, or there were intentional or
unintentional breaks in the measurements;
– time cycles coverage: whether all positions of relevant time cycles (daily,
weekly, seasonal, etc.) are sufficiently represented in the data, or the data
refer only to subsets of the positions (e.g. only to work days or only to day-
time), or there is a bias towards some positions.

• Spatial properties:
– spatial resolution: the minimal change of position of an object that can be
reflected in the data;
– spatial precision: whether the positions are defined as points (by exact coor-
dinates) or as locations having spatial extents (e.g. areas). For example, the
position of a mobile phone call is typically a cell in a mobile phone network;
– spatial coverage: are positions recorded everywhere or, if not, how are the
locations where positions are recorded distributed over the studied territory
(in terms of the spatial extent, uniformity, and density)?

• Mover set properties:
– number of movers: a single mover, a small number of movers, or a large
number of movers;
– population coverage: whether there are data about all movers of interest for
a given territory and time period or only for a sample of the movers; in the
latter case, what is the relative size of the sample with respect to the whole
population of interest;
– representativeness: whether the sample of movers is representative, that is,
has the same distribution of properties as in the whole population, or biased
towards individuals with particular properties.
• Data collection properties:
– position exactness: How exactly could the positions be determined? Thus, a
movement sensor may detect an object within its range but may not be able to
determine the exact coordinates of the object. In this case, the position of the
sensor will represent the position of the object in the data;
– positioning accuracy, or how much error may be in the measurements. For
example, GPS positioning may have an error between 1 and 10 m, but it can
increase up to 30 m in mountain areas or near high buildings. Sometimes, it is
possible to decrease the measurement errors by taking into account physical
constraints, for example, the street network;
– missing positions: in some circumstances, object positions cannot be deter-
mined, which leads to gaps in the data. For example, GPS positioning does
not work in buildings and in tunnels. Some positions may be intentionally
removed from data for preserving privacy or secrecy;
– meanings of the position absence: whether absence of positions corresponds
to stops, or to conditions when measurements were impossible, or to device
failure, or to private information that has been removed.
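Some of the temporal properties listed above can be estimated directly from the time stamps of a track. The sketch below assumes a sorted list of at least two numeric time stamps and flags a break when a time step exceeds a multiple of the median step; the function name and the parameter `gap_factor` are hypothetical, and the meaning of a detected gap still has to be established separately.

```python
def temporal_profile(times, gap_factor=3.0):
    """times: sorted numeric time stamps (e.g. seconds) of one mover, at least two."""
    steps = [b - a for a, b in zip(times, times[1:])]
    ordered = sorted(steps)
    median = ordered[len(ordered) // 2]  # upper median for an even number of steps
    return {
        "median_step": median,                  # indicator of temporal resolution
        "regular": min(steps) == max(steps),    # temporal regularity
        "gaps": [                               # candidate breaks in temporal coverage
            (a, b) for a, b in zip(times, times[1:]) if b - a > gap_factor * median
        ],
    }

profile = temporal_profile([0, 10, 20, 30, 100, 110])
print(profile["gaps"])  # [(30, 100)]
```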
These properties of movement data are quite strongly related to the data col-
lection methods. Thus, only time-based measurement produces temporally regu-
lar data. The temporal resolution may depend on the capacities and/or settings of
the measuring device. GPS tracking, which may be time-based or change-based,
gives very high spatial precision (the positions are defined as points) and quite
high accuracy while the temporal and spatial resolution depends on the device
settings. The spatial coverage of GPS tracking is very high (almost complete) in
open areas. Location-based and event-based recordings usually produce tempo-
rally irregular data with low temporal and spatial resolution and low spatial cov-
erage. The spatial precision may be high (e.g. for positions of taking photos by a
GPS-enabled camera or mobile phone) or low (e.g. for positions of taking photos
that are specified manually by the authors, for mobile phone calls).
Irrespective of the collection method and device settings, there is also unavoidable
uncertainty in movement data (and, more generally, any time-related
data) caused by their discreteness. Since time is continuous, the data cannot refer
to every possible instant. For any two successive instants t1 and t2 referred to in
the data, there are moments in between for which there are no data. Therefore, one
cannot know definitely what happened between t1 and t2.
Movement data with fine temporal and spatial resolution give a possibil-
ity of interpolation, that is, estimation of object positions between the measured
positions. In this way, the continuous path of the mover can be approximately
reconstructed, as we did in the examples in Chap. 1 (we used a simple linear inter-
polation). Therefore, movement data allowing interpolation between known posi-
tions may be called quasi-continuous.
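Linear interpolation between measured positions can be sketched as follows, assuming planar coordinates and a track given as a time-sorted list of (t, x, y) tuples. The function name `interpolate` is hypothetical, and the sketch is meaningful only for quasi-continuous data.

```python
from bisect import bisect_right

def interpolate(track, t):
    """track: time-sorted list of (t, x, y). Returns the linearly interpolated
    (x, y) at time t, or None if t lies outside the time span of the track."""
    times = [p[0] for p in track]
    if not track or t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)        # index of the first record later than t
    if i == len(track):
        return track[-1][1:]          # t coincides with the last record
    (t0, x0, y0), (t1, x1, y1) = track[i - 1], track[i]
    f = (t - t0) / (t1 - t0)          # fraction of the time step elapsed at t
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

track = [(0, 0.0, 0.0), (10, 10.0, 20.0)]
print(interpolate(track, 5))  # (5.0, 10.0)
```

No estimate is returned outside the recorded time span, since extrapolation is not supported by the data.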
Movement data that do not allow valid interpolation may be called episodic.
Episodic data are usually produced by location-based and event-based collec-
tion methods but may also be produced by time-based methods when the position
measurements cannot be done sufficiently frequently, for example, due to the lim-
ited battery lives of the devices. Thus, when tracking movements of wild animals,
ecologists have to reduce the frequency of the measurements in order to be able to
track the animals during longer time periods.
Another set of properties of movement data is related to the physics of the mov-
ing objects and the character of their movement. These properties seriously affect
the choice of the methods for data preprocessing, transformation, visualization,
and analysis:
• Whether the positions can be considered as two-dimensional or the third dimen-
sion (altitude or depth) is essential. The third dimension can be essential not
only for data about air or underwater movement but also for data about move-
ment in a mountainous area or in a multi-level building. Three-dimensional
movement data require visualization and analysis methods that can properly
deal with the third spatial dimension.
• Whether the data represent constrained or free movement. When the movement
is constrained, for example, by a street network, there are better possibilities for
detecting and correcting positioning errors and for reducing position uncertain-
ties. Repeated patterns (e.g. frequently followed routes) are more likely to occur
in constrained movement than in free movement.
• Whether the movement may contain abrupt changes of the spatial position in very
short time. For example, a pertinent property of eye movements is the presence
of instantaneous jumps (saccades) over relatively long distances (Dodge et al.
2009). The intermediate points between the start and end positions of a jump are
not meaningful: it cannot be assumed that there exists a straight or curved line
between two fixation positions such that the eye focus travels along it attending all
intermediate points. This prohibits the use of any method involving interpolation
between positions, for example, creation of movement density surfaces.

2.9.3 Context Data

Movement data containing trajectories of multiple objects describe simultaneously
the movement of each object and the part of its spatio-temporal context consisting
of the movements of the other objects. Besides, for each movement event, the move-
ment data describe the part of its context consisting of the other movement events.
Context data describing other parts of the spatio-temporal context of the move-
ment may have diverse forms depending on the nature of the respective context
elements.
Data about spatial events that do not change their spatial positions have the
general structure <event identifier, temporal position, spatial position, values of
thematic attributes>, represented by the formula E → T × S × A. In Sect. 2.4, this
type of spatio-temporal data was called spatial event data. For non-spatial events,
the data do not have the component representing the spatial position, that is, the
formula is E → T × A (event data).
Static characteristics of locations may be described by data in the format
S  →  A and dynamic characteristics by data in the format S  → (T  →  A), that
is, by spatial time series (see Sect. 2.4). The format S  → (T  →  A) is equivalent
to S  ×  T  →  A, which means that attribute values are specified for various pairs
<place, time>. Characteristics of spatial objects are described by data formats
O → A and O → (T → A), where O denotes spatial objects.
Characteristics of time units that do not depend on locations can be described by
data in the format T  →  A. Location-dependent attributes, such as weather, can be
represented by the structure T → (S → A), which is equivalent to T × S → A or
S × T → A. Hence, the same data structures can be used to represent time-depend-
ent characteristics of places and space-dependent characteristics of time units.
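
The equivalence between the nested format S → (T → A) and the flat format S × T → A can be illustrated with a minimal Python sketch. The place names, time labels, and attribute values are invented for illustration, and the function names `flatten` and `nest` are hypothetical.

```python
from collections import defaultdict

# Spatial time series S -> (T -> A): one time series per place.
# Places, times, and attribute values are made up for illustration.
nested = {
    "park":    {"09:00": 12.5, "10:00": 14.0},
    "station": {"09:00": 13.0, "10:00": 15.5},
}

def flatten(by_place):
    """Convert S -> (T -> A) into S x T -> A, keyed by <place, time> pairs."""
    return {(s, t): a for s, series in by_place.items() for t, a in series.items()}

def nest(flat, outer=0):
    """Convert S x T -> A back into a nested mapping.

    outer=0 groups by place (S -> (T -> A));
    outer=1 groups by time  (T -> (S -> A)).
    """
    result = defaultdict(dict)
    for key, a in flat.items():
        result[key[outer]][key[1 - outer]] = a
    return dict(result)

flat = flatten(nested)
by_time = nest(flat, outer=1)
```

The same flat mapping can thus be regrouped either by place or by time, which is why one data structure serves both time-dependent characteristics of places and space-dependent characteristics of time units.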
Context data are not always available. Even when some context data are available,
they typically do not fully describe the environment. Therefore, analysis of movement
data requires the involvement of the analyst’s background knowledge. This knowledge
may be involved implicitly, when the analyst interprets the data and analytical artefacts
obtained, or explicitly, when the analyst constructs new data to be used in the further
analysis. Visualization and interactive techniques are required in both cases.

2.10 Example Data Sets Used in the Book

2.10.1 Personal Driving

The personal driving data, which were already introduced in Chap. 1, were
collected by a single person and provided for research purposes. The data set cov-
ers the time period from the 4th of December 2006 till the 13th of October 2007
and contains 112,890 position records. The positions were measured by a GPS
device installed in the person’s car; hence, the data contain only trajectories of this
car and do not reflect other movements of the person. The GPS device was always
switched off when the person left the car; hence, significant stops are manifested
by time gaps between the position records. Unfortunately, the beginning parts of
many of the recorded trips are missing: after the device had been turned on, it took
some time to establish connections to GPS satellites and obtain coordinates.
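
Since stops appear in such data only as time gaps, a track can be divided into trips by looking for gaps longer than some threshold. The following is a minimal sketch of this idea; the function name `split_into_trips`, the threshold of 300 s, and the sample records are assumptions made for illustration and are not taken from the data set.

```python
def split_into_trips(records, max_gap_s=300):
    """Split time-ordered (timestamp_s, x, y) records into trips.

    A new trip starts whenever the time gap to the previous record
    exceeds max_gap_s (an assumed threshold, not a value from the data set).
    """
    trips = []
    for rec in records:
        if trips and rec[0] - trips[-1][-1][0] <= max_gap_s:
            trips[-1].append(rec)   # continue the current trip
        else:
            trips.append([rec])     # first record, or a gap: start a new trip
    return trips

# Hypothetical records: time stamps in seconds, coordinates in arbitrary units.
records = [(0, 0.0, 0.0), (2, 0.1, 0.0), (5, 0.2, 0.1),
           (3600, 5.0, 5.0), (3602, 5.1, 5.0)]
trips = split_into_trips(records)
```

A suitable threshold depends on the data; with change-based recording, a car waiting at a traffic light may also produce a gap, so the value needs to be chosen with care.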
The time intervals between the position records are irregular. The data col-
lection method was change-based, that is, the GPS device recorded the positions
when the car changed its speed or direction. The temporal resolution of the data
is quite fine: the median length of the time interval between consecutive position
records is 2 s, and the average length is 4.5 s. Both the median and average spatial
distances between consecutive positions are 0.03 km. Since the temporal and
spatial distances between the consecutive positions are quite small, the personal
driving data set is a strong example of quasi-continuous movement data.

2.10.2 Cars in Milan

The Milan cars data were also introduced in Chap. 1. The data were collected by
GPS tracking of 17,241 cars in Milan (Italy) during one week from Sunday, the 1st
of April, to Saturday, the 7th of April, 2007. The data have been kindly provided
by Comune di Milano (Municipality of Milan).
The data set consists of more than 2 million records, each including a car identifier,
time stamp (date and time of the day), geographical coordinates, and speed. We have
no information about the data collection method; most probably, it was change-based,
as for the personal driving data set. The time intervals between the records of the same
car are almost regular, ranging from 30 to 35 s, whereas there are also larger intervals
ranging from several minutes to several days. The median and average temporal dis-
tances between consecutive position records are 35 and 2,295 s (38 min), respectively.
Evidently, the GPS devices were switched off when the cars stopped (this could be done
automatically when the motor was turned off). The median and average spatial dis-
tances between consecutive positions are 0.49 and 0.72 km, respectively.
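
The large difference between the median and the average temporal distance is what one would expect when a few long gaps are mixed with many short intervals. A small sketch with invented time stamps (not taken from the Milan data) shows the effect:

```python
from statistics import mean, median

# Hypothetical time stamps (in seconds) of one car: regular 30-35 s sampling,
# interrupted by one long stop of two hours.
timestamps = [0, 30, 65, 95, 130, 7330, 7360, 7395]

# Temporal distances between consecutive position records.
intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
median_interval = median(intervals)
mean_interval = mean(intervals)
```

Here the median stays at 35 s, while the single long gap pushes the mean above 1,000 s. The median is therefore the more informative indicator of the typical temporal resolution.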
Although the temporal resolution of the Milan cars data is much lower than in
the personal driving data, the data can still be treated as quasi-continuous: when
we put the positions on a map and connect them by lines, the resulting trajectory
lines mostly fit quite well the street network of Milan (see Fig. 1.21 right).

2.10.3 Vessels in the North Sea

The data concerning vessel movements have been collected by the Netherlands
Coastguard by means of marine radars and automatic identification system (AIS),
which is an automatic tracking system used on ships and by vessel traffic ser-
vices for identifying and locating vessels. MARIN (Maritime Research Institute
Netherlands, www.marin.nl) receives the fused data for use in safety assessment
studies with respect to shipping. MARIN has provided an anonymized subset of
the data, covering 8 days, for our research. The authors are especially grateful to
Y. Koldenhof (MARIN) for describing the analytical tasks in marine safety studies
and providing feedback on the application of visual analytics methods.
The AIS data collection method is time-based, but the time intervals between the
records are variable: an AIS transceiver sends the data every 2–10 s depending on
a vessel’s speed while underway, and every 3 min while a vessel is at anchor. The
data records include the ship identifiers, positions, courses, and speeds. As the tem-
poral resolution is quite fine, the data are quasi-continuous. The spatial resolution,
precision, accuracy, and coverage of AIS data are very high. Most vessels use GPS
receivers for determining their spatial positions. A less frequently used positioning
system is based on radio signals transmitted by fixed land–based radio beacons.

2.10.4 Public Transport in Helsinki

We have gathered traffic data using the Helsinki Regional Transport’s HSL Live
web service [http://developer.reittiopas.fi/pages/en/other-apis.php]. A request was
made to the HSL Live URL once every 3 s, for the duration of 24 h. The response
to this request was a list of the locations of all active buses and trams within a
given bounding box. We parsed and saved these locations together with the route
number and vehicle licence number.
The collected data cover more than a hundred vehicles on 16 tram and 8 bus
routes. We have preprocessed the resulting log file for cleaning and consolidating
and subdivided the vehicle tracks into trajectories representing the trips between
the route origins and destinations. The final data set contains more than 800 trajec-
tories with approximately 1,000,000 position records.
The data collection method was time-based. The data have regular time inter-
vals (3 s) between the recorded positions. Due to the fine temporal resolution, the
data are quasi-continuous.

2.10.5 A Group Walk of Workshop Participants

During a scientific workshop dedicated to the collection and analysis of mobility data,
the participants were invited to go for a joint walk in a park and track their
movements while walking. Twelve participants volunteered to do so. They were
supplied with GPS devices. The data collection method was time-based. The positions
were recorded at regular time intervals. For all but one participant, the time inter-
vals between the records were 5 s. For the remaining participant, who used another
model of a tracking device, the intervals were 1 s long. The walk lasted for about
1 h. The participants agreed to provide the collected data for research purposes.
The collected data are quasi-continuous, having high temporal and spatial reso-
lution, precision, and accuracy.

2.10.6 Trajectories of Flickr and Twitter Users

The source of the Flickr data is the geographically referenced photos from the
Flickr photo-sharing Web site. The geographical positions are specified by the
photo owners when they post the photos in Flickr, or the positions are taken from
the metadata of photos taken by GPS-enabled cameras and smart phones. The
times when the photos were taken can be retrieved from the metadata of the image
files. Many Flickr users repeatedly post their georeferenced photos taken in differ-
ent places and at different times. The geographic locations and times of the pho-
tos reflect the movements of the photographers. The time-ordered sequence of the
records of one Flickr user can be treated as the trajectory of this user.
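
Constructing such trajectories amounts to grouping the records by user and ordering each group by time. The sketch below assumes a simple record layout of (user identifier, time stamp, longitude, latitude); the function name `build_trajectories` and the sample records are hypothetical.

```python
from collections import defaultdict

def build_trajectories(records):
    """Group (user_id, timestamp, lon, lat) records by user and order them by time."""
    by_user = defaultdict(list)
    for user_id, timestamp, lon, lat in records:
        by_user[user_id].append((timestamp, lon, lat))
    # Tuples sort by their first element, i.e. by time stamp.
    return {user: sorted(points) for user, points in by_user.items()}

# Hypothetical records; ISO 8601 time stamps sort correctly as strings.
records = [
    ("u1", "2011-05-02T10:00:00", 8.68, 50.11),
    ("u2", "2011-05-01T09:30:00", 7.10, 50.73),
    ("u1", "2011-05-01T18:45:00", 7.09, 50.74),
]
trajectories = build_trajectories(records)
```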
The Flickr photos data have been downloaded from the Flickr Web site using
a publicly available API and an approach similar to Web crawling. The data are
stored in a relational database. The records include the coordinates and the time
stamps of the photos, the identifiers of the Flickr users who published the photos,
the URLs of the images, and their titles provided by the owners. In our examples,
we shall use data subsets covering selected territories.
Twitter is an online microblogging service that enables its users to send and
read short text messages, known as “tweets”. The service allows the users to attach
geographical references to their tweets, that is, to record their physical location at
the moment of tweeting. This can be done automatically when tweets are posted
from GPS-enabled mobile devices, such as laptop computers, handheld comput-
ers, and cellular phones. Although a very small percentage of tweets are geocoded
(about 1 % worldwide, but the proportions differ in different countries), the abso-
lute amount of georeferenced tweets is very large.
Geographically referenced tweets are gathered through an interface (API)
provided by the Twitter service itself. Twitter puts a limitation on the number of
tweets per day that can be obtained free of charge. However, the current amounts
of georeferenced tweets almost fit in this limit; it is estimated that only about
5–6 % of them are lost. Hence, the set of georeferenced tweets collected in this
way can be regarded as a sufficiently representative sample.
Analogously to georeferenced Flickr photos, trajectories of Twitter users gener-
ating georeferenced tweets can be constructed from the positions of the tweets. In
our book, we shall use a subset of trajectories limited to a selected geographical
area and time interval.
The data collection method for the Flickr and Twitter data is event-based: the
position of a person is recorded when an event of taking a photo or posting a mes-
sage occurs. The movement data are episodic since the time intervals between the
recorded positions are often quite long, and therefore, the intermediate positions of
the individuals cannot be determined. The positioning accuracy of the data should
be assumed as low since the positions are not always measured by high-quality
hardware but specified approximately by the authors of the photos or messages.

The texts of the photo titles in the Flickr data set and messages in the Twitter
data set often refer to objects and events occurring in the world where the Flickr and
Twitter users live and move. Therefore, these texts can be viewed as context data.

2.10.7 VAST Challenge 2011

VAST is a conference (formerly symposium) on Visual Analytics Science and
Technology taking place annually since 2006. The VAST Challenge is a competition
where the participants analyse a given data set or several related data sets in
order to answer a set of questions about the events and phenomena reflected
in the data. The data are synthetic but synthesized on the basis of real data. The
whole challenge consists of several mini-challenges and one grand challenge.
Each mini-challenge refers to one aspect of a big problem while the grand chal-
lenge combines all aspects.
In our book, we shall use a data set from Mini Challenge 1 of the VAST
Challenge 2011. The data set imitates Twitter data. It contains 1,023,057 geo-
graphically referenced microblog messages. Each record includes the personal
identifier of the message author, creation date and time, location (longitude and
latitude coordinates of the mobile device at the time of posting the message), and
the text of the message. The messages refer to the fictitious metropolitan area of
Vastopolis and the time period from the 30th of April to the 20th of May, 2011, that is,
21 days.
The scenario reflected in the data is an outbreak of an epidemic disease.
Observed symptoms are largely flu-like and include fever, chills, sweats, aches and
pains, fatigue, coughing, breathing difficulty, nausea and vomiting, diarrhoea, and
enlarged lymph nodes. These symptoms are reflected in some of the microblog
messages.
The tasks of the Mini Challenge were the following:
1. Identify approximately where the outbreak started and outline the affected area
on the map of Vastopolis.
2. Identify how the infection is transmitted: from person to person, by air, by
water, etc.
The properties of the Mini Challenge microblog data are the same as the prop-
erties of the Flickr and Twitter data. Additional context data were provided about
the weather (particularly, wind direction) during the time of the outbreak.

2.10.8 Tracks of Wild Animals in a National Park

The data set was collected by GPS tracking of 72 roe deer and three lynxes in
the Bavarian Forest National Park (Bayerischer Wald) in Germany. The animals
wear special collars with devices that measure the positions at chosen time inter-
vals and transmit the measurements via radio networks (the data collection method
is described at the URL http://www.luchserleben.de/?lang=2). Unfortunately,
the amounts of data that can be collected are strongly limited by the battery lives
of the tracking devices. Thus, a collar suitable for roe deer can collect about
3,500 positions and a collar suitable for lynxes about 1,200 positions. In order to
track animals over a longer time, the researchers increase the time intervals between
the position measurements. Therefore, the collected records are quite sparse in
time. Furthermore, transmitted measurements are often lost; hence, the time inter-
vals between the records may be irregular, and large temporal gaps may occur.
These problems are typical for data obtained by tracking wild animals.
The data collection method for the Bavarian Forest data was time-based. The
animal researchers specified the time intervals at which the positions of the ani-
mals were recorded. The researchers could vary the time intervals by sending spe-
cial signals to the tracking devices. During the observation time, they sometimes
increased the temporal frequency of the measurements for short periods. As a
result, the time intervals between the records are irregular. Due to the low tempo-
ral resolution, the data are episodic.
The Bavarian Forest data set contains 90,571 position records for the roe deer
and 2,604 for the lynxes within the period from 11/12/2004 to 21/01/2009. The
time spans of the data about individual animals vary from 5 to 1,077 days. The
time intervals between the position records vary from a few minutes to several
months; the median interval length is about 5 h for the roe deer. For the three
lynxes, the time intervals are about 12 h, 45 min, and 24 h. Besides the tracks of
the animals, we also have context data about the land cover on the territory where
the animals live.
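
The trade-off between tracking duration and temporal resolution can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that a collar records at a fixed interval and that no measurements are lost; the function name `tracking_days` is hypothetical.

```python
def tracking_days(max_positions, interval_hours):
    """Days a collar can record at a fixed interval before its position budget is used up."""
    return max_positions * interval_hours / 24

deer_days = tracking_days(3500, 5)            # roe deer collar, 5 h interval
lynx_days_12h = tracking_days(1200, 12)       # lynx collar, 12 h interval
lynx_days_45min = tracking_days(1200, 0.75)   # lynx collar, 45 min interval
```

Under these assumptions, a roe deer collar with about 3,500 positions and a 5 h interval lasts roughly 729 days, whereas a lynx collar with about 1,200 positions lasts 600 days at a 12 h interval but only 37.5 days at a 45 min interval.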
The authors would like to thank M. Heurich (Bayerischer Wald) for providing
the data and expert judgements of our analyses.

2.10.9 Movements of Laboratory Mice

The data set consists of positions of 83 mice collected over a time period of several
months. The mice were living in a large multi-level cage and wore radio-frequency
identification (RFID) tags. Twenty-seven RFID sensors were installed in different
places of the cage. When a mouse came closer than 3 cm to a sensor, its identifier
and the time of the event were logged by the sensor. Hence, the data collection
method is location-based. Knowing the positions of the sensors, trajectories of the
mice could be constructed from the log records. The size of the data set is several
hundred Mbytes.
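
The construction of trajectories from location-based sensor logs can be sketched as follows. The log layout (time, sensor identifier, mouse identifier), the sensor coordinates, and the function name `trajectories_from_log` are assumptions made for illustration; the actual format of the mice data may differ.

```python
# Hypothetical sensor positions: sensor id -> (x, y, cage level).
SENSOR_POSITIONS = {"s01": (0.0, 0.0, 1), "s02": (1.5, 0.0, 1), "s03": (1.5, 0.5, 2)}

def trajectories_from_log(log, sensor_positions):
    """Build one trajectory per mouse from (time, sensor_id, mouse_id) log records.

    Each position is the position of the detecting sensor, because the
    sensors do not report where exactly the tag was.
    """
    trajectories = {}
    for time, sensor_id, mouse_id in sorted(log):   # process records in time order
        position = sensor_positions[sensor_id]
        trajectories.setdefault(mouse_id, []).append((time, position))
    return trajectories

# Hypothetical log records, deliberately out of time order.
log = [(120, "s02", "m7"), (15, "s01", "m7"), (40, "s03", "m9"), (300, "s03", "m7")]
mice_tracks = trajectories_from_log(log, SENSOR_POSITIONS)
```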
The spatial precision of the recorded positions is not very high. The sensors
could detect the presence of RFID tags within their neighbourhoods but could not
determine the exact positions of the tags; therefore, the positions of the animals
are represented by the positions of the sensors. The data are episodic, since the
positions of the mice when they were not close enough to any of the sensors are
unknown. The time intervals between the records are irregular.
Among the observed mice, some were healthy (wildtype) and the oth-
ers were Alzheimer-transgenic, carrying the Alzheimer disease. The princi-
pal research question is whether there is a significant difference between the
movement patterns of these two groups of mice. Since the typical movement
behaviours differ for male and female mice, the impact of the disease needs to
be studied separately for males and females. The health condition and gender
of each mouse were the context data provided together with the mice move-
ment data (i.e. sensor logs). Besides, we have context data about the sensors:
on which level of the cage each sensor is located and whether it is at a watering
place.
The authors would like to thank M. Kritzler (Institute for Geoinformatics) and
L. Lewejohann (Department of Behavioral Biology) at the University of Muenster
for making the mice data set available.

2.10.10 Movements of Visitors of Car Races

The data about movements of spectators of a car race event were collected
using Bluetooth sensors (Bruno and Delmastro 2003), which were installed in 17
places of interest (POIs) around an area of car races. The POIs included parking
places, entrances to the area and to spectators’ tribunes, the information cen-
tre, places with shops and restaurants, and other attractions. The data were col-
lected during two consecutive days, Saturday and Sunday. The devices having
Bluetooth transceivers, such as mobile phones and digital cameras, which were
carried by the visitors coming close to the sensors, were anonymously regis-
tered by the sensors. In total, the sensors registered 12,185 different devices
and made 792,694 time-stamped records. After cleaning, we have built from
these records discontinuous trajectories reflecting the movements of 9,226
device carriers, which is about 15 % of the total number of the visitors of the
car races area in these two days. The procedure of data collection, cleaning, and
preparation for the analysis is described by Stange et al. (2011).
The data collection method was location-based, as for the laboratory mice data
set. The spatial coverage is low since the sensors were installed in only a few
places over a large area. As in the mice data set, the positions of the movers are
represented by the positions of the sensors. The data are episodic since there is no
way to reconstruct the positions of the people when they were not in the vicinity
of any sensor. Even for the positions at the sensors, some records may be missing
because Bluetooth sensors can only capture devices with activated Bluetooth ser-
vices, and the activation status may change while a device carrier moves from one
sensor to another.

Due to numerous uncertainties, these and similar data may not be suitable for
detailed investigations of movements of individual objects. Aggregation over many
individuals may compensate for missing data and uncertainties in the spatial and
temporal coverage.
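
One simple form of such aggregation is counting the distinct devices registered by each sensor in each time interval. The sketch below is only an illustration of the idea; the record layout (time in seconds, sensor identifier, device identifier), the hourly bins, and the function name `presence_counts` are assumptions.

```python
from collections import defaultdict

def presence_counts(records, bin_s=3600):
    """Count distinct devices per (sensor, time bin) from (time_s, sensor_id, device_id) records."""
    devices = defaultdict(set)
    for time_s, sensor_id, device_id in records:
        devices[(sensor_id, time_s // bin_s)].add(device_id)
    return {key: len(ids) for key, ids in devices.items()}

# Hypothetical records; device d1 is registered twice at the gate in the first hour.
records = [(10, "gate", "d1"), (50, "gate", "d1"), (70, "gate", "d2"),
           (4000, "gate", "d1"), (4100, "shop", "d3")]
counts = presence_counts(records)
```

Repeated registrations of the same device within one bin are counted once, so the result describes presence rather than the number of raw records.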

2.11 Types of Movement Behaviours

One of the possible goals of movement analysis is to understand the movement
behaviour of one or more movers. We use the term movement behaviour to
denote a general way in which an object, a set of objects, or a class of objects
moves in space over time and interacts with the spatio-temporal context. There
are several dimensions for distinguishing types of movement behaviours:
• Individual or collective behaviour:
– individual movement behaviour of one or more movers;
– collective movement behaviour, that is, how multiple movers behave together.
• Specificity or generality in terms of movers:
– specific behaviour(s) of one or more movers or collectives of movers;
– general (typical) behaviour pertaining to a population or class of movers.
• Specificity (locality) or generality in terms of space and/or time:
– specific, or local, behaviour occurring in a given part of space and a
given time interval, under the place- and time-specific conditions and
circumstances;
– general (typical) behaviour, including context-invariant features and general
ways of behaving depending on the spatio-temporal context.
• For collective behaviours, relations between movements of the individuals:
– independent movement behaviours, where movers try to move on their own
in a space shared with others;
– dependent movement behaviour, where (some) movers (sometimes) adapt
their movements to movements of others. This includes, among others, antag-
onistic and competitive movement behaviours;
– coordinated movement behaviour, where all movers move together as one
group or coordinate their movements in other ways, as, for example, in a
dance.
In the introductory chapter, we investigated the individual movement behav-
iour of a single person and the independent collective movement behaviour of a
set of cars in Milan. In the case of the single person, we were interested in the
specific movement behaviour in terms of movers (i.e. the behaviour of this particu-
lar person) but in the general movement behaviour of the person in terms of space
and time. In the case of the multiple cars, we focused on the general movement
behaviour in terms of movers, specific behaviour in terms of space (only on the
territory of Milan), and general behaviour in terms of time.
Biologists studying movements of animals are usually interested in general
movement behaviours of animal species, that is, general behaviours in terms of
movers, space, and time. They may observe specific movement behaviours of
individuals and then try to generalize to general behaviours of species. Thus, the
animal ecologists from the Bavarian Forest (see Sect. 2.10.8) want to know how
roe deer as a species behave under various conditions, in particular, when they are
approached by predators. The data about laboratory mice (Sect. 2.10.9) were col-
lected in order to learn how the general behaviours of Alzheimer-sick mice differ
from general behaviours of healthy mice.
Sport analysts and coaches may be interested in specific individual and collective
movement behaviours in terms of movers, space, and time. Thus, they may analyse
how a football team and each player behaved in a particular game. They may also
be interested in general movement behaviour of a particular player over a series of
games. In analysing the collective behaviour of a team, the analysts need to consider
the coordinated movement behaviour within the team and the dependent movement
behaviour with respect to the members of the opponent team.

Table 2.4  Types of movement behaviours reflected in the example data sets


Data set                   Individual or   Movers: specific   Space/time: specific   Relations in a
                           collective      or general         or general             collective
Personal driving           Individual
Cars in Milan              Collective      General            Specific               Independent
                                                              General
Vessels in the North Sea   Collective      General            Specific               Independent
                           Individual^a    Specific^a         General                Dependent^b
                                                                                     Coordinated^c
Public transport in        Collective      General            Specific               Coordinated
Helsinki                   Individual^a    Specific^a         General
A group walk               Collective      Specific           Specific               Coordinated
                           Individual                         General
Flickr and Twitter users   Collective      General            Specific               Independent
                           Individual^d                       General
VAST Challenge 2011        Collective      General            Specific               Independent
Wild animals               Collective      General            General                Dependent
                           Individual
Laboratory mice            Individual      General            General                Independent
Visitors of races          Collective      General            Specific               Independent

^a A need to investigate behaviours of individual vessels or buses can arise in emergency cases or
when the timetable is not fulfilled
^b Vessels may need to adapt their movements to movements of others for avoiding collisions
^c Sometimes, two or more vessels move in a coordinated way, for example, a tugboat and a towed
vessel
^d Although there is a potential opportunity to analyse individual behaviours, it should be avoided
for respecting personal privacy of the people who made their data available for public access

Table 2.4 indicates the types of movement behaviours that can be studied using
the example data sets listed in Sect. 2.10.
Our concept of movement behaviour differs from the concept adopted by
Parent et al. (2013), who use the term “behaviour” as a synonym for “pattern”
(Dodge et al. 2008; Laube 2009). The latter term roughly corresponds to our con-
cept of relation between movers and the spatio-temporal context, including other
movers; see Sect. 2.8.1 (we prefer to use the term “relation” since “pattern” is too
general). According to Parent et al. (2013), behaviour is defined by a predicate that
says if a given trajectory or set of trajectories shows a particular set of character-
istics. An example is tourist behaviour: a daily trajectory shows the tourist behav-
iour if its beginning point P1 is a place of kind “Accommodation”, it makes at
least one stop in a place of kind “Museum” or “Tourist attraction”, it makes one
stop in a place of kind “Eating place”, and its end point is the same place P1 as its
beginning point. Another example is meet behaviour for a set of trajectories. These
and other behaviours considered by Parent et al. are specific in terms of movers
and local in terms of space and time; more general behaviours are not considered.
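
To make the predicate-based notion of behaviour concrete, the tourist behaviour example can be written as a Boolean function over a daily trajectory. This is a minimal sketch and not the formalism of Parent et al.: the representation of a trajectory as a list of (place identifier, place kind) pairs and the function name `shows_tourist_behaviour` are assumptions, and "one stop in an eating place" is read here as "at least one".

```python
def shows_tourist_behaviour(trajectory):
    """Check the tourist-behaviour conditions on one daily trajectory.

    trajectory: time-ordered list of (place_id, place_kind) pairs; the first
    and last entries are the begin and end points, the entries in between
    are stops. This representation is an assumption made for the sketch.
    """
    if len(trajectory) < 2:
        return False
    (start_id, start_kind), (end_id, _) = trajectory[0], trajectory[-1]
    stop_kinds = [kind for _, kind in trajectory[1:-1]]
    return (
        start_kind == "Accommodation"
        and any(k in ("Museum", "Tourist attraction") for k in stop_kinds)
        and "Eating place" in stop_kinds   # read as "at least one" (assumption)
        and end_id == start_id
    )

# Hypothetical daily trajectories.
sightseeing_day = [("hotel", "Accommodation"), ("gallery", "Museum"),
                   ("bistro", "Eating place"), ("hotel", "Accommodation")]
commuting_day = [("home", "Accommodation"), ("office", "Workplace"),
                 ("home", "Accommodation")]
```

Such a predicate classifies each given trajectory separately, which illustrates why behaviours defined in this way are specific in terms of movers and local in terms of space and time.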

2.12 Types of Movement Analysis Tasks

Movement analysis consists of one or more tasks, where a task is a piece of
work aiming to find an answer to some question. Consistently with the ideas of
Bertin (1983) and Peuquet (1994, 2002), types of tasks, or questions, can be dis-
tinguished based on the type of information sought. We define the type of infor-
mation in terms of three characteristics: focus, target, and level, which will be
explained below.
According to the multi-perspective view of movement (Sect. 2.6), four different
foci are possible in analysing movement:
• focus on movers M (mover-oriented perspective):
– characteristics of movers in terms of trajectories and movement-specific the-
matic attributes;
– relations of movers to the spatio-temporal context;
• focus on spatial events E (event-oriented perspective):
– characteristics of spatial events in terms of spatio-temporal positions and
movement-specific thematic attributes;
– relations of spatial events to the spatio-temporal context;
• focus on space S, that is, locations (space-oriented perspective):
– characteristics of locations in terms of presence dynamics and movement-
specific thematic attributes;
– relations of locations to the spatio-temporal context;
• focus on time T, that is, time units (time-oriented perspective):
– characteristics of time units in terms of spatial situations and movement-specific
thematic attributes;
– relations of time units to the spatio-temporal context.
Movement analysis may consist of multiple tasks focusing on different
components of the movement phenomenon. Thus, in the introductory chap-
ter, we tried to gain knowledge about the life and activities of the car owner
by analysing her trajectories (focus on M) and stop events extracted from the
trajectories (focus on E). We also extracted meaningful places of this person
(focus on S) and studied the times of visiting them and the times of different
trips (focus on T).
The target of a task is a particular attribute or relation, for example, speed
dynamics in a trajectory, duration of stop events, temporal order of visiting loca-
tions, closeness of two time units in terms of object positions.
The task level may be elementary (i.e. addressing one or several elements of the
focal set M, E, S, or T) or synoptic (i.e. addressing the focal set as a whole or its
subsets, disregarding individual elements). The synoptic level combines Bertin’s
intermediate and overall levels dealing with subsets and entire sets, respectively.
A task may be elementary with respect to one component of a phenomenon and
synoptic with respect to another component (Andrienko and Andrienko 2006). For
example, the target information may be the overall spatial situation in a particular
time unit or in several time units taken separately. This task is elementary with
respect to time since it requires the consideration of each time unit individually.
The analysis level with respect to the set of objects and set of locations is synoptic
since the task requires the consideration of the whole sets of objects and locations.
Hence, according to the type of information sought, the task type is defined in
terms of the following three components:
• General focus: movers, spatial events, locations, or time units.
• Specific target: particular attribute or particular relation.
• Level: elementary or synoptic.
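
The three components can be written down as a small data structure. This is only an illustrative sketch: the class names `Focus`, `Level`, and `TaskType` are hypothetical, and using a single level per task is a simplification, since a task may be elementary with respect to one component and synoptic with respect to another.

```python
from dataclasses import dataclass
from enum import Enum

class Focus(Enum):
    MOVERS = "M"
    SPATIAL_EVENTS = "E"
    LOCATIONS = "S"
    TIME_UNITS = "T"

class Level(Enum):
    ELEMENTARY = "elementary"
    SYNOPTIC = "synoptic"

@dataclass(frozen=True)
class TaskType:
    focus: Focus
    target: str   # a particular attribute or relation, described in free text
    level: Level

# Example: studying the durations of stop events over the whole set of events.
stop_durations = TaskType(Focus.SPATIAL_EVENTS, "duration of stop events", Level.SYNOPTIC)
```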
Fully elementary tasks (i.e. elementary with respect to each component), in
which the analyst strives to find particular facts, can be seen as information retrieval
tasks rather than analysis tasks. They need to be enabled by software tools used for
data analysis, but they are not in the focus of visual analytics. Synoptic tasks with
respect to at least one component require abstraction over a set of facts by means
of analysis and reasoning. Therefore, enabling synoptic tasks belongs to the scope of
visual analytics. In accord with the book title, all methods described in our book are
intended for synoptic tasks with respect to at least one component.
Different methods or combinations of methods may be required for fulfilling
different types of analysis tasks. Methods that can be used in movement analysis
can be divided into two major categories:
• Analytical methods, which can provide answers to questions or create represen-
tations enabling the user to find answers;
• Transformational methods, which can convert available data to a form matching
a particular task or suitable for application of a particular analytical tool.
Various methods and classes of methods will be considered in the following
chapters. In particular, Chap. 3 is dedicated to data transformations. It is followed
by four chapters that present analytical methods grouped according to the four
possible task foci. All methods address synoptic tasks with different targets.

2.13 Recap

Movement is a complex dynamic phenomenon that involves objects, space, and time.
Movement data describe linkages between moving objects, locations, and times occur-
ring in the process of movement. These linkages hold valuable information not only
about the moving objects but also about the locations and times. To uncover various
types of information hidden in movement data, it is necessary to consider the data from
different perspectives.
Movement takes place in a complex and dynamic spatio-temporal context, which
includes heterogeneous properties of spatial locations and times and a variety of objects
existing in the space and/or time and having their specific properties. The context affects
the movement and is affected by the movement. Movement cannot be properly analysed
without regarding the context and looking at various relations that occur between mov-
ing objects and elements of the context. The context can be partly represented explic-
itly by context data, which need to be analysed in combination with movement data.
However, a great part of relevant information about the context exists only implicitly in
the mind of a human analyst as background knowledge. Analytical tools intended for
analysis of movement data must enable analysts to use this knowledge.
Our conceptual framework defines the basic concepts that allow us to think and
speak about movement as a phenomenon and about movement data. The framework
is based on the consideration of three fundamental sets: space, time, and objects. In the
set of objects, we separately consider two types of spatio-temporal objects playing the
most important role in the phenomenon of movement: moving objects (movers, for short)
and spatial events. We describe the phenomenon of movement in terms of elements of
the four sets and relations within and between the sets, and introduce a multi-perspective
view of movement consisting of mover-oriented, space-oriented, time-oriented, and
event-oriented perspectives.
We define the possible data structures and possible types of analysis tasks,
where both the data structures and the task types correspond to the different per-
spectives of movement. Movement data may take four different forms: trajectory
data referring to movers, spatial event data referring to spatial events, local time
series referring to locations, and spatial distributions referring to time units. Four
large classes of tasks are defined according to the possible foci on movers, spa-
tial events, locations, and times. The classes are subdivided into tasks addressing
the characteristics of the focal entities and tasks addressing their relations to the
spatio-temporal context. Within each class/subclass, the tasks may have different
targets (particular characteristics or relations). Tasks are also distinguished accord-
ing to the level of analysis, which may be elementary (addressing specific ele-
ments of the sets) or synoptic (addressing the sets or their subsets).
In the following chapters, we shall refer to the defined data structures and task
types in presenting the transformational and analytical tools that can be used for
analysing movement data.

References

Aigner, W., Miksch, S., Schumann, H., & Tominski, C. (2011). Visualization of time-oriented
data. Berlin: Springer.
Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the
ACM, 26(11), 832–843.
Andrienko, N., & Andrienko, G. (2006). Exploratory analysis of spatial and temporal data: A
systematic approach. Berlin: Springer.
Andrienko, N., Andrienko, G., & Gatalsky, P. (2003). Exploratory spatio-temporal visualization:
An analytical review. Journal of Visual Languages and Computing, Special Issue on Visual
Data Mining, 14(6), 503–541.
Andrienko, N., Andrienko, G., Pelekis, N., & Spaccapietra, S. (2008). Basic concepts of move-
ment data. In F. Giannotti & D. Pedreschi (Eds.), Mobility, data mining and privacy:
Geographic knowledge discovery (pp. 15–38). Berlin: Springer.
Andrienko, G., Andrienko, N., Bak, P., Keim, D., Kisilevich, S., & Wrobel, S. (2011a). A con-
ceptual framework and taxonomy of techniques for analyzing movement. Journal of Visual
Languages and Computing, 22(3), 213–232.
Andrienko, G., Andrienko, N., & Heurich, M. (2011b). An event-based conceptual model for
context-aware movement analysis. International Journal of Geographical Information Science,
25(9), 1347–1370.
Bertin, J. (1983). Semiology of graphics: Diagrams, networks, maps. Madison: University of
Wisconsin Press. (Translated from Bertin, J. (1967). Sémiologie graphique. Paris: Gauthier-Villars.)
Blok, C. (2000). Monitoring change: Characteristics of dynamic geo-spatial phenomena for vis-
ual exploration. In Ch. Freksa, et al. (Eds.), Spatial cognition II, LNAI 1849 (pp. 16–30).
Berlin Heidelberg: Springer.
Bruno, R., & Delmastro, F. (2003). Design and analysis of a bluetooth-based indoor localization
system. In Proceedings of the 8th IFIP-TC6 international conference on personal wireless
communications (PWC) (pp. 711–725).
Dodge, S., Weibel, R., & Lautenschütz, A.-K. (2008). Towards a taxonomy of movement pat-
terns. Information Visualization, 7(3–4), 240–252.
Dodge, S., Weibel, R., & Forootan, E. (2009). Revealing the physics of movement: Comparing
the similarity of movement characteristics of different types of moving objects. Computers,
Environment and Urban Systems, 33(6), 419–434.
Dykes, J. A., & Mountain, D. M. (2003). Seeking structure in records of spatio-temporal behav-
iour: Visualization issues, efforts and applications. Computational Statistics and Data
Analysis, 43(4), 581–603.
Egenhofer, M. (1991). Reasoning about binary topological relations. In O. Günther & H.-J.
Schek (Eds.), Proceedings of the second symposium on large spatial databases, SSD’91,
(Vol. 525, pp. 143–160) Zurich, Switzerland. Lecture Notes in Computer Science. New York:
Springer.
Frank, A. (1992). Qualitative spatial reasoning about distances and directions in geographical
space. Journal of Visual Languages and Computing, 3, 343–371.
Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science
Association, 24, 7–21.
Höferlin, M., Höferlin, B., & Weiskopf, D. (2009). Video visual analytics of tracked moving
objects. In Proceedings of the workshop on behaviour monitoring and interpretation (BMI
’09) at international 3D geoinfo workshop, CEUR workshop proceedings (Vol. 541, pp.
59–64).
Ivanov, Y. A., Wren, C. R., Sorokin, A., & Kaur, I. (2007). Visualizing the history of living
spaces. IEEE Transactions on Visualization and Computer Graphics, 13(6), 1153–1160.
Jones, C. B. (1997). Geographical information systems and computer cartography. Harlow:
Longman.
Kraak, M.-J. (2003). The space–time cube revisited from a geovisualization perspective. In
Proceedings of the 21st international cartographic conference (pp. 1988–1995), Durban,
South-Africa, 10–16 Aug 2003.
Laube, P. (2009). Progress in movement pattern analysis. In B. Gottfried & H. Aghajan (Eds.),
Behaviour monitoring and interpretation—Ambient assisted living (pp. 43–71), IOS Press.
Laube, P., Imfeld, S., & Weibel, R. (2005). Discovering relative motion patterns in groups of
moving point objects. International Journal of Geographical Information Science, 19(6),
639–668.
Laube, P., Dennis, T., Forer, P., & Walker, M. (2007). Movement beyond the snapshot—Dynamic
analysis of geospatial lifelines. Computers, Environment and Urban Systems, 31(5), 481–501.
Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (1999). Geographical informa-
tion systems Vol 1: Principles and technical issues (2nd ed). New York, USA: Wiley.
Miller, H. J. (2005). A measurement theory for time geography. Geographical Analysis, 37,
17–45.
Mountain, D., & Raper, J. F. (2001). Modelling human spatio-temporal behaviour: A challenge
for location-based services. In Proceedings of the 6th international conference on
geocomputation, University of Queensland, Brisbane, Australia, 24–26 Sept 2001.
Orellana, D., & Renso, C. (2010). Developing an interactions ontology for characterizing pedes-
trian movement behaviour. In M. Wachowicz (Ed.), Movement-aware applications for sus-
tainable mobility: Technologies and approaches, information science reference (pp. 62–86),
Hershey, PA, USA.
Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., Damiani,
M. L., Gkoulalas-Divanis, A., Macedo, J., Pelekis, N., Theodoridis, Y., & Yan, Z. (2013).
Semantic Trajectories Modeling and Analysis. ACM Computing Surveys, 45(4) (accepted).
Peuquet, D. J. (1994). It’s about time: a conceptual framework for the representation of tem-
poral dynamics in geographic information systems. Annals of the Association of American
Geographers, 84(3), 441–461.
Peuquet, D. J. (2002). Representations of space and time. New York: Guilford.
Spaccapietra, S., Parent, C., Damiani, M. L., de Macedo, J. A., Porto, F., & Vangenot, C. (2008).
A conceptual view on trajectories. Data & Knowledge Engineering, 65(1), 126–146.
Stange, H., Liebig, T., Hecker, D., Andrienko, G., & Andrienko, N. (2011). Analytical workflow
of monitoring human mobility in big event settings using bluetooth. In Proceedings of the
third international workshop on indoor spatial awareness ISA 2011, Chicago, USA.
Tomaszewski, B., & MacEachren, A. M. (2010). Geo-historical context support for informa-
tion foraging and sensemaking: Conceptual model, implementation, and assessment. In
Proceedings IEEE VAST 2010 (IEEE conference on visual analytics science and technology,
Salt Lake City, Utah, USA, 24–29 October, 2010) (pp. 139–146).
Wood, Z., & Galton, A. (2010). Zooming in on collective motion. In M. Bhatt, H. Guesgen & S.
Hazarika (Eds.), Spatio-temporal dynamics. In Proceedings of workshop 21, 19th European
conference on artificial intelligence (pp. 25–30), Lisbon, Portugal, 16–20 Aug 2010.
Chapter 3
Transformations of Movement Data

Abstract This chapter introduces common transformations that are often used
in analysis of movement data: interpolation and re-sampling, division of tra-
jectories, alignment of temporal and spatial references, derivation of new the-
matic attributes, extraction of movement events, generalization, simplification,
and aggregation. These transformations can adapt available data to the analysis
goals or to specific requirements of the methods that the analyst wants to apply.
Transformations can extract relevant parts of the data or reduce irrelevant details.
Some transformations convert movement data from one form to another, to support
different task foci: movers, spatial events, space, and time. Data transformations
play an auxiliary role in analysis since they are not intended to provide
answers to questions. Their role is to prepare data for analysis, that is, to convert
data to a form fitting a task or required by an analytical tool. Some transforma-
tions change the structure of the data. In the previous chapter, we have introduced
a formal representation for data structures based on the components O (objects), in
particular, moving objects M and spatial events E, S (space), T (time), and A (the-
matic attributes), see Sects. 2.4 and 2.9. We shall use this formalism in describing
the possible transformations of data structures.

3.1 Interpolation and Re-sampling

In movement data resulting from real measurements (rather than simulations), posi-
tions of movers are given for a limited number of time units, that is, the set T in
M → (T → S) is finite and often quite small. Interpolation is the estimation of the
spatial positions of movers in intermediate time units between the measurements,
which increases the cardinality of the set T in M → (T → S) while the data structure
is preserved. Interpolation is often needed for re-sampling, that is, obtaining posi-
tion records for regularly spaced time moments with a desired constant temporal
distance between them. Constant time intervals between positions are necessary for
some analytical methods.
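
As an illustration of re-sampling by linear interpolation, the following minimal Python sketch may help. It assumes planar coordinates, numeric timestamps, and at least two position records; the function name resample_linear and the (t, x, y) record layout are illustrative assumptions rather than part of any tool described in this book. The sketch does not check whether interpolation is valid for the data at hand.

```python
from bisect import bisect_right

def resample_linear(track, step):
    """Return positions at regular time steps by linear interpolation.

    track: list of (t, x, y) tuples sorted by strictly increasing t
           (hypothetical record layout; at least two records).
    step:  desired constant temporal distance between output records.
    """
    times = [t for t, _, _ in track]
    out = []
    t = times[0]
    while t <= times[-1]:
        # Index of the last measurement with time <= t, capped so that
        # a following measurement always exists.
        i = min(bisect_right(times, t) - 1, len(track) - 2)
        t0, x0, y0 = track[i]
        t1, x1, y1 = track[i + 1]
        w = (t - t0) / (t1 - t0)  # relative position between the two records
        out.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        t += step
    return out

# Irregularly sampled track re-sampled every 5 time units.
regular = resample_linear([(0, 0, 0), (10, 10, 0), (30, 10, 20)], 5)
```

The output has the same structure as the input, only with a larger and regularly spaced set of time references.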

G. Andrienko et al., Visual Analytics of Movement, 73


DOI: 10.1007/978-3-642-37583-5_3, © Springer-Verlag Berlin Heidelberg 2013
Interpolation can be purely geometric, for example, linear or based on Bézier
curves (Macedo et al. 2008), or take into account the nature of the movement
(e.g. cars move on streets) and context data such as street network. The problem
of matching trajectories of vehicles and pedestrians to the street network, called
map matching, is extensively addressed in the research literature (e.g. Quddus et
al. 2007; Lou et al. 2009; Yuan et al. 2010).
As we discussed in Sect. 2.9, not all movement data allow valid interpolation.
Interpolation is usually not possible when the temporal intervals between the
recorded measurements are large and there is no additional information about the
probable movement of the objects between the known positions. In such cases,
analysis methods based on interpolation or requiring previous re-sampling of
movement data are not applicable.

3.2 Division of Movement Tracks and Trajectories

Trajectory data usually contain a single movement track for each moving object.
The track represents the movement during the whole lifetime of the mover or the
whole time span of movement observation. Before starting to analyse trajectory
data, it may be reasonable to divide the movement tracks into suitable parts (tra-
jectories). For example, in analysing data about movements of people, tracks may
be divided into trajectories corresponding to trips between significant places. In
studying seasonal migration of animals during several years, it may be meaningful
to consider trajectories corresponding to the migration seasons.
The possible ways of dividing movement tracks into trajectories include
• by a temporal gap (large distance in time) between two consecutive position
records;
• by a spatial gap (large distance in space) between two consecutive position
records;
• by a certain kind of event, for example, long stop (i.e. the duration is above a
threshold) or visit of a specific place;
• by a specific time moment, such as the date of migration season start;
• by temporal cycles (daily, weekly, yearly, etc.): the beginning of the next cycle
(i.e. day, week, year, etc.) ends the previous part and begins a new one;
• by values of one or more thematic positional attributes: the analyst specifies
equivalence classes for the attribute values (i.e. value intervals for numeric
attributes and individual values or value subsets for non-numeric attributes); a
trajectory is divided at each point where the value class of at least one attribute
changes.
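
The first two ways of division listed above (by a temporal or a spatial gap) can be sketched in Python as follows. This is only a minimal illustration under the assumption of planar coordinates and numeric timestamps; the function name split_by_gap and its parameters are hypothetical.

```python
from math import hypot

def split_by_gap(track, max_dt=None, max_dist=None):
    """Divide a track into trajectories at large temporal or spatial gaps.

    track: list of (t, x, y) tuples sorted by t (hypothetical layout).
    A new trajectory starts whenever two consecutive records are more than
    max_dt apart in time or more than max_dist apart in space.
    """
    parts = []
    current = []
    for rec in track:
        if current:
            t0, x0, y0 = current[-1]
            t1, x1, y1 = rec
            time_gap = max_dt is not None and t1 - t0 > max_dt
            space_gap = (max_dist is not None
                         and hypot(x1 - x0, y1 - y0) > max_dist)
            if time_gap or space_gap:
                parts.append(current)
                current = []
        current.append(rec)
    if current:
        parts.append(current)
    return parts

# Times in hours: a gap of more than 3 h separates two trips.
trips = split_by_gap([(0, 0, 0), (1, 1, 0), (2, 2, 0), (20, 2, 1), (21, 3, 1)],
                     max_dt=3)
```

Here every record is kept in some trajectory; discarding irrelevant parts would be a separate filtering step.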
The resulting data structure may be represented as
M → ((T1 → S) ∪ (T2 → S) ∪ · · · ∪ (Tk → S)), where T1, T2, …, Tk are non-
overlapping subsets of the original T in M → (T → S). It is, in general, not neces-
sary that the union of T1, T2, …, Tk gives T. Thus, some parts of the tracks may be
irrelevant to the analysis goals and therefore discarded. For example, the analyst
may be interested only in the parts of the tracks where the movers actually moved
rather than stayed still.
In our example analysis of an individual’s movement in Sect. 1.2, we wanted to
divide the movement tracks into trajectories corresponding to trips of the person.
We did this by a temporal gap of 3 h. With this threshold, the movement from
work to home with an intermediate stop in a shopping area was considered as
one trip. However, it could also be useful to divide it into two trips: from work
to the shopping area and from the shopping area to home. For example, this
might be reasonable if our goal was to analyse all possible routes leading to/from
the shopping areas.
This example demonstrates that division of movement tracks into trajectories is
not done once and forever but can be done repeatedly in different ways depending
on the analysis task. For some tasks, it may also be possible or even necessary to
use undivided tracks. Undivided movement tracks may be called full trajectories.
Furthermore, trajectories resulting from a previous division can be further subdi-
vided into smaller trajectories. For example, we may first divide the track of the
personal car into daily trajectories and use them to find repeatedly visited places of
the person. Then, we may further divide the daily trajectories into trajectories cor-
responding to the trips between these places. Any other methods of division can be
used as well.
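
Division by temporal cycles, which can serve as the first step of such a repeated division, might be sketched as below. The sketch assumes that position records carry Python datetime values and are sorted by time; the function name split_by_cycle and the key argument are illustrative assumptions.

```python
from datetime import datetime
from itertools import groupby

def split_by_cycle(track, key=lambda ts: ts.date()):
    """Divide a track at the beginning of each new temporal cycle.

    track: list of (timestamp, x, y) sorted by timestamp (hypothetical layout).
    key:   maps a timestamp to the identifier of its cycle; the default
           identifies calendar days, giving daily trajectories.
    """
    # groupby puts consecutive records with the same cycle identifier
    # into one part, so the input must be sorted by time.
    return [list(group) for _, group in groupby(track, key=lambda r: key(r[0]))]

track = [
    (datetime(2013, 5, 6, 8, 0), 0.0, 0.0),
    (datetime(2013, 5, 6, 18, 0), 1.0, 1.0),
    (datetime(2013, 5, 7, 8, 30), 0.0, 0.0),
]
daily = split_by_cycle(track)
weekly = split_by_cycle(track, key=lambda ts: tuple(ts.isocalendar())[:2])
```

Each resulting part is itself a list of position records, so any other division method can be applied to it afterwards.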

3.3 Transformations of Temporal and Spatial References

In the example analysis in Sect. 1.2, we transformed the time references in the tra-
jectories from absolute to relative. We did this for two major purposes:
• to allow more effective visualization and visual comparison of dynamic charac-
teristics of the trajectories, in particular, by means of a space–time cube;
• to study the relations of the movements to the temporal cycles.
We suggest three classes of temporal transformations:
• Transformations with respect to temporal cycles, which include bringing the times
of the trajectories to the same year or season, the same month, week, day, or hour,
or to a domain-specific cycle. This means that the original time references are
replaced by the respective relative positions within the selected temporal cycle.
• Transformations in relation to the individual lifelines of the trajectories, which
include bringing the trajectories to a common start moment, a common end
moment, or common start and end moments. In the first two cases, the lengths
of the time intervals between the positions and, hence, the durations of the tra-
jectories are preserved. In the third case, the lengths of the time intervals are
proportionally modified, so that the transformed trajectories have the same dura-
tion. This may be useful for comparing movements that proceed with different
speeds.
• Transformations in relation to a selected place. All trajectories that visited this
place are shifted in time, so that the time of the visit becomes the same for all
trajectories. This means that the original time references are replaced by their
temporal distances to the time of the visit. The parts of the trajectories preced-
ing the visit will have negative time references, the time of the visit will be
transformed to zero, and the following parts of the trajectories will have positive
time references.

Temporal transformations do not change the structure of the movement data
M → (T → S) but change the set T: the original time units are replaced by relative
temporal positions with respect to a selected temporal cycle or selected times from
the trajectories.
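
The three classes of temporal transformations can be illustrated by the following Python sketch. It assumes numeric timestamps (for example, seconds) with the cycle origin at time 0, and non-empty trajectories; all function names are hypothetical. In every case only the time references are replaced, while the positions are left untouched.

```python
def to_cycle(traj, period):
    """Replace times by their relative positions within a temporal cycle."""
    return [(t % period, x, y) for t, x, y in traj]

def align_start(traj):
    """Shift times so that the trajectory starts at time 0."""
    t0 = traj[0][0]
    return [(t - t0, x, y) for t, x, y in traj]

def align_end(traj):
    """Shift times so that the trajectory ends at time 0."""
    t_end = traj[-1][0]
    return [(t - t_end, x, y) for t, x, y in traj]

def normalize_duration(traj):
    """Scale times proportionally to the range [0, 1] (common start and end).

    Assumes the trajectory has a non-zero duration.
    """
    t0, t_end = traj[0][0], traj[-1][0]
    span = t_end - t0
    return [((t - t0) / span, x, y) for t, x, y in traj]

def relative_to_visit(traj, t_visit):
    """Replace times by their temporal distances to the time of a visit."""
    return [(t - t_visit, x, y) for t, x, y in traj]

traj = [(100, 0, 0), (160, 1, 0), (220, 2, 0)]
shifted = relative_to_visit(traj, 160)  # negative before, positive after
```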
Transformations can also be applied to spatial positions. An analyst may be
not so much interested in absolute positions in space as in relative positions with
regard to a certain place. For example, the analyst may study where a person trav-
els with regard to his/her home or movements of spectators to and from a cinema
or a stadium. In such cases, it is convenient to define positions in terms of dis-
tances and directions from the reference place (or, in other words, by means of
polar coordinates). The directions can be defined as angles from some base direc-
tion, for example, from the northward direction in the geographical space. It may
be useful to treat the space as one-dimensional by transforming the spatial refer-
ences to distances from a particular place or from the start and/or end points of
trajectories. A well-known example of this kind is the graphical schedule of the
train movement between Paris and Lyon designed in 1885 by E. J. Marey and
described by Tufte (1983). In this graph, one display dimension is used to repre-
sent the rail route between Paris and Lyon. For this purpose, the places (train sta-
tions) are projected onto a straight line according to their relative positions on the
route from Paris to Lyon. The second display dimension represents time. The tra-
jectories of different trains are shown as lines, like in a space–time cube; however,
the transformation of the space to one-dimensional removes the necessity of using
2D views of 3D objects with all the disadvantages (occlusions, distortions, false
neighbourhood, etc.) caused by the projection. Generally, comprehensive analy-
sis may require consideration of the same data within different systems of spatial
referencing and, hence, transformation of one reference system to another: geo-
graphical coordinates to polar (with various origins), coordinate-based referencing
to division based or network based (see Sect. 2.2.1), etc. These transformations of
the spatial reference system, in fact, do not change the data: the spatial locations in
S remain the same and only the references change.
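
A transformation of planar coordinates to distances and directions from a reference place might look as in the following sketch. It assumes projected (planar) coordinates in which the Y-axis points north; the function name to_polar is hypothetical.

```python
from math import atan2, degrees, hypot

def to_polar(x, y, ref_x, ref_y):
    """Return (distance, bearing) of point (x, y) relative to a reference place.

    The bearing is the angle in degrees, measured clockwise from the
    northward direction (+Y), in the range [0, 360).
    """
    dx, dy = x - ref_x, y - ref_y
    distance = hypot(dx, dy)
    bearing = degrees(atan2(dx, dy)) % 360.0
    return distance, bearing

# A position 3 units east and 4 units north of the reference place.
distance, bearing = to_polar(3.0, 4.0, 0.0, 0.0)
```

Keeping only the distance component corresponds to treating the space as one-dimensional.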
However, there are transformations that change the set S. For example, trajec-
tories may be shifted to a common origin in space for a convenient comparison
of their geometric characteristics such as shapes, movement vectors, and lengths.
Moreover, trajectories can also be rotated, so that their movement vectors coin-
cide. This further facilitates the comparison of the shapes and lengths. By such
transformations, the original (geographical, or, more generally, physical) space
is transformed into an abstract space that does not exist in reality. Examples can
be found in the paper by Kwan (2000). In analysing movements of people, she
transforms their spatial positions to distances from their homes in order to compare
how far different people travel and at what distances from their homes they
work, shop, and perform other activities. In another transformation, the trajectories
of all individuals are shifted, so that the home location becomes the origin (0, 0)
of the coordinate system, and rotated, so that the home-work axis becomes the
positive X-axis. Hence, different locations of the original space (such as homes
of different people or different work places located at the same distance from the
homes) may be projected onto the same location in the transformed space.
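
The shift-and-rotate transformation just described can be sketched as follows. This is an illustration under the assumption of planar coordinates and distinct home and work locations, not a reproduction of the procedure used in the cited paper; the function name to_home_work_frame is hypothetical.

```python
from math import atan2, cos, sin

def to_home_work_frame(points, home, work):
    """Shift points so that home is the origin and rotate them so that
    the home-work axis becomes the positive X-axis.

    points: list of (x, y); home, work: (x, y) reference locations.
    """
    hx, hy = home
    # Direction of the home-work axis in the original space.
    angle = atan2(work[1] - hy, work[0] - hx)
    # Rotating by -angle turns this axis onto the positive X-axis.
    c, s = cos(-angle), sin(-angle)
    out = []
    for x, y in points:
        dx, dy = x - hx, y - hy
        out.append((dx * c - dy * s, dx * s + dy * c))
    return out

# The work place lies 5 units north of home; it is mapped to about (5, 0).
transformed = to_home_work_frame([(2, 3), (2, 8), (1, 3)], home=(2, 3), work=(2, 8))
```

Applying the function to each person's trajectory with that person's own home and work locations superimposes the reference points of all trajectories.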
Generally, trajectory points are transformed to the new coordinate system
depending on their relative positions in the original space with respect to certain
chosen reference points and, possibly, reference vectors. In the original space, dif-
ferent reference points and vectors may be used for different trajectories; in the
new space, these points and vectors are superimposed and unified.
In the transformations we have discussed so far, the same reference point and
vector are used for all points of one trajectory. In other words, the reference points
and vectors are static, that is, do not change over time. However, it may also be
meaningful to use dynamic reference points and vectors and transform each trajec-
tory point depending not only on its position in the original space but also on its
position in time. This kind of transformation may be useful, for example, in analysing
collective movement, when multiple individuals move together in a group.
Besides changing their absolute positions in the geographical (physical) space,
the group members may change their relative positions with respect to the other
group members. If these relative positions are of interest, it is beneficial to trans-
form the absolute spatial positions to relative positions within the “group space”:
at the forefront, in the centre, at the rear, on the left, on the right, apart from the
others, etc. These relative positions need to be independent of the absolute position of
the group in the original space in each time unit. This can be achieved by taking
the position of the group centre c(t) and the group movement vector v(t) in each
time unit t as the reference for transforming the trajectory points that occurred in
this time unit t. A method to compute the group centre and group movement vector
will be presented in detail in Chap. 5.
A point pi(t) of a trajectory ti is projected onto the group movement vector v(t)
by building a perpendicular from pi(t) to v(t) and finding the crossing point pi′(t).
The X-coordinate of the point pi(t) in the group space is defined as the distance
between pi(t) and pi′(t) (i.e. the distance from the point pi(t) to the group move-
ment vector v(t)) with the sign, minus or plus, depending on the position of pi(t)
on the left or on the right of v(t). The Y-coordinate is defined as the distance of
the projection point pi′(t) to the group centre c(t) with the sign, minus or plus,
depending on the position of pi′(t) behind or in front of c(t) along the direction of
v(t). The transformation is done in the same way for the points of different trajec-
tories that occurred in the same time unit.
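
The projection described above is equivalent to taking two scalar products, as the following minimal Python sketch shows. It assumes planar coordinates and that the group centre c(t) and the group movement vector v(t) for the given time unit are already known; the function name to_group_space is hypothetical.

```python
from math import hypot

def to_group_space(p, c, v):
    """Transform point p to the group space defined by centre c and vector v.

    p, c: (x, y) positions in the same time unit; v: (vx, vy), non-zero.
    Returns (X, Y): X is the signed distance from p to the line of v through c
    (minus on the left, plus on the right); Y is the signed distance of the
    projection of p from c along v (minus behind, plus in front).
    """
    norm = hypot(v[0], v[1])
    if norm == 0:
        raise ValueError("group movement vector must be non-zero")
    ux, uy = v[0] / norm, v[1] / norm      # unit vector along the movement
    dx, dy = p[0] - c[0], p[1] - c[1]      # position relative to the centre
    y = dx * ux + dy * uy                  # component along v
    x = dx * uy - dy * ux                  # component along the right-hand normal
    return x, y

# Group moving north; the point is 1 unit to the right and 3 units in front.
x, y = to_group_space((11, 13), (10, 10), (0, 2))
```

For a whole data set, the function would be applied to every trajectory point with the centre and vector of the time unit in which the point occurred.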
To illustrate the space transformation to a group space, we shall use the example
data set collected during a group walk of workshop participants, which is
described in Sect. 2.10.5. Figure 3.1 shows the original trajectories of the group
members in the geographical space. Figure 3.2 shows the same trajectories transformed
to the space where the coordinate system is defined in terms of the group
centre (the origin of the coordinates) and the group movement vector (the Y-axis).
On the left, the group space with the transformed trajectories is shown on a map.
The map contains X- and Y-axes; their crossing marks the group centre. The front
part of the group is above the X-axis, and the rear part is below the X-axis. On the
right, the transformed trajectories are shown in a space–time cube. The cube is
turned, so that the Y-axis goes from left to right; hence, the front part of the group
is on the right, and the rear part is on the left of the cube view. In this way, it is
possible to find the leaders and those who are behind the others.

Fig. 3.1  Trajectories of multiple individuals that walked in a group are visualized on a map (left)
and in a space–time cube (right)

Fig. 3.2  The trajectories of the participants of the group walk have been transformed from the
geographical space to an abstract group space where the origin of the coordinate system is the
group centre and the Y-axis reflects the direction of the group movement vector. The new space
and the transformed trajectories are visualized on a map (left) and in a space–time cube (right)

Fig. 3.3  After transforming trajectories of group members to an abstract group space, subsets of
trajectories can be selected for visual inspection and comparison

Using the transformed trajectories, the movement behaviours of the individu-
als within the group can be investigated and compared more effectively than with
the original trajectories. To reduce the display clutter, one or a few trajectories are
selected for detailed visual inspection, as in Fig. 3.3. For example, we can observe
on the left that the person whose trajectory is coloured in dark blue was in the
front part of the group almost all the time but at the end moved to the back. The
cyan-coloured trajectory line shows that the respective person was initially at the
group centre, then moved to the back, but by the end of the trip moved to the front.
The dark red trajectory line has a long part almost coinciding with a part of the
cyan line, which may mean that the respective individuals walked together during
a long time interval. On the right of Fig. 3.3, we can compare the trajectories of
three persons who tended to take the leading positions in the group. The light blue
trajectory was behind the two others in the first half of the walk but then moved
to the front and mostly kept this position. The dark blue trajectory was more fre-
quently in front of the others in the first half of the walk. There was a time interval
when all three “leaders” were close together, then they separated.
Spatial transformations may change the set of spatial locations S in movement
data, but the data structure M → (T → S) does not change.

3.4 Derivation of New Thematic Attributes

Two kinds of thematic attributes can be computationally derived from trajectory
data: (1) attributes characterizing trajectories; (2) attributes characterizing posi-
tions or segments within trajectories. These two kinds will be further referred to as
trajectory attributes and positional attributes.
One can compute static or dynamic trajectory attributes. Static attributes
characterize whole trajectories irrespective of the time. For obtaining dynamic
attributes, the time span of the data is divided into intervals and the attributes are
computed for the parts of the trajectories made during each interval. As a result,
a time series of attribute values is generated for each trajectory. Any attribute that
can be computed for a whole trajectory can also be computed for trajectory parts.
Potentially useful thematic attributes that can be computed from trajectory data
for whole trajectories or for trajectory parts include, but are not limited to, the
following:

• travelled distance, that is, length in space;
• duration in time;
• displacement, that is, distance in space between the first and last positions;
• movement vector from the first to the last position;
• spatial extent, which may be represented by the area of the bounding rectangle
or the length of its diagonal;
• average speed (also minimal, maximal, etc.);
• sinuosity: the measure of deviation of the path between the first and last posi-
tions from the shortest possible path (straight line), expressed as the ratio of the
travelled distance to the displacement;
• tortuosity: the measure indicating how tortuous the path between the first and
last positions is. There are several approaches to estimating curve tortuosity.
One of them is to express tortuosity as the ratio of the number of turns to the
path length.
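
Most of the attributes listed above follow directly from the sequence of positions, as the following Python sketch suggests. It assumes planar coordinates, numeric timestamps, and at least two position records; the function name trajectory_attributes and the dictionary keys are hypothetical. Tortuosity is left out because it depends on how a turn is defined.

```python
from math import hypot

def trajectory_attributes(traj):
    """Compute static attributes of a trajectory (or of a trajectory part).

    traj: list of (t, x, y) tuples sorted by t (hypothetical layout).
    """
    steps = [hypot(x1 - x0, y1 - y0)
             for (_, x0, y0), (_, x1, y1) in zip(traj, traj[1:])]
    travelled = sum(steps)
    duration = traj[-1][0] - traj[0][0]
    vector = (traj[-1][1] - traj[0][1], traj[-1][2] - traj[0][2])
    displacement = hypot(*vector)
    xs = [x for _, x, _ in traj]
    ys = [y for _, _, y in traj]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return {
        "travelled_distance": travelled,
        "duration": duration,
        "displacement": displacement,
        "movement_vector": vector,
        "extent_area": width * height,           # area of the bounding rectangle
        "extent_diagonal": hypot(width, height),  # length of its diagonal
        "average_speed": travelled / duration if duration else None,
        # Ratio of the travelled distance to the displacement;
        # undefined when the trajectory returns to its start.
        "sinuosity": travelled / displacement if displacement else None,
    }

attrs = trajectory_attributes([(0, 0, 0), (10, 3, 4), (20, 3, 0)])
```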

When context data are available, it is possible to compute various static and
dynamic trajectory attributes expressing relations of trajectories to elements of the
spatio-temporal context:

• distance in space to a selected spatial context element (SCE);
• distance in time to a selected temporal context element (TCE);
• relative movement direction with regard to a selected SCE;
• count of SCE of a certain type within the spatial neighbourhood (e.g. within a
given spatial distance from the trajectory);
• count of TCE of a certain type within the temporal neighbourhood (e.g. within a
given temporal distance from the trajectory);
• count of spatio-temporal context elements (STCE) within the spatio-temporal
neighbourhood.
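
Two of the context-related attributes listed above, the spatial distance to a context element and the count of context elements in the spatial neighbourhood, might be computed as in this sketch. It assumes planar coordinates and context elements represented by points; the function names are hypothetical.

```python
from math import hypot

def point_segment_distance(p, a, b):
    """Distance from point p to the straight segment between a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:  # degenerate segment: a and b coincide
        return hypot(px - ax, py - ay)
    # Relative position of the closest point on the segment, clamped to [0, 1].
    w = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return hypot(px - (ax + w * dx), py - (ay + w * dy))

def distance_to_element(path, element):
    """Minimal distance from a trajectory path (list of (x, y)) to a point."""
    if len(path) == 1:
        return point_segment_distance(element, path[0], path[0])
    return min(point_segment_distance(element, a, b)
               for a, b in zip(path, path[1:]))

def count_within(path, elements, radius):
    """Count context elements within a given distance from the path."""
    return sum(1 for e in elements if distance_to_element(path, e) <= radius)

path = [(0, 0), (10, 0)]
n_near = count_within(path, [(5, 3), (12, 0), (5, 10)], radius=3)
```

Temporal and spatio-temporal variants would follow the same pattern with distances measured in time or in both space and time.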

Let Tr be a set of trajectories of moving objects, where each trajectory τ is a
mapping T  →  S. The computation of static trajectory attributes uses the map-
ping T  →  S to produce data in the form Tr  →  A. The computation of dynamic
trajectory attributes uses the mapping T  →  S to produce data in the form
Tr  → (T  →  A). In the derived data, each trajectory is considered as a whole
regardless of its internal structure T → S. Therefore, the derived trajectory attrib-
utes are not added to the position records but form a separate data set.
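
As a hedged illustration of deriving data in the form Tr → (T → A), the sketch below computes a time series of travelled distances per time interval for each trajectory. It assumes planar coordinates and numeric timestamps, and it makes the simplifying assumption that the length of each segment is attributed to the interval containing the segment's start time; the function name distance_time_series is hypothetical.

```python
from math import hypot

def distance_time_series(traj, t_start, interval, n_intervals):
    """Travelled distance of one trajectory in consecutive time intervals.

    traj: list of (t, x, y) tuples sorted by t (hypothetical layout).
    Returns a list with one value per interval, i.e. a mapping T -> A.
    """
    series = [0.0] * n_intervals
    for (t0, x0, y0), (_, x1, y1) in zip(traj, traj[1:]):
        # Simplification: the whole segment counts for the interval
        # in which it starts; segments outside the range are ignored.
        k = int((t0 - t_start) // interval)
        if 0 <= k < n_intervals:
            series[k] += hypot(x1 - x0, y1 - y0)
    return series

# Tr -> (T -> A): a separate data set keyed by trajectory identifier.
trajectories = {
    "a": [(0, 0, 0), (10, 3, 4), (35, 3, 8), (70, 6, 12)],
    "b": [(40, 0, 0), (65, 0, 2)],
}
derived = {tid: distance_time_series(tr, 0, 30, 3)
           for tid, tr in trajectories.items()}
```

The derived values are stored separately from the position records, one time series per trajectory.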
Derivation of dynamic trajectory attributes can be done after a transformation
of the temporal references in the trajectories. For example, we have transformed
the times in the personal car trajectories considered in Sect. 1.2 to the daily cycle
and computed for each trajectory and 30-min time interval within the daily cycle
the corresponding path length (travelled distance). A time series of path length val-
ues has been attached to each trajectory. The time series are visualized in the time
graph in Fig. 3.4. The horizontal dimension represents the time of the day divided
into 30-min intervals. The positions correspond to the ends of the intervals, that
is, the position labelled 06 corresponds to the interval 05:30–06:00. The vertical
dimension represents the value range of the computed attribute (the path length
is measured in metres). For each trajectory, there is a line in the graph connecting
the vertical positions corresponding to the attribute values in the consecutive time
intervals. Red and blue colouring is used for the trajectories made on workdays
and at weekends, respectively. In most workday trajectories, trips over relatively
short distances (about 7.5 km) occurred in the morning, mostly in the intervals
ending at 09:30, 10:00, and 10:30, and in the evening, in the intervals ending
at 18:30, 19:00, 19:30, and 20:00. At weekends, most trips occurred between
11:30 and 14:30, and the travel distances were mostly about 3.5 km. There were
also occasional longer trips both on workdays and at weekends.
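The derivation of such a time series can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for Fig. 3.4: the record layout (seconds since midnight, planar coordinates in metres) and the function name are hypothetical, and the whole length of a segment is credited to the interval in which the segment ends.

```python
import math

def path_length_by_interval(points, interval_s=1800):
    """points: time-ordered list of (t, x, y); t is seconds since midnight
    (times already transformed to the daily cycle), x and y are in metres.
    Returns one path length value per interval of the day."""
    n_bins = 86400 // interval_s
    series = [0.0] * n_bins
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        # Assumption: the segment is not split at interval boundaries;
        # its full length goes to the interval containing its end time.
        idx = int(t1 // interval_s) % n_bins
        series[idx] += math.hypot(x1 - x0, y1 - y0)
    return series
```

With 30-min intervals, index 11 of the result holds the value for 05:30–06:00, that is, the position labelled 06 in the time graph.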
Positional attributes are computed for the positions from which trajectories are
composed or for the segments between two consecutive positions. For simplicity
and uniformity, we assume that positional attributes are always associated with
trajectory positions and, hence, can be attached to the existing position records.
Attributes referring to trajectory segments are associated with the first positions of
the segments and attached to the respective position records.
In the previous chapter, we listed many positional attributes that can be
computationally derived from trajectory data alone or from a combination of
trajectory data and context data. As a reminder, we list these attributes here again:
• Movement attributes:
  – instant attributes, which are computed from two consecutive positions:
    • instant speed, direction, acceleration (change of speed), turn (change of
      direction);

Fig. 3.4  A time graph represents the variation of the travelled distances by 30-min intervals of
a day in the trajectories of the personal car introduced in Sect. 1.2. The lines in red and blue
correspond to the workday and weekend trajectories, respectively

  – interval attributes, which are computed from the sequence of positions taken from
    a fixed length time interval before, after, or around the reference position:
    • travelled distance, displacement, movement vector, spatial extent, sinuosity,
      tortuosity;
    • statistics of instant attributes (average, minimum, maximum, etc.);
  – cumulative attributes, which are computed for the time interval from the trajectory
    start to the reference position or from the reference position to the trajectory end:
    • same as interval attributes;
    • temporal distance to the trajectory start or end.
• Attributes expressing relations to the context:
  – spatial distance:
    • to a selected SCE;
    • to the nearest or the nth nearest SCE;
    • to the nearest or to the nth nearest STCE within a given temporal window;
  – spatial direction:
    • in relation to a selected SCE (i.e. the angle between the movement vector
      of a mover and the vector directed to the current position of the SCE);
    • in relation to the direction of a selected mover (i.e. the angle between the
      movement vectors of two movers);
  – temporal distance:
    • to a selected TCE;
    • to the nearest or to the nth nearest TCE;
    • to the nearest or to the nth nearest STCE within a given spatial window;
  – neighbourhood:
    • count of SCE within a given spatial window;
    • count of TCE within a given temporal window;
    • count of STCE within given spatial and temporal windows.
Besides, thematic attributes of context elements can be attached to trajectory
positions. For example, MoveBank (an online database of animal tracking data,
http://www.movebank.org/) provides a service that enriches position records of trajec-
tories with attributes about the weather conditions in the respective places and times.
Computation of positional attributes transforms trajectory data having originally
the form M  → (T  →  S) to the form M  → (T  →  S  ×  A). When the data already
have the form M → (T → S × A), computation of new positional attributes does not
change this general form but increases the number of components in A.
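As an illustration of the transformation from T → S to T → S × A for a single mover, the following Python sketch derives two instant attributes, speed and direction, and attaches them to the first position of each segment. The record layout (time in seconds, planar coordinates in metres) and the compass-style direction convention are assumptions made for this sketch only.

```python
import math

def add_instant_attributes(track):
    """track: time-ordered list of (t, x, y) for one mover.
    Returns records (t, x, y, speed, direction). Attributes of a segment are
    attached to its first position; the last position gets None values."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy) / dt if dt > 0 else None
        # Assumed convention: 0 degrees = +y ("north"), 90 degrees = +x ("east").
        direction = math.degrees(math.atan2(dx, dy)) % 360
        out.append((t0, x0, y0, speed, direction))
    if track:
        out.append((*track[-1], None, None))
    return out
```

Further attributes would be added as extra components of each record, which increases the number of components in A without changing the general form of the data.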

3.5 Extraction of Spatial Events

3.5.1 Extraction of Movement Events from Trajectories

As discussed in Sect. 2.5, movement of an object m can be viewed as a composition
of spatial events, called movement events and denoted (m, t, s) or (m, t, s, a).
In Sect. 2.8.1, we also argued that occurrences of various relations between a mover
and elements of the spatio-temporal context, denoted as (R, m, c, t1, t2), are spatial
events; we call them relation events. Some movement events and/or relation occur-
rences may be relevant to the goals of analysis. For example, events of long stops
were relevant to our analysis of the individual car movements in Sect. 1.2. Here are
a few other examples of potentially interesting classes of events:
• Attaining particular values of movement attributes, for example, cars exceeding
speed limits;
• Visits of particular places or types of places, for example, wild animals coming
to water sources;
• Meetings of two or more movers;
• Concentrations of many movers in one place.
Analysts need to be able to extract relevant events from the available data,
including movement data and context data. Movement events (m, t, s, a) can be
extracted from trajectory data by means of queries in which the user sets con-
straints on the components s and/or a, that is, on the spatial positions and/or values
of positional thematic attributes. A constraint may be a particular value or subset
of values, for example, a specific point or area in space or an interval of speed
values. It may also be a particular relation to the value of the respective component
in the previous or next time unit, for example, the spatial position is the same (or
the distance is below a threshold) or the speed is greater. A query tool finds the
elementary movement events e = (m, t, s, a) whose s and/or a comply with the
constraints and extracts these events. Sequences of consecutive elementary events
of the same object may be united into composite events. The user may set an addi-
tional constraint on the event duration, as we did in extracting stop events.
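A query of this kind can be sketched in Python as follows. The sketch assumes that the records of one mover are given as time-ordered dictionaries with the hypothetical keys 't', 's', and 'a'; the constraint is passed as a predicate, and consecutive elementary events satisfying it are united into composite events, optionally filtered by a minimal duration.

```python
def extract_composite_events(records, predicate, min_duration=0):
    """records: time-ordered list of dicts with keys 't', 's', 'a' (one mover).
    predicate: the query constraint, a function of one record.
    Returns (t_start, t_end) pairs of composite events whose duration
    is at least min_duration."""
    events, start, last = [], None, None
    for rec in records:
        if predicate(rec):
            if start is None:
                start = rec["t"]      # a new run of matching records begins
            last = rec["t"]
        elif start is not None:
            if last - start >= min_duration:
                events.append((start, last))
            start = None              # the run is interrupted
    if start is not None and last - start >= min_duration:
        events.append((start, last))
    return events
```

For example, with a record attribute holding the speed, the predicate `lambda r: r["a"] > 50` extracts events of exceeding a speed limit of 50.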
Extraction of relevant relation occurrences (R, m, c, t1, t2) implies that the user
specifies the relation type R and the element or subset of the context c. A query
tool needs to find those m, t1, and t2 for which R(m, c) = true throughout the interval [t1, t2].
Depending on the relation type R, different query tools may be needed. Relatively
simple tools such as database queries or interactive filters may be sufficient for
dealing with simple spatial relations to selected locations or static spatial objects
and simple temporal relations to selected time units or events. Complex spatio-
temporal relations, also called “movement patterns” (Dodge et al. 2008), may
require devising specific algorithms. For example, the method suggested by Laube
et al. (2005) finds occurrences of two kinds of spatio-temporal relations between
movers, synchronous movement and “trend setting”, when movements of some
mover are repeated by other movers after a time lag. Gudmundsson et al. (2007)
suggest computationally efficient algorithms for detecting four types of spatio-
temporal relations between movers: flock, leadership, convergence, and encounter.
As stated in Sect. 2.8.1, some relations of movers to the spatio-temporal context
can be represented by dynamic attributes attached to object positions. In this case, rela-
tion occurrences can be extracted in the same ways as movement events (m, t, s, a).
Besides the spatial and temporal positions, a number of thematic attributes of
the extracted movement events can be automatically generated: duration, spatial
extent, average speed and direction of the movement, and statistical aggregates
(average, minimum, maximum, median, etc.) of various dynamic attributes, which
may be selected by the user. The extracted events and their attributes form a new
data set with the data structure M → T × S × A (spatial event data).

3.5.2 Detection of Stop Events

Stop events have received particular attention in the literature dealing with move-
ment data. Stop events are highly important in many applications. Thus, stops in
movements of people are usually related to various activities of the people and are
therefore relevant in studying human behaviours. Recognizing the high importance
of stops, Spaccapietra et al. (2008) suggest a conceptual model of trajectories as
sequences of stops and moves between them. In this model, stops are considered
as important parts of trajectories associated with domain-specific semantics, while
moves are merely transitions between consecutive stops.
There is no unique method for identifying stops in trajectories. Depending
on the way of data collection and characteristics of the movement, different
approaches may be required. For example, in collecting the personal car data
we considered in Chap. 1, the GPS device was switched off each time the
car stopped and the owner left it for a while. Hence, the stops are signified by
relatively long time gaps between the position records. The stop points can be
extracted from the trajectories using a query with a constraint on the temporal dis-
tance from a point to the next point.
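Such a query reduces to a comparison of consecutive time stamps, as in the following Python sketch. The record layout and the function name are assumptions; the threshold `min_gap` is a parameter that the analyst has to choose for the data at hand.

```python
def stops_from_gaps(track, min_gap):
    """track: time-ordered list of (t, x, y) for one mover.
    A stop is reported at each point that is followed by a temporal gap of
    at least min_gap; the result holds (x, y, t_start, t_end) tuples."""
    return [(x0, y0, t0, t1)
            for (t0, x0, y0), (t1, _, _) in zip(track, track[1:])
            if t1 - t0 >= min_gap]
```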
However, very often the data collection device continues measuring and recording
the object’s position even when the object does not move. It may seem that the stop
points in the resulting trajectories are easily detectable based on the speed values:
when the speed at a point is zero, the point is a stop. To extract stops of a certain
minimal duration, one just needs to find sequences of points with zero speeds such that
the temporal distance between the first and last points is not less than the threshold.
However, this can work only in the case of absolute accuracy of the position
measurement, which never happens in practice. Due to unavoidable measurement
errors, consecutively measured positions of a stationary object are not the same,
and, hence, the speed values will differ from zero. It may not be easy to detect
such pseudo-movement in a trajectory and separate it from true movement.
Yan (2009) uses a sophisticated rule for detecting stops in trajectories of
vehicles based on the instant speed values in the trajectory points. A point is
considered as a stop point when each of the following three conditions holds:
(1) the instant speed (velocity) v  ≤ 0.4 × the average speed of the vehicle;
(2) v  ≤ 0.3 × the average speed at the nearest road crossing; (3) v  ≤ 0.3 × the
average speed on the road segment on which the point is located. However, it is
easy to see that this approach and these thresholds may work well for some
data sets but not for others. Laube and Purves (2011) detect stops based on distances
between trajectory points: a point is considered a stop point if its average
distance to other points inside a time window of a chosen length is less than a
chosen threshold. The choices of the window length and the distance threshold are
application dependent. In particular, the time window length is the minimal duration
of staying in approximately the same place that counts as a significant stop.
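The distance-based idea can be sketched in Python as follows. This is a simplified illustration in the spirit of the approach, not a reproduction of the published method: it assumes planar coordinates, a time window centred on the point under test, and a plain arithmetic mean of the distances.

```python
import math

def stop_points_by_mean_distance(track, window, max_mean_dist):
    """track: time-ordered list of (t, x, y).
    Returns one boolean per point: True if the mean distance from the point
    to the other points within +/- window/2 is below max_mean_dist."""
    flags = []
    for i, (t, x, y) in enumerate(track):
        dists = [math.hypot(x - xj, y - yj)
                 for j, (tj, xj, yj) in enumerate(track)
                 if j != i and abs(tj - t) <= window / 2]
        # A point without neighbours in the window is not treated as a stop.
        flags.append(bool(dists) and sum(dists) / len(dists) < max_mean_dist)
    return flags
```

The nested scan is quadratic in the number of points; for long trajectories a sliding window over the time-ordered records would be preferable.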
There is a similar approach based on computing the size of the bounding rec-
tangle enclosing a subsequence of points fitting within a time window. The size
of the bounding rectangle can be represented by the length of its diagonal. If the
length is below a chosen threshold, the subsequence is treated as a stop. As an
example, Fig. 3.5 shows the stop events extracted from trajectories of a person
using the following parameters: time window length = 30 min (i.e. the stop dura-
tion is at least 30 min) and bounding rectangle diagonal length = 100 m.
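A possible realization of the bounding rectangle approach is sketched below in Python. The greedy strategy (extend a subsequence while the diagonal stays within the threshold, then test its duration) and all names are assumptions of this sketch; coordinates are taken to be planar, in the same unit as the diagonal threshold.

```python
import math

def stops_by_bounding_box(track, min_duration, max_diagonal):
    """track: time-ordered list of (t, x, y).
    Returns (t_start, t_end) of subsequences that last at least min_duration
    and fit in a bounding rectangle with diagonal <= max_diagonal."""
    stops, i, n = [], 0, len(track)
    while i < n:
        _, x, y = track[i]
        xmin = xmax = x
        ymin = ymax = y
        j = i
        while j + 1 < n:
            _, x, y = track[j + 1]
            nx0, nx1 = min(xmin, x), max(xmax, x)
            ny0, ny1 = min(ymin, y), max(ymax, y)
            if math.hypot(nx1 - nx0, ny1 - ny0) > max_diagonal:
                break                 # the next point leaves the rectangle
            xmin, xmax, ymin, ymax = nx0, nx1, ny0, ny1
            j += 1
        if track[j][0] - track[i][0] >= min_duration:
            stops.append((track[i][0], track[j][0]))
            i = j + 1                 # continue after the detected stop
        else:
            i += 1
    return stops
```

With times in seconds and coordinates in metres, the parameters used for Fig. 3.5 would correspond to `min_duration=1800` and `max_diagonal=100`.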
Orellana and Wachowicz (2011) argue that the stop detection methods based
on the speed or displacement may not work for trajectories of pedestrians, where
the movement is slow and the position inaccuracy is so significant that adequate
thresholds can hardly be selected. They suggest an approach that uses one of
the indexes of local spatial autocorrelation in geographical data. This approach
is quite demanding in terms of computational resources and time. Besides, it
assumes that there are multiple trajectory points for each stop, which is not always
the case.

Fig. 3.5  Stops (blue dots) extracted from trajectories of a person (red lines) are represented on a
map (a; b shows an enlarged fragment) and in a space–time cube (c)

This discussion demonstrates that it is not always easy to detect stops (and
other types of events) in movement data. Generic approaches are not always effec-
tive, and some data sets may require devising specific methods involving complex
computations.

3.5.3 Extraction of Spatial Events from Other Data Types

As explained in Sect. 2.5, spatial events relevant to movement analysis can be
extracted not only from trajectories characterizing moving objects but also from
presence dynamics characterizing locations and from spatial situations character-
izing time units. Examples of events occurring in presence dynamics are abrupt
increases or decreases of object presence in a location or appearance of a particu-
lar object or object group. Examples of events occurring in spatial situations are
concentrations of objects in particular locations or massive movements in a partic-
ular direction or to/from a particular location. Such events can be detected by ana-
lysing presence dynamics or spatial situations and extracted for further analysis.
The methods for doing this will be presented in Chaps. 7 and 8, which deal with
characteristics of locations and time units, respectively.
Furthermore, new spatial events can be extracted from available spatial events.
One kind of spatial event that involves other spatial events is the occurrence of some
relation between two or more spatial events. Examples of such relations are
spatio-temporal proximity between two events, spatio-temporal concentration
(clustering) of multiple events, and temporal sequence of events. Spatial events involving
other spatial events are called composite; other events are called elementary (see
Sect. 2.5). In particular, a spatio-temporal cluster of spatial events is a composite
spatial event. Chapter 6 describes a method for detection of spatio-temporal clus-
ters of spatial events.
Extraction of spatial events from trajectories, presence dynamics, spatial situa-
tions, and spatial events is a way to focus the analysis on data subsets and features
that are relevant to a current task. Event extraction reduces the data volume and
complexity and enables the application of a large number of existing visual and
computational methods intended for spatial events.
Extraction of spatial events from any type of spatio-temporal data produces
spatial event data in the form E → S × T × A.

3.6 Spatial and Temporal Generalization

Movement can be analysed at different spatial scales. Thus, the goal of analy-
sis may be to understand how people move between cities while their movement
within the cities may be irrelevant, or the other way around. The spatial scale of
the analysis is reflected in the sizes of the spatial units (locations) the analyst deals
with. The available movement data do not always match the required spatial scale
of the movement analysis. Positions in movement data are most often specified by
coordinates. This may be inappropriate for studying large-scale movements such
as intercity or inter-region. Similar considerations refer to the temporal compo-
nent. Thus, fine movements occurring each second or minute may be irrelevant
and only the position changes from one day to another may be of interest.
When the spatial and/or temporal scale of the available movement data is finer
than needed for the analysis, it is necessary to generalize the data, that is, transform
them to a form where the time units and locations have appropriate sizes.
Temporal generalization is done by dividing the time into intervals of suitable
lengths, which are taken as the new time units. All original time references fitting
in the same interval are treated as being the same. An ambiguity arises when a
mover visited two or more locations during a time interval taken as one of the new
units. Depending on the nature of the data and the goals of the analysis, possible
ways of handling this problem are to take the average, the first, or the
last position of the mover.
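Temporal generalization with the three ways of resolving the ambiguity can be sketched in Python as follows. The record layout, the function name, and the parameter `how` are assumptions of this sketch; the averaging option is meaningful only for planar coordinates.

```python
from collections import defaultdict

def generalize_time(track, interval, how="last"):
    """track: time-ordered list of (t, x, y) for one mover.
    Returns {interval_index: (x, y)}, one position per new time unit.
    how: 'first', 'last', or 'mean' position within the interval."""
    groups = defaultdict(list)
    for t, x, y in track:
        groups[int(t // interval)].append((x, y))
    result = {}
    for k, pts in groups.items():
        if how == "first":
            result[k] = pts[0]
        elif how == "last":
            result[k] = pts[-1]
        elif how == "mean":
            result[k] = (sum(p[0] for p in pts) / len(pts),
                         sum(p[1] for p in pts) / len(pts))
        else:
            raise ValueError(f"unknown option: {how}")
    return result
```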
Spatial generalization is done by dividing the space into suitable compartments,
which are taken as the new locations. All original locations contained in the same
compartment are treated as being the same. Figure 3.6 gives an illustration. The
territory under study has been divided into compartments to be used as locations in
the further analysis. The locations (points) present in the original trajectory, which
is shown on the left, have been replaced by the compartments. The trajectory is
now represented as a sequence of moves between the compartments (Fig. 3.6
right).
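The replacement of points by compartments can be sketched in Python as follows. For simplicity, the sketch assumes a regular square grid as the territory division, which is only one of many possible divisions; with arbitrary compartments, the cell computation would be replaced by a point-in-polygon test.

```python
def generalize_space(track, cell_size):
    """track: time-ordered list of (t, x, y) for one mover.
    Returns the sequence of visited grid cells (col, row); consecutive
    points falling in the same cell are treated as the same location."""
    cells = []
    for _, x, y in track:
        cell = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells
```

Each pair of consecutive cells in the result corresponds to one move between compartments.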
When spatial and/or temporal generalization is applied to movement data in the
form of position records M × T → S or trajectories M → (T → S), the resulting
data structure will be the same as the original one; however, the sets S and/or T
change. The new sets are composed from larger spatial and/or temporal units.
Spatial and spatio-temporal generalization can serve as a tool for anonymization
of movement data and protection of personal privacy. Interested readers are referred
to the papers by Andrienko et al. (2009) and Monreale et al. (2010). The first one is
a short paper presenting the general idea and demonstrating that trajectory generali-
zation does not destroy important general patterns existing in the data. In particular,

Fig. 3.6  Spatial generalization of a trajectory. Left the original trajectory. Right the trajectory
positions have been generalized to areas (compartments of a territory division)

frequently followed routes can be discovered by clustering generalized trajectories.
The second paper presents fully and formally a generalization-based method for
movement data anonymization.

3.7 Trajectory Abstraction (Simplification)

The aim of data abstraction is to remove unnecessary information and retain only
information that is relevant for a particular application or for a particular analysis
task. By data abstraction, several goals are achieved:
• The size of the data is reduced, which also reduces the demand for resources
(RAM size, computational power, time for computation and rendering, time for
reaction to user’s actions, etc.).
• The display clutter can be reduced, so that visualizations become clearer and
more effective. Hence, the analyst’s effort needed for display interpretation is
also reduced.
• Since irrelevant information is removed or reduced, the analyst can more easily
focus on what is important and interesting.
One possible way of data abstraction is to select a subset (sample) of the
data that has the same important properties or contains the same important
information as the whole set. To our knowledge, there are no special methods for
sampling trajectory data. The suitability of generic random sampling depends on the
goals of analysis, or, in other words, on what information is deemed important. Thus,
a random sample of trajectories may allow detection of frequent routes or
frequently visited places but may not allow detection of traffic congestion or
interactions between moving objects.
The other possible way of abstracting trajectory data is to transform the trajec-
tories by removing unnecessary positions and segments, which means that the tra-
jectories are simplified. As can be seen in Fig. 3.6, spatial generalization simplifies
trajectories. In fact, the terms “generalization” and “abstraction” are often used as
synonyms. In the context of this book, we would like to distinguish between them.
Spatial, temporal, and spatio-temporal generalizations, as they are defined in the
previous section, are possible methods of abstraction, which is a broader term.
The possible approaches to movement data abstraction by trajectory simplifica-
tion include the following:
• geometric simplification,
• density-based simplification,
• place-based simplification,
• event-based simplification,
• attribute-based simplification.
The main goal of geometric simplification is to convey the shape of the trajec-
tory using a minimum number of points. This can be achieved using the popular
Douglas–Peucker algorithm for line simplification (Douglas and Peucker 1973).
Geometric simplification can be recommended when the shapes of trajectories are
of interest and not the visited locations and the thematic attributes related to trajec-
tory positions and segments.
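For reference, a compact recursive formulation of the Douglas–Peucker idea is sketched below in Python. It is a textbook-style variant, not code from the cited paper: it measures the distance of each intermediate point to the segment between the end points (rather than to the infinite line) and ignores the time stamps and attributes of the trajectory points.

```python
import math

def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all given as (x, y))."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))         # clamp the projection to the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, tolerance):
    """Keeps the end points and, recursively, every point that deviates
    from the simplified line by more than tolerance."""
    if len(points) < 3:
        return list(points)
    dists = [_point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    i_max = max(range(len(dists)), key=dists.__getitem__)
    if dists[i_max] <= tolerance:
        return [points[0], points[-1]]
    split = i_max + 1                 # index of the farthest point
    left = douglas_peucker(points[:split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right          # the split point is shared
```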
Density-based simplification replaces spatial concentrations of consecutive tra-
jectory points by a single point. It may be the centre of the circle enclosing the
original points, the point with the average coordinates, or the medoid of the group
of points, that is, the point with the minimal average distance to all other points.
Density-based simplification preserves the geometric shape of the trajectory and
the visited locations. Values of thematic attributes related to the trajectory posi-
tions and segments need to be averaged, which may lead to substantial information
losses. Hence, density-based simplification should be avoided when position-
related thematic attributes are of interest.
Place-based simplification may be used when the analysis focuses on visits of
certain places of interest (POIs) and/or relations between them. In this case, the
trajectory positions lying outside of the POIs can be omitted. Moreover, when
POIs are defined as areas, different positions lying within the same POI may be
replaced by one representative position, for example, by the centre of this POI. For
example, in one of the steps of our investigation of the traffic in Milan in Sect. 1.3,
we abstracted the car trajectories to moves between the highway crossings and the
city centre and disregarded the remaining spatial positions. Place-based simplifica-
tion may distort the shapes of the trajectories. The values of position-related the-
matic attributes are preserved only for the trajectory points within the POIs.
Event-based simplification preserves the trajectory positions in which certain
movement events of interest occurred and omits the remaining positions. In many
applications, stop events are highly important. Thus, stops in movements of peo-
ple are usually related to various activities of the people. When the analysis is
focused on these activities, in particular, on their locations in space and time, the
trajectories can be reduced to the positions of the stops. This corresponds to the
view of trajectories as sequences of stops and moves between them suggested by
Spaccapietra et al. (2008). Theoretically, simplification can also be based on other
types of movement events.
An example of stop-based simplification of a trajectory is shown in Fig. 3.7,
where the original trajectory is in red and the simplified version of it in blue. The
stop events have been identified as described in Sect. 3.5 (see Fig. 3.5).
Attribute-based simplification finds sequences of three or more trajectory
points in which values of certain thematic attributes remain constant or nearly
constant and shortens the sequences, for example, by keeping only the first and
the last points of each sequence and omitting the rest. The average values of the
thematic attributes from the original points are attached to the points that remain
after the simplification. Since the attribute values are close to each other, the
information loss caused by the averaging is low. Hence, this way of simplifying
excessively detailed trajectories may be recommended when it is necessary to
analyse dynamic thematic attributes within the trajectories. However, the informa-
tion loss is low only for the attributes that have been taken into account in the

Fig. 3.7  Event-based (stop-based) simplification of a trajectory. The map (left) and space–time
cube (right) show the original trajectory in red and the result of stop-based simplification in blue

simplification. The values of other attributes may significantly vary within a point
sequence subject to simplification; hence, the average values may be completely
useless and misleading. Therefore, all dynamic attributes whose values need to be
analysed after simplifying trajectories must be taken into account in the course of
the simplification. However, simultaneous use of many attributes may lead to a
situation when the simplification is impossible because there is no constancy of
attribute value combinations in sequences of consecutive trajectory points.
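Attribute-based simplification for a single attribute can be sketched in Python as follows. The sketch makes several assumptions that are not prescribed by the text: "nearly constant" is interpreted as staying within a tolerance of the first value of the sequence, sequences are found greedily from left to right, and the record layout is (t, x, y, a).

```python
def simplify_by_attribute(records, tolerance):
    """records: time-ordered list of (t, x, y, a) for one trajectory.
    Sequences of three or more consecutive points whose attribute a stays
    within tolerance of the sequence's first value are reduced to their
    first and last points, both carrying the mean value of a."""
    out, i, n = [], 0, len(records)
    while i < n:
        j = i
        while j + 1 < n and abs(records[j + 1][3] - records[i][3]) <= tolerance:
            j += 1
        if j - i + 1 >= 3:
            mean_a = sum(r[3] for r in records[i:j + 1]) / (j - i + 1)
            out.append(records[i][:3] + (mean_a,))
            out.append(records[j][:3] + (mean_a,))
        else:
            out.extend(records[i:j + 1])   # too short to simplify
        i = j + 1
    return out
```

Handling several attributes at once would require the tolerance test to hold for every attribute, which makes long simplifiable sequences less likely, as noted above.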
The shapes of the simplified trajectories will be close to the original shapes if
the attribute “movement direction” is included in the set of thematic attributes on
which the simplification is based. In this case, a sequence of points will be simpli-
fied only if the movement direction is nearly constant. Hence, the points of signifi-
cant turns will be preserved.
Trajectory simplification does not change the structure of the original move-
ment data but only reduces the number of data records.

3.8 Spatio-Temporal Aggregation

Aggregation is an instrument for dealing with large amounts of data, when it is
unfeasible to investigate them in full detail. Aggregation is also a way to distil
general features out of fine-detail “noise”. In particular, aggregation of movement
data enables an overall view of the spatial and temporal distribution of multiple
movements, which is hard to gain from displays showing individual trajectories.
An illustrated survey of the aggregation methods used for movement data and the
visualization techniques applicable to the results of the aggregation is given by
Andrienko and Andrienko (2010).
Spatial and spatio-temporal aggregation of movement data may be continuous
or discrete. Continuous spatial aggregation generates smooth surfaces, or fields,
representing movement density (Dykes and Mountain 2003; Willems et al. 2009).
Willems et al. have developed a specific kernel density estimation method for tra-
jectories, which involves interpolation between consecutive trajectory points tak-
ing into account the speed and acceleration. Density fields built using kernels with
different radii can be combined into one field to expose simultaneously large-scale
patterns and fine features (Fig. 3.8). Density fields are visualized on a map using
colour coding and/or shading by means of an illumination model.
To investigate the variation of the movement density over time, continuous
spatial aggregation is combined with temporal aggregation, which can also be
continuous or discrete. Demšar and Virrantaus (2010) extend the idea of spatial
density to spatio-temporal density: they aggregate trajectories into density
volumes in a three-dimensional space–time continuum by generalizing the standard
2D kernel density around 2D point data into 3D density around 3D polyline data.
The resulting volumes are represented in a space–time cube. For discrete tempo-
ral aggregation, time is divided into intervals. Depending on the application and
analysis goals, the analyst may consider time as a line (i.e. linearly ordered set
of moments) or as a cycle, for example, daily, weekly, or yearly. Accordingly,
the time intervals for the aggregation are defined on the line or within the chosen
cycle. The combination of discrete temporal aggregation with continuous spatial
aggregation gives a sequence of density surfaces, one per time interval. Such
a sequence can be visualized on animated density maps.

Fig. 3.8  Continuous spatial aggregation of vessel trajectories using a kernel density estimation
method developed by Willems et al. (2009). Image courtesy of Niels Willems

Discrete spatial aggregation uses a finite set of places, such as units of territory
division or previously defined areas of interest. When there are no predefined areas,
a set of places for the aggregation can be obtained by means of spatial tessellation,
that is, dividing the space into compartments. Often spatial data are aggregated
using regular grids. Thus, Dykes and Mountain (2003) and Mountain (2005)
count trajectory points fitting in each cell of a regular grid and represent the
resulting density counts by colouring or shading of the grid cells on a map display.
The densities can also be computed for consecutive time intervals and presented on
an animated map display. Similar to densities, other aggregated characteristics can
be computed and visualized. Thus, Forer and Huisman (2000) compute the total
number of person-minutes spent in each grid cell. Brillinger et al. (2004) compute
the prevailing movement directions in grid cells and represent them visually by
arrows, which may also differ in length and width depending on the average move-
ment speed and the number of moving objects.
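Counting trajectory points by grid cells, optionally also by time intervals, can be sketched in Python as follows. The function and parameter names are assumptions of this sketch; the cited works may differ in how cells and intervals are defined.

```python
from collections import Counter

def grid_point_counts(points, cell_size, interval=None):
    """points: iterable of (t, x, y) taken from any number of trajectories.
    Returns a Counter keyed by grid cell (col, row) or, when interval is
    given, by (interval_index, col, row)."""
    counts = Counter()
    for t, x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        key = cell if interval is None else (int(t // interval),) + cell
        counts[key] += 1
    return counts
```

The counts per interval index can serve as the frames of an animated density map.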
Space division by a regular grid does not respect the spatial distribution of the
data. It is more appropriate to define spatial compartments, so that they enclose
existing clusters of trajectory points. However, these clusters may have very dif-
ferent sizes and shapes, which has two disadvantages. First, it is computationally
hard to automatically divide a territory into arbitrarily shaped areas enclosing clus-
ters. Second, such areas are likely to differ in their size, and the respective aggre-
gates would be hard to compare to each other. Therefore, we suggest a method that
divides a territory into convex polygons of approximately equal size based on the
point distribution (Andrienko and Andrienko 2011).
The method first extracts characteristic points from the trajectories: start and
end positions, positions of significant turns and stops (the minimum turn angle and
stop duration are specified by the user as parameters), and representative points
from long straight segments (the user specifies the maximal allowed distance
between extracted points). Then, the method finds spatial clusters of characteristic
points that can be enclosed by circles with a user-chosen radius. A concentration
of points having a larger size and/or complex shape will be divided into several
clusters. The centroids (average points) or medoids of the clusters are then used as
generating points for Voronoi tessellation (Okabe et al. 2000). The medoids are the
points with the minimal average distance to the cluster members. They are usually
located inside concentrations of points. This method of territory division is illus-
trated in Fig. 3.9a and b. It has also been used for the aggregation of the personal
car and Milan traffic data in Chap. 1.
A convenient property of this method is that the cluster radius chosen by the user
determines the sizes of the resulting territory compartments and, hence, the level
of aggregation and the spatial scale of the subsequent analysis. Thus, the user can
choose a suitable spatial scale depending on the size of the territory and the analysis
goals, and can also do the analysis at different spatial scales.
On the basis of the previously defined set of places P, each trajectory is represented
by a sequence of visits v_1, v_2, …, v_n of places from P. A visit v_i is a tuple
<m_k, p_i, t_{start}, t_{end}>, where m_k is the moving object, p_i ∈ P is a place, t_{start} is the
starting time of the visit, and t_{end} is the ending time. Complementarily to this, each

Fig. 3.9  Discrete spatial aggregation of vessel trajectories using an irregular space division pro-
duced according to the spatial distribution of characteristic points from the trajectories. a The
original trajectories are represented by lines drawn with 10 % opacity. b The territory has been
divided into Voronoi polygons (demarcated by grey boundary lines) built around the centres (yel-
low dots) of the spatial clusters of characteristic points (orange circles, drawn with 30 % opacity)
extracted from the trajectories. c Visualization of place-related aggregate attributes: the counts of
place visits are represented by the sizes of the circles and the mean times spent in the places by
the background shading; darker means longer. d The counts of the moves between the places are
represented by the widths of the arrow symbols. For better legibility, the symbols representing
fewer than 20 moves are hidden

trajectory is also represented by a sequence of moves m1, m2, …, mn−1, where a
move mi is a tuple <mk, pi, pi+1, t0, tfin> describing the transition from place pi to
place pi+1. Here, t0 is the time moment when the move begins (it equals tend of visit
vi of place pi) and tfin is the time moment when the move finishes (it equals tstart of
visit vi+1 of place pi+1).
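The dual representation can be illustrated with a small Python sketch. It assumes that each trajectory has already been matched to places and is available as time-ordered (time, place) samples; the class and function names are hypothetical. The end of a visit is approximated by the last sample recorded in the place.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Visit:
    mover: str
    place: str
    t_start: float
    t_end: float

@dataclass(frozen=True)
class Move:
    mover: str
    origin: str
    destination: str
    t0: float     # equals t_end of the visit to the origin place
    t_fin: float  # equals t_start of the visit to the destination place

def to_visits(mover, samples):
    """Collapse runs of consecutive samples in the same place into visits."""
    visits = []
    for t, place in samples:
        if visits and visits[-1].place == place:
            last = visits[-1]
            visits[-1] = Visit(mover, place, last.t_start, t)
        else:
            visits.append(Visit(mover, place, t, t))
    return visits

def to_moves(visits):
    """Derive the n-1 moves connecting n consecutive visits."""
    return [Move(a.mover, a.place, b.place, a.t_end, b.t_start)
            for a, b in zip(visits, visits[1:])]

samples = [(0, "A"), (5, "A"), (9, "B"), (12, "B"), (20, "C")]
visits = to_visits("m1", samples)
moves = to_moves(visits)
```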
The difference between quasi-continuous and episodic movement data
has an impact on the properties of the moves representing a trajectory. In the
94 3  Transformations of Movement Data

representation of a quasi-continuous trajectory, the places pi and pi+1 are, as a rule,
neighbours in space. If they are not, it is possible to apply interpolation and introduce
intermediate places between pi and pi+1 so that any two consecutive places
are neighbours. In the representation of an episodic trajectory, consecutively vis-
ited places pi and pi+1 are not necessarily neighbours in space. Since episodic
movement data do not allow valid interpolation between known positions, it is
not possible to introduce intermediate places between pi and pi+1. Hence, it may
be necessary to deal with moves whose origin and destination places are quite far
away from each other.
Having a dual representation of each trajectory, as a sequence of visits and as
a sequence of moves, the data can be aggregated in two complementary ways.
First, for each place pi and time interval Δt, the set of visits V(pi, Δt) is extracted
and the counts of the visits NV(pi, Δt) and different visitors NVO(pi, Δt) are
computed:
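$$V(p_i, \Delta t) = \{\langle m_k, p_i, t_{start}, t_{end}\rangle \mid \exists t : t_{start} \le t \le t_{end} \text{ and } t \in \Delta t\}$$
$$NV(p_i, \Delta t) = |V(p_i, \Delta t)|$$
$$NVO(p_i, \Delta t) = |\{m_k \mid \exists \langle m_k, p_i, t_{start}, t_{end}\rangle \in V(p_i, \Delta t)\}|$$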

Notice that an object mk may visit more than one place during the interval Δt.
It will be counted in each of the visited places.
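A minimal Python sketch of these presence counts follows. The tuple layout (mover, place, t_start, t_end), the half-open interpretation of Δt as [lo, hi), and the sample data are assumptions made for the illustration.

```python
def visits_in(visits, place, interval):
    """Visits of `place` that overlap the half-open interval [lo, hi)."""
    lo, hi = interval
    return [v for v in visits if v[1] == place and v[2] < hi and v[3] >= lo]

def nv(visits, place, interval):
    """NV: number of visits."""
    return len(visits_in(visits, place, interval))

def nvo(visits, place, interval):
    """NVO: number of different visitors."""
    return len({v[0] for v in visits_in(visits, place, interval)})

# (mover, place, t_start, t_end)
visits = [
    ("m1", "A", 0, 5),
    ("m1", "A", 8, 9),
    ("m2", "A", 4, 12),
    ("m2", "B", 13, 15),
    ("m3", "A", 20, 25),
]
```

With these data, place A receives three visits by two visitors during [0, 10), and mover m2 is counted both in A and in B during [10, 20).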
If the original data records include additional attributes, various statistics of
these attributes can also be computed, such as minimum, maximum, average,
median. Hence, each place is characterized by two or more time series of aggre-
gate values: counts of visits NV, counts of visitors NVO, and, possibly, additional
statistics by the time intervals.
The second way of aggregation is applied to connections (links) between
places, that is, ordered pairs of places <pi, pj> such that there is at least one move
from pi to pj. For each connection <pi, pj> and time interval Δt, the set of moves
from pi to pj is extracted:
$$M(p_i, p_j, \Delta t) = \{\langle m_k, p_i, p_j, t_0, t_{fin}\rangle \mid t_{fin} \in \Delta t\}$$

Notice that only the moves that finish within the interval Δt are included. The
count of the moves NM(pi, pj, Δt) and the count of different objects that moved
NMO(pi, pj, Δt) are computed:
$$NM(p_i, p_j, \Delta t) = |M(p_i, p_j, \Delta t)|$$
$$NMO(p_i, p_j, \Delta t) = |\{m_k \mid \exists \langle m_k, p_i, p_j, t_0, t_{fin}\rangle \in M(p_i, p_j, \Delta t)\}|$$

An object mk may move through more than one link during the interval Δt. It
will be counted for each of the links it passed.
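The flow counts can be sketched in the same spirit. The tuple layout (mover, origin, destination, t0, t_fin), the use of equal-length time intervals identified by an integer index, and the sample data are assumptions made for the illustration. As in the definition above, a move is assigned to the interval in which it finishes.

```python
from collections import defaultdict

def aggregate_moves(moves, interval_length):
    """Return NM and NMO keyed by (origin, destination, interval index)."""
    nm = defaultdict(int)
    movers = defaultdict(set)
    for mover, origin, destination, t0, t_fin in moves:
        key = (origin, destination, int(t_fin // interval_length))
        nm[key] += 1
        movers[key].add(mover)
    nmo = {key: len(objects) for key, objects in movers.items()}
    return dict(nm), nmo

# (mover, origin, destination, t0, t_fin)
moves = [
    ("m1", "A", "B", 1, 2),
    ("m1", "B", "A", 3, 4),
    ("m1", "A", "B", 5, 9),
    ("m2", "A", "B", 8, 11),
    ("m2", "B", "C", 15, 31),
]
nm, nmo = aggregate_moves(moves, interval_length=10)
```

Here mover m1 passes the link from A to B twice in the first interval, so NM is 2 while NMO is 1. The last move starts in the second interval but is counted in the fourth, where it finishes.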
If the original data include additional attributes, it is also possible to compute
changes of the attribute values from t0 to tfin, for example, as differences or ratios
between the values at tfin and t0, and then aggregate the changes by computing
various statistics. Hence, each link is characterized by two or more time series
of aggregate values: counts of moves NM, counts of moving objects NMO, and,
possibly, additional statistics of attribute changes by the time intervals.
In computing the counts and other statistics as described above, it is possible to
use a single time interval Δt that covers the whole time span of the data. In this
case, the data are aggregated spatially, irrespective of time.
The two ways of discrete spatial and spatio-temporal aggregation, by places
and by connections, support two classes of analysis tasks focusing on space:
• Investigation of the place characteristics in terms of the presence of moving
objects in different places and the temporal variation of the presence. The pres-
ence is expressed by the counts of visits and visitors in the places, that is, NV
and NVO, which will be jointly referred to as presence counts.
• Investigation of the relations between the places in terms of the flows (aggregate
movements) of objects between different places and the temporal variation of
the flows. The flows are represented by the counts of moves and moving objects
for the connections, that is, NM and NMO. These aggregate attributes are often
referred to as flow magnitudes.
The aggregated data are, by their form, spatial time series. The presence counts
refer directly to places, that is, spatial locations, and the flow magnitudes refer to
static spatial objects, namely links between places; hence, they indirectly refer to
spatial locations. As explained in Sect. 2.4, spatial time series can be viewed in
two ways, as a collection of local time series of attribute values in different loca-
tions and as a sequence of spatial distributions of attribute values in different time
units. Hence, the aggregated movement data can be viewed as local time series
associated with the places S → (T → A) and with the links S × S → (T → A) and
as spatial distributions of the object presence or flows over the whole territory
during a time interval:
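$$SSP(\Delta t) = \{NV(p_i, \Delta t) \mid p_i \in P\} \quad \text{or} \quad SSP(\Delta t) = \{NVO(p_i, \Delta t) \mid p_i \in P\};$$
$$SSF(\Delta t) = \{NM(p_i, p_j, \Delta t) \mid p_i \in P, p_j \in P\} \quad \text{or} \quad SSF(\Delta t) = \{NMO(p_i, p_j, \Delta t) \mid p_i \in P, p_j \in P\}.$$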

Here, SSP(Δt) denotes a spatial distribution of object presence, which will be
further called presence distribution, and SSF(Δt) stands for a spatial distribution of
flows, further referred to as flow distribution. The presence and flow distributions can
be represented in a general way by formulas S → A and S × S → A, respectively. The
time series (temporal sequences) of the presence and flow distributions can be repre-
sented by the formulas T → (S → A) and T → (S × S → A), respectively.
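The two views of a spatial time series hold the same values and differ only in the order of indexing. A minimal Python sketch, assuming the local time series are stored as nested dictionaries (place, then time unit); the function name and data are hypothetical:

```python
def swap_views(nested):
    """Turn an S -> (T -> A) mapping into T -> (S -> A), or vice versa."""
    swapped = {}
    for outer_key, inner in nested.items():
        for inner_key, value in inner.items():
            swapped.setdefault(inner_key, {})[outer_key] = value
    return swapped

# Local time series of presence counts: place -> (time unit -> count)
local_series = {"A": {0: 3, 1: 1}, "B": {0: 0, 1: 4}}

# Sequence of presence distributions: time unit -> (place -> count)
distributions = swap_views(local_series)
```

Applying the same function to link-related series keyed by (origin, destination) pairs gives the flow distributions in the same way.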
Figure  3.9c shows a possible visualization of a presence distribution and the
flow map in Fig. 3.9d represents a flow distribution. Other examples of flow maps
and presence maps have been given in Chap. 1. Movement density maps, as in
Fig. 3.8, also represent presence distributions. Besides discrete flow maps, which
appear in our book, flow distributions can be shown in continuous flow maps
(Tobler 1981), where place-to-place flows are transformed into vector fields and
represented by vector symbols or streamlines.
In a flow map resulting from aggregation of episodic movement data, there
may be many intersections among the flow symbols, which clutter the display. An
example is shown in Fig. 3.10. For this map, we have aggregated trajectories of
Flickr users (the data have been introduced in Sect. 2.10.6). As may be seen, there
are long flow symbols that cover or intersect shorter arrows. The intersections and
overlaps are caused by the discontinuity of the trajectories, where consecutive
recorded positions (in this example, positions of the photographs) may be distant
in space. In fact, there are many more intersections and overlaps than can be seen
in Fig. 3.10. We have filtered out the flows with the magnitudes below 20 moves
since the original map was absolutely illegible. It should be admitted that even a
flow map built from quasi-continuous movement data, where the flow symbols
connect only neighbouring places, may also be rather cluttered. Thus, in Fig. 3.9d,
we have applied a similar filter as in Fig. 3.10.
In aggregation, it is essential to be aware of the modifiable unit problem: the
analysis results may depend on how the original units are aggregated (geographi-
cal sciences use the term “modifiable areal unit”) (Openshaw 1984). This refers
not only to the sizes of the aggregates (scale effects) but also to their locations and
composition from the smaller units (the delineation of the spatial compartments or
the origins of the time intervals). Therefore, it is always advisable to test the sensi-
tivity of any findings to the way of aggregation.
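Such a sensitivity test can be as simple as repeating the aggregation with a different compartment size or a shifted origin and comparing the results. The Python sketch below uses a regular grid instead of the Voronoi-based division, purely for brevity; the function name and the four sample points are hypothetical.

```python
from collections import Counter

def grid_counts(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square grid cell of the given size and origin."""
    ox, oy = origin
    return Counter(
        (int((x - ox) // cell_size), int((y - oy) // cell_size))
        for x, y in points
    )

# Four points concentrated around the location (1, 1)
points = [(0.9, 0.9), (1.1, 0.9), (0.9, 1.1), (1.1, 1.1)]

aligned = grid_counts(points, cell_size=1.0)                     # cell borders cross the concentration
shifted = grid_counts(points, cell_size=1.0, origin=(0.5, 0.5))  # same size, different delineation
coarse = grid_counts(points, cell_size=2.0)                      # different scale
```

The first grid splits the concentration into four cells with one point each, whereas the shifted and the coarser grids put all four points into a single cell. A finding such as "there is no place with more than one visit" would therefore not survive a change of delineation or scale.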

Fig. 3.10  A flow map representing aggregated episodic movement data, namely trajectories of
Flickr users in Switzerland and surrounding areas. The widths of the arrow symbols are
proportional to the counts of the moves between the places. For better legibility, the symbols
representing fewer than 20 moves are hidden

Spatial and temporal aggregation may be combined with attributive
aggregation, which is done in the following way: The value domain of an attribute
is divided into subsets; in particular, for a numeric attribute, the value range is
divided into intervals. For each subset, statistics about the objects that have attrib-
ute values from this subset are computed. The existing methods for spatial, tempo-
ral, and attributive aggregation of movement data are discussed by Andrienko and
Andrienko (2010).
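Attributive aggregation of a numeric attribute can be sketched as follows. The record fields (`speed`, `duration`), the break values, and the choice of count and mean as statistics are assumptions made for the illustration.

```python
from bisect import bisect_right
from statistics import mean

def aggregate_by_attribute(records, breaks, key, value):
    """Group records by the interval of `key` and summarise `value` per interval.

    Interval 0 holds key < breaks[0]; interval i holds
    breaks[i-1] <= key < breaks[i]; the last interval holds key >= breaks[-1].
    Returns {interval index: (count, mean of value)}.
    """
    bins = {}
    for record in records:
        index = bisect_right(breaks, record[key])
        bins.setdefault(index, []).append(record[value])
    return {index: (len(vals), mean(vals)) for index, vals in sorted(bins.items())}

records = [
    {"speed": 5, "duration": 30},
    {"speed": 8, "duration": 10},
    {"speed": 10, "duration": 20},
    {"speed": 60, "duration": 5},
]
summary = aggregate_by_attribute(records, breaks=[10, 50], key="speed", value="duration")
```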

3.9 Transformations Between Data Types

Some of the transformations described in this chapter convert movement data in the
form of position records or trajectories to other basic types of spatio-temporal data,
as defined in Sect. 2.4, namely spatial event data and spatial time series referring
to spatial objects (movers or trajectories), places, and links between places. Spatial
event data result from event extraction (Sect. 3.5). Spatial time series referring to
movers or trajectories may result from derivation of new attributes (Sect. 3.4) and
spatial time series referring to places and connections between places result from
spatio-temporal aggregation (Sect. 3.8).
Trajectory data can be transformed to other data types, and, conversely, other
types of spatio-temporal data can sometimes be transformed to trajectory data. In
particular, episodic movement data are often constructed from spatial event data.
For example, georeferenced photographs in Flickr represent spatial events of tak-
ing photographs by Flickr users. The spatial positions and time stamps of the pho-
tographs of one user define a trajectory of this user. Likewise, trajectories can be
built from data describing mobile phone calls, georeferenced posts in social net-
works, appearances at sensors, and other events when one object participates in
multiple events.
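The construction of episodic trajectories from spatial events amounts to grouping the events by the participating object and ordering each group by time. A minimal Python sketch, with a hypothetical event layout (user, time, x, y) and invented sample values:

```python
from collections import defaultdict

def build_trajectories(events):
    """Group (user, time, x, y) events by user and sort each group by time."""
    by_user = defaultdict(list)
    for user, t, x, y in events:
        by_user[user].append((t, x, y))
    return {user: sorted(positions) for user, positions in by_user.items()}

# Events arrive in no particular order.
events = [
    ("u1", 30, 8.5, 47.4),
    ("u2", 10, 7.4, 46.9),
    ("u1", 10, 8.3, 47.0),
    ("u1", 20, 8.4, 47.2),
]
trajectories = build_trajectories(events)
```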
Construction of trajectories from spatial time series data is unusual but
conceivable. For example, there may be data about the
atmospheric pressure in different locations for a sequence of time units. By con-
necting the positions of the minimal or maximal value for the consecutive time
units, one may construct a trajectory of the low (high) pressure region.
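The pressure example can be sketched in Python as follows. The data layout (location, then time unit, then value), the function name, and the pressure values are hypothetical.

```python
def extremum_trajectory(series_by_location, find=min):
    """For each time unit, pick the location holding the extreme value."""
    times = sorted({t for series in series_by_location.values() for t in series})
    trajectory = []
    for t in times:
        candidates = [loc for loc, series in series_by_location.items() if t in series]
        best = find(candidates, key=lambda loc: series_by_location[loc][t])
        trajectory.append((t, best))
    return trajectory

# Hypothetical pressure values (hPa) at three locations for three time units
pressure = {
    (0, 0): {0: 1000, 1: 1010, 2: 1015},
    (1, 0): {0: 1005, 1: 998, 2: 1012},
    (2, 0): {0: 1012, 1: 1005, 2: 995},
}
low_track = extremum_trajectory(pressure, find=min)
high_track = extremum_trajectory(pressure, find=max)
```

In this invented example, the pressure minimum moves eastwards from (0, 0) to (2, 0) over the three time units.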
There are also transformations that convert spatial event data to spatial time series
and vice versa. Such transformations may also be useful in analysing movement
data, in particular, in combination with context data. Andrienko et al. (2011) pre-
sent an analytical procedure where movement events extracted from trajectories are
first used for identifying significant places and then aggregated by these places and
time intervals producing place-related time series. Andrienko et al. (2012) describe
extraction of spatial events from time series of numeric values referring to different
locations in space. Spatial events are constructed from peaks or pits detected in the
time series by means of a special algorithm. The spatial positions of the events are
the locations described by the time series from which they have been extracted and
the temporal positions are the times when the peaks or pits occurred.
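The idea of turning peaks into spatial events can be illustrated with a deliberately simple detector. This sketch is not the algorithm of Andrienko et al. (2012); it merely marks strict local maxima above a hypothetical height threshold, and the function name and data are invented.

```python
def extract_peak_events(series_by_location, min_height):
    """Return (location, time index, value) for each strict local maximum
    whose value is at least `min_height`."""
    events = []
    for location, values in series_by_location.items():
        for i in range(1, len(values) - 1):
            if values[i] >= min_height and values[i - 1] < values[i] > values[i + 1]:
                events.append((location, i, values[i]))
    return events

series = {"A": [1, 5, 2, 2, 9, 1], "B": [3, 3, 3, 4]}
peak_events = extract_peak_events(series, min_height=4)
```

Each event inherits its spatial position from the location of the time series and its temporal position from the index of the peak.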

Fig. 3.11  Transformations between different types of spatio-temporal data. Diagram nodes:
Trajectories, Spatial events, Spatial time series; edge labels: Extraction, Integration, Aggregation

Hence, the three basic types of spatio-temporal data are linked by a set of transformation
methods that convert one data type to another, as graphically summarized
in Fig. 3.11. Recall that spatial time series can be viewed in two ways,
as a set of local time series referring to different locations and as a sequence of
spatial distributions referring to different time units (Sect. 2.4).

3.10 Recap

The methods for data transformation introduced in this chapter are meant to prepare
available movement data for further analysis. The need for data transformation
may come from
• the methods we want to apply (e.g. some methods may require re-sampling of
trajectories),
• the task focus (e.g. spatial aggregation is appropriate for space-focused tasks),
• the task target (e.g. the target attribute may not be originally available and needs
to be computed),
• the size of the data (e.g. simplification of excessively detailed data reduces the
resource demands and display clutter),
• the desired spatial and temporal scale of analysis (e.g. generalization is applied
when fine details are not of interest),
• characteristics of the movers and their movements (e.g. people do not move all
the time but make trips; hence, division of movement tracks into trajectories
corresponding to the trips may be appropriate),
• relation of the movement to temporal cycles (e.g. adjustment of time references to
temporal cycles may be useful in exploring movements of people and animals),
and various application-specific considerations. Some transformations enrich the
original data with new components (derivation of new thematic attributes), others
produce new data sets of the same structure as the original one (trajectory
re-sampling, division, generalization, and simplification), and some methods produce
new objects: events, place-related static and dynamic attributes, links between
places also described by static and dynamic attributes, presence situations, and
flow situations. Table 3.1 summarizes the data structures produced by the different
transformations.

Table 3.1  Transformations of movement data O → (T → S)

     Transformation                                        Resulting data structure
1    Interpolation and re-sampling                         O → (T → S)
2    Division of trajectories                              O → ((T1 → S) ∪ (T2 → S) ∪ … ∪ (Tk → S))
3    Transformations of temporal and spatial references    O → (T → S)
4a   Derivation of static trajectory attributes            (T → S) → A
4b   Derivation of dynamic trajectory attributes           (T → S) → T → A
4c   Derivation of positional attributes                   O → (T → S × A)
5    Extraction of movement events                         O → T × S × A
6    Spatial and temporal generalization                   O → (T → S)
7    Simplification of trajectories                        O → (T → S)
8a   Spatial aggregation: presence                         S → A
8b   Spatio-temporal aggregation: presence                 S → (T → A) or T → (S → A)
     – place-related spatial time series
8c   Spatial aggregation: flows                            S × S → A
8d   Spatio-temporal aggregation: flows                    S × S → (T → A) or T → (S × S → A)
     – link-related spatial time series
Fig. 3.12  Types of data produced by transformations of trajectories. Diagram nodes: Trajectories,
Events, Places, and Connections (each with attributes; places and connections also with local time
series), Time units, Presence distributions, and Flow distributions. The edge labels refer to the
transformation numbers in Table 3.1

The diagram in Fig. 3.12 schematically represents types of data that can be
derived from trajectory data by means of the different transformations.
The possibility to transform spatio-temporal data from one form to another
allows the analyst to adapt available data to different types of tasks in movement
analysis (Sect. 2.12). The major classes of tasks correspond to four perspectives
of movement (Sect. 2.6): mover-oriented perspective, event-oriented perspective,
space-oriented perspective, and time-oriented perspective. These perspectives are
supported by four possible forms of movement data: trajectory data describing
trajectories and thematic attributes of movers, spatial event data describing
spatio-temporal positions and thematic attributes of spatial events, local time series
describing the presence dynamics in locations and thematic attributes of the locations,
and spatial distributions describing spatial situations in time units. The latter
two forms are two complementary views of the same data structure called spatial
time series. The task foci, corresponding forms of movement data, and possible
transformations between them are graphically summarized in Fig. 3.13.

Fig. 3.13  Different forms of movement data and their correspondence to types of movement
analysis tasks. Diagram labels: Movers (Trajectories), Spatial events (Spatial event data),
Locations (Local time series), Times (Spatial distributions), Movement data, Spatial time series
Furthermore, the analyst often needs to deal not only with movement data but
also with context data, which also may have different types and forms. The follow-
ing chapter presents the basic visualization and interaction techniques supporting
exploration of different types of spatio-temporal data.

References

Andrienko, G., & Andrienko, N. (2010). A general framework for using aggregation in visual
exploration of movement data. The Cartographic Journal, 47(1), 22–40.
Andrienko, N., & Andrienko, G. (2011). Spatial generalization and aggregation of massive move-
ment data. IEEE Transactions on Visualization and Computer Graphics, 17(2), 205–219.
Andrienko, G., Andrienko, N., Giannotti, F., Monreale, A., & Pedreschi, D. (2009, November 3).
Movement data anonymity through generalization. In Proceeding 2nd SIGSPATIAL ACM
GIS 2009 international workshop on security and privacy in GIS and LBS (SPRINGL 2009),
Seattle, WA, USA. http://doi.acm.org/10.1145/1667502.1667510.
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., & Wrobel, S. (2011). From movement
tracks through events to places: extracting and characterizing significant places from mobility
data. In Proceedings of the IEEE visual analytics science and technology (VAST 2011) (pp.
161–170). IEEE Computer Society Press.
Andrienko, G., Andrienko, N., Mladenov, M., Mock, M., & Pölitz, C. (2012). Identifying
place histories from activity traces with an eye to parameter impact. IEEE Transactions on
Visualization and Computer Graphics (TVCG), 18(5), 675–688.
Brillinger, D. R., Preisler, H. K., Ager, A. A., & Kie, J. G. (2004). An exploratory data analysis (EDA)
of the paths of moving animals. Journal of Statistical Planning and Inference, 122(2), 43–63.
Demšar, U., & Virrantaus, K. (2010). Space–time density of trajectories: Exploring spatio-tem-
poral patterns in movement data. International Journal of Geographical Information Science,
24(10), 1527–1542.
Dodge, S., Weibel, R., & Lautenschütz, A.-K. (2008). Towards a taxonomy of movement pat-
terns. Information Visualization, 7(3–4), 240–252.
Douglas, D., & Peucker, T. (1973). Algorithms for the reduction of the number of points required
to represent a digitized line or its caricature. Cartographica: The International Journal for
Geographic Information and Geovisualization, 10(2), 112–122.
Dykes, J. A., & Mountain, D. M. (2003). Seeking structure in records of spatio-temporal behav-
iour: Visualization issues, efforts and applications. Computational Statistics & Data Analysis,
43, 581–603.
Forer, P., & Huisman, O. (2000). Space, time and sequencing: Substitution at the physical/virtual
interface. In D. G. Janelle & D. C. Hodge (Eds.), Information, place and cyberspace: Issues
in accessibility (pp. 73–90). Berlin: Springer.
Gudmundsson, J., van Kreveld, M., & Speckmann, B. (2007). Efficient detection of patterns in
2D trajectories of moving points. Geoinformatica, 11(2), 195–215.
Kwan, M. P. (2000). Interactive geovisualization of activity-travel patterns using three-dimen-
sional geographical information systems: A methodological exploration with a large data set.
Transportation Research Part C, 8, 185–203.
Laube, P., & Purves, R. S. (2011). How fast is a cow? Cross-scale analysis of movement data.
Transactions in GIS, 15, 401–418.
Laube, P., Imfeld, S., & Weibel, R. (2005). Discovering relative motion patterns in groups of mov-
ing point objects. International Journal of Geographical Information Science, 19(6), 639–668.
Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., & Huang, Y. (2009). Map-matching for low-
sampling-rate GPS trajectories. In Proceedings ACM SIGSPATIAL international conference
on advances in geographic information systems (ACM SIGSPATIAL GIS) (pp. 544–545).
Macedo, J., Vangenot, C., Othman, W., Pelekis, N., Frentzos, E., Kuijpers, B., et al. (2008).
Trajectory data models. In F. Giannotti & D. Pedreschi (Eds.), Mobility, data mining and pri-
vacy—geographic knowledge discovery (pp. 123–150). Berlin: Springer.
Monreale, A., Andrienko, G., Andrienko, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., et al.
(2010). Movement data anonymity through generalization. Transactions on Data Privacy,
3(3), 91–121.
Mountain, D. M. (2005). Visualizing, querying and summarizing individual spatio-temporal
behavior. In J. A. Dykes, M.-J. Kraak, & A. M. MacEachren (Eds.), Exploring geovisualiza-
tion (pp. 181–200). London: Elsevier.
Okabe, A., Boots, B., Sugihara, K., & Chiu, S. N. (2000). Spatial tessellations—concepts and
applications of Voronoi diagrams (2nd ed.). Chichester: Wiley.
Openshaw, S. (1984). The modifiable areal unit problem. Norwich: Geo Books.
Orellana, D., & Wachowicz, M. (2011). Exploring patterns of movement suspension in pedes-
trian mobility. Geographical Analysis, 43(3), 241–260.
Quddus, M. A., Ochieng, W. Y., & Noland, R. B. (2007). Current map-matching algorithms
for transport applications: State-of-the art and future research directions. Transportation
Research Part C: Emerging Technologies, 15(5), 312–328.
Spaccapietra, S., Parent, C., Damiani, M. L., de Macedo, J. A., Porto, F., & Vangenot, C. (2008).
A conceptual view on trajectories. Data & Knowledge Engineering, 65(1), 126–146.
Tobler, W. (1981). A model of geographic movement. Geographical Analysis, 13(1), 1–20.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire: Graphics Press.
Willems, N., van de Wetering, H., & van Wijk, J. J. (2009). Visualization of vessel movements.
Computer Graphics Forum (CGF), 28(3), 959–966.
Yan, Z. (2009). Towards semantic trajectory data analysis: A conceptual and computational approach.
In Proceedings VLDB 2009 PhD workshop, http://www.vldb.org/pvldb/2/vldb09-991.pdf.
Yuan, J., Zheng, Y., Zhang, C., Xie, X., & Sun, G.-Z. (2010). An interactive-voting based
map matching algorithm. In Proceedings IEEE international conference on mobile data
management (pp. 43–52). Los Alamitos, CA, USA: IEEE Computer Society.
Chapter 4
Visual Analytics Infrastructure

Abstract  In this chapter, we describe basic visualization and interaction techniques
that enable viewing and exploration of movement data and other types of
spatio-temporal data and facilitate data transformations and joint analysis of different data
types. Cartographic maps and space–time cubes are universal types of display for
visualizing various kinds of spatio-temporal objects and data, including trajectories
of moving objects, spatial events, aggregate movements (flows), and time series of
attribute values. However, they provide limited opportunities for representing tem-
poral and thematic (attributive) aspects of the data; thus, additional forms of data
display are required. Time graphs and temporal bar charts are useful for representing
the temporal and attributive aspects. Multiple co-existing displays showing different
aspects or components of the data need to be visually linked. This is achieved by
means of consistent visual encodings (e.g. same colours) and simultaneous consist-
ent reaction of different displays to various user interactions, in particular, to data
filtering. Filtering helps the user to reduce display clutter and occlusions, to focus
on relevant parts of the data, to establish relationships between different components
of the data, and to integrate information coming from different displays. Interactive
filtering can be done according to different aspects of the data: spatial, temporal, the-
matic (attributive), or class/group membership. For complex objects, such as trajec-
tories, filtering can be applied to object components (points and segments). Filtering
may also change secondary data that have been derived earlier from the data that are
filtered, such as results of data aggregation. The displays representing the secondary
data can be updated to reflect the changes.

4.1 Interactive Visualizations

The most common type of display used for visualization of various kinds of spatial
and spatio-temporal data is the cartographic map (Vasiliev 1997; Slocum et al. 2009).
Maps can represent the structure of geographical space and properties of different

G. Andrienko et al., Visual Analytics of Movement, 103


DOI: 10.1007/978-3-642-37583-5_4, © Springer-Verlag Berlin Heidelberg 2013
locations, positions and properties of spatial objects, trajectories of moving objects,
and flows between places.
Trajectories of moving objects are typically represented on maps by solid lines,
by segmented lines, where the segment widths and/or colours may encode move-
ment attributes, or by linearly arranged arrow symbols, which may also vary in
their appearance for representing attribute values. Aggregated movements (flows)
are visualized by means of flow maps where flows are represented by straight or
curved lines or arrows connecting locations; the flow magnitudes are represented
by proportional widths, colouring or shading of the symbols, and/or proportional
degree of opacity (Tobler 1981, 1987; Wood et al. 2011). Examples of flow maps
can be seen in this book in Figs. 1.9, 1.10, 1.23, 3.9, and 3.10.
Since lines or arrows in a flow map may connect not only neighbouring loca-
tions but any two locations at any distance, massive intersections and occlusions
of the flow symbols may occur (as in Fig. 3.10), which makes the map illeg-
ible. Several approaches have been suggested for reducing the display clutter.
The simplest are filtering (Tobler 1987) or reducing the opacity of lesser flows
(Wood et al. 2011), but these involve high information loss. Boyandin et al.
(2010) remove the middle parts of the lines connecting the places and colour
the remaining starting and ending segments of the lines in two different colours.
This reduces the clutter, but the flows may not be easy to trace. Approaches
involving edge bundling (Phan et al. 2005; Verbeek et al. 2011; Ersoy et al.
2011) work well only for showing flows from one or two locations or in spe-
cial cases, for example, when radial flows from/to one location prevail over
all others (Ersoy et al. 2011). Besides, edge bundling on a map representing
geographical rather than abstract space introduces undesired geographical arte-
facts: bundled edges make a misleading impression of arterial roads that do not
exist in reality. Tobler (1981) transforms discrete flows into continuous move-
ment fields represented by vectors or streamlines. This allows seeing general
trends in flow directions; however, the links between places are lost. Guo (2009)
deals with the problem of clutter by finding regions consisting of highly inter-
connected locations and aggregating the individual flows between locations into
flows between regions.
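The two simplest clutter-reduction techniques mentioned above, filtering and reduced opacity of lesser flows, can be sketched together in Python. The linear scaling of width and opacity, the parameter names, and the sample magnitudes are assumptions made for the illustration; they are not taken from the cited works.

```python
def style_flows(flows, min_count=0, max_width=10.0, min_opacity=0.1):
    """Hide flows below `min_count`; scale width and opacity of the rest
    linearly with the flow magnitude relative to the largest visible flow."""
    visible = {link: n for link, n in flows.items() if n >= min_count}
    if not visible:
        return {}
    top = max(visible.values())
    return {
        link: {
            "width": max_width * n / top,
            "opacity": min_opacity + (1 - min_opacity) * n / top,
        }
        for link, n in visible.items()
    }

flows = {("A", "B"): 100, ("B", "C"): 25, ("C", "A"): 5}
styles = style_flows(flows, min_count=20)
```

The resulting dictionary can be handed to any drawing routine. The flow from C to A is dropped entirely, which illustrates the information loss that both techniques involve.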
Movements in three-dimensional space, for example, in the air or under water,
are harder to visualize than movements on a surface. Ware et al. (2006) represent a
single trajectory of a whale by a three-dimensional ribbon (in a perspective view)
with glyphs on its surface showing the direction of the movement. Hurter et al.
(2009) represent multiple trajectories of aircraft in horizontal or vertical
two-dimensional projections with animated transitions from one projection to another.
Figures  4.1–4.3 give examples of visual representation of three-dimensional tra-
jectories in a perspective view. Trajectories are represented by tubes, as in Fig. 4.1,
or by ribbons, as in Figs. 4.2 and 4.3. The colouring of the tubes or ribbons can
represent values of a thematic attribute, such as the movement slope in Fig. 4.1
and the speed in the other two figures. Tubes are especially suitable for showing
highly curved paths, such as the path of a paraglider in Fig. 4.1. The spiral patterns
represent the climbing motion, which occurs when the paraglider pilot finds a
thermal lift.

Fig. 4.1  A trajectory of a paraglider is represented by a tube in a three-dimensional perspective
view. The colouring represents the slopes of the movement: shades of blue are used for positive
slopes, that is, moving up, and red for negative slopes, that is, moving down. Image courtesy of
Katerina Vrotsou and Carlo Navarra, Linköping University, Sweden

Fig. 4.2  A trajectory of an airplane is represented by a ribbon in a three-dimensional perspective
view. The colouring represents the speed. The arrows show the movement directions. Their
colours and shapes represent the vertical and horizontal components of the speed, respectively.
Image courtesy of Katerina Vrotsou and Carlo Navarra, Linköping University, Sweden

Fig. 4.3  Trajectories of multiple airplanes are shown in a three-dimensional perspective view.
Image courtesy of Katerina Vrotsou and Carlo Navarra, Linköping University, Sweden

Ribbons are more suitable for representing paths without extreme curves, such
as trajectories of airplanes. An advantage of using ribbons is that they can be over-
laid by glyphs encoding additional thematic attributes. Thus, in Figs. 4.2 and 4.3,
triangle glyphs show the movement directions and their colours and shapes encode
the vertical and horizontal components of the speed. Movements up and down are
represented by green and orange colours, respectively, while the glyph length is
proportional to the horizontal speed. When exploring a single trajectory, it may be
useful to include in the image a semitransparent “curtain” connecting the three-
dimensional trajectory line with its ground projection (Fig. 4.2). Such a curtain
facilitates the perception of the elevation and geographical positions.
A representation of multiple three-dimensional trajectories may be heavily cluttered;
however, in some cases, it may be quite useful, in particular, when there are many
coherent movements. For example, Fig. 4.3 shows that almost all landing airplanes fol-
low the same path: they first approach the airport from the west but then make a turn
and come from the south. All airplanes that take off fly in the north-western direction.
Unfortunately, maps and three-dimensional spatial displays are weak at rep-
resenting time-variant data. Therefore, spatio-temporal data are often visualized
by means of image sequences, where situations in different time units are repre-
sented in multiple images. This technique may be accomplished in two forms: as
an animated display (a temporal arrangement of individual images) and as “small
multiples” (a spatial arrangement of images, which are shown simultaneously).
A common opinion is that small multiples can better support data exploration and
analysis than animated displays, which do not allow comparisons between differ-
ent time frames. However, small multiples are limited with respect to the number
of images that can be presented for simultaneous viewing.
Another approach to visualizing spatio-temporal information is the space–time
cube (STC), where the two horizontal dimensions represent space and the vertical
dimension represents time. The idea was introduced by T. Hägerstrand in the 1960s
(Hägerstrand 1970), but software implementations appeared relatively recently
(Kraak 2003; Andrienko et al. 2003; Kapler and Wright 2005). In our illustrations,
we have used STC for the visualization of trajectories and spatial events. STC can
also be used to visualize spatial time series referring to places and to connections
between places (flows); however, occlusions and clutter in such displays make
them illegible unless filtering is applied to reduce the amount of visible graphical
information within the cube. Filtering may also be needed for displays of multiple
trajectories or events.
Figures  4.4 and 4.5 provide examples of visualizing spatial time series in an
STC. The STC in Fig. 4.4 shows time series related to places, specifically monthly
counts of Flickr users that visited different places in Switzerland. The counts are
represented by the sizes of proportional circle symbols. The circles corresponding
to one place are vertically aligned above this place; the vertical positions of the
circles correspond to consecutive months of a five-year period. Filtering has been
applied to make the display more legible: the symbols representing values below
20 are hidden from the view. The STC in Fig. 4.5 shows monthly flows of Flickr
users between places in Switzerland. The flows in different months are represented
by lines with corresponding vertical positions. The line widths are proportional to
the flow magnitudes (counts of moves). Here, we have also applied filtering: only
the monthly flows with magnitudes of 5 or more are visible.

Fig. 4.4  A space–time cube representation of spatial time series related to places

Fig. 4.5  A space–time cube representing spatial time series of flows between places
Due to occlusions and projection effects in an STC, it may be necessary to
manipulate the view (rotate, shift, zoom in and out, and change the opacity level of
the graphical elements) for correct perception of the represented information.
Map sequences and STC provide only limited opportunities for represent-
ing various characteristics of movement and changes in these characteristics over
time. They are also quite limited with respect to the length of the time interval that
can be effectively studied. Therefore, these displays are often complemented with
other types of graphs and diagrams, which focus on the temporal and thematic
aspects of the data but do not convey spatial information (which means that these
techniques cannot be used alone but only in combination with spatial displays).
The most popular display for representing time series of numeric attributes is the
time graph, or temporal line plot. Time graphs can visualize time series related to
places and connections between places and thematic attributes associated with posi-
tions of moving objects within their trajectories. Examples of time graphs have been
given in Figs. 1.6, 1.7, and 3.1. One of the dimensions of a time graph (typically
horizontal) represents time. Attribute values are represented by positions along the
other dimension. Consecutive positions corresponding to the same object or place
are connected by lines; hence, each time series is represented by a polygonal line or
curve. The display may contain multiple lines for multiple time series (Fig. 3.1).
A disadvantage of the time graph view is overplotting of the lines. To avoid
overplotting, time series or trajectories can be represented by segmented bars
stacked one below another (e.g. Kincaid and Lam 2006). An example is shown
in Fig. 4.6. In this type of display, which is a variation of the Gantt chart tech-
nique, the horizontal dimension represents time and the horizontal bars represent
trajectories (in our example, trajectories of cars in Milan). The horizontal positions
and the lengths of the bars correspond to the temporal positions and durations of
the respective trajectories; therefore, we call these bars “time bars”. The vertical
dimension is used for stacking the time bars, which can be ordered according to
values of one or more attributes describing the trajectories. In our example, the
time bars are ordered by the start times of the trajectories. The stacking layout is
free from overplotting; however, there is often not enough screen space to show all
trajectories. As can be noticed, the display in Fig. 4.6 is supplied with a vertical
scroll bar for scrolling through the set of trajectories. In the horizontal dimension,
the user may apply temporal zoom to use the whole display width for representing
a chosen time interval in more detail. In our example, the display shows a time
interval of 3 h (180 min) length. The time slider at the bottom allows horizontal
scrolling through the time as well as extending and shrinking of the currently vis-
ible time interval.
The time bars representing trajectories are divided into segments coloured
according to the values of one currently selected positional attribute; in our exam-
ple, it is the speed. For this purpose, the value range of the attribute is interactively
divided into intervals. This may be done by using the interactive interface visi-
ble at the bottom of Fig. 4.7. It contains a bar with multiple sliders (double-ended
vertical arrows) corresponding to the breaks between the intervals, which can be
moved, deleted, or added by the user. The user can also set precise values for the
interval breaks using the text field above the slider bar. Each so defined interval is
assigned a particular colour. In our example, we assign the colours according to
one of the ColorBrewer colour scales (Harrower and Brewer 2003). Colourless
segments correspond to intervals of data absence.

Fig. 4.6  Left A temporal bar chart shows temporal variation in positional attribute values within
trajectories. Right The map is dynamically linked to the bar chart: when the mouse cursor points
at a bar, the corresponding trajectory is highlighted on the map and the spatial position corre-
sponding to the mouse position is marked by the intersection of the horizontal and vertical lines

Fig. 4.7  The positional attribute values are represented by two-tone pseudo-colouring. On the
right, a fragment of the display is enlarged
Representing classes (intervals) of attribute values instead of the individual
values decreases the precision in conveying the values. The precision can be
increased by applying two-tone pseudo-colouring (Saito et al. 2005), which is
also known as the Horizon Graphs technique (Heer et al. 2009). It is illustrated
in Fig. 4.7. The idea is to use two colours for painting each bar segment. If the
value x corresponding to a segment belongs to the ith value interval, the colours of
the ith and (i − 1)th intervals are used. Let B_i be the beginning of the ith interval
(i.e. the value of the break between the ith and (i − 1)th intervals) and let L_i be
the length of the ith interval. The bar segment is divided in the vertical dimen-
sion into two parts so that the height of the lower part in relation to the whole bar
height is equal to the ratio of (x − B_i) to L_i. The lower part is painted in
the colour of the ith interval and the remaining part in the colour of the (i − 1)th
interval. Hence, the closer the value is to the lower interval boundary, the smaller
is the amount of colour of this interval and the larger is the amount of colour of
the previous interval. Approaching the upper interval boundary increases the pro-
portion of the colour of this interval to 100 %. It is obvious that the two-tone col-
ouring requires sufficient height of the bars. As we have found empirically, the
two-tone colouring is visible when the bar height is at least seven pixels, whereas
just two or three pixels are sufficient for plain class-based colouring, as in Fig. 4.6.
This may have implications when large numbers of trajectories need to be seen
simultaneously.
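The division of a bar segment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for Fig. 4.7: the function name two_tone_split, the colour names, and the choice of "white" as the second colour for the lowest interval (which has no preceding interval) are all hypothetical.

```python
from bisect import bisect_right

def two_tone_split(x, breaks, colours, bar_height):
    """Return (lower_colour, lower_px, upper_colour, upper_px) for value x.

    breaks:  sorted interval boundaries; interval i spans breaks[i]..breaks[i + 1].
    colours: one colour per interval (len(breaks) - 1 entries).
    """
    if not breaks[0] <= x <= breaks[-1]:
        raise ValueError("value outside the classified range")
    # Index of the interval containing x; the last interval includes its upper bound.
    i = min(bisect_right(breaks, x) - 1, len(colours) - 1)
    start = breaks[i]                     # B_i
    length = breaks[i + 1] - breaks[i]    # L_i
    fraction = (x - start) / length       # (x - B_i) / L_i
    lower_px = round(fraction * bar_height)
    # Assumption: the lowest interval has no predecessor, so a neutral colour is used.
    previous = colours[i - 1] if i > 0 else "white"
    return colours[i], lower_px, previous, bar_height - lower_px

# A value in the middle of the second interval splits an 8-pixel bar in half.
split = two_tone_split(15, [0, 10, 20, 30], ["yellow", "orange", "red"], 8)
```

With a bar height of only two or three pixels, the rounded lower part can take very few distinct values, which is consistent with the observation that two-tone colouring needs taller bars than plain class-based colouring.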
The use of temporal bar charts for visualizing qualitative positional attributes
of trajectories is described by Chang et al. (2013). The land use categories of the
locations visited by movers are represented by colours. The bars in the display can
be interactively re-ordered and filtered based on various attributes of the movers
and/or their trajectories.
Other non-spatial displays that we have already used in our illustrations are
two-dimensional histograms (Figs. 1.12 and 1.14) and origin–destination matrices
(Fig. 1.27). The two-dimensional histogram is a generic technique that can be
applied to any two attributes. We find it particularly useful for exploring the dis-
tribution of time-referenced data with respect to two temporal cycles, for example,
daily and weekly, as in Figs. 1.12 and 1.14.
The origin–destination matrix (OD matrix) is a technique to represent flows
between places, that is, the same information as in a flow map. As can be seen
in Fig. 3.7, a flow map may greatly suffer from overplotting. An OD matrix is
free from overplotting but lacks spatial information. When the places are few and
have descriptive labels, as in Fig. 1.27, this is not a big problem since the dis-
play is easily understandable. However, when the places have no descriptive labels
and/or are numerous, it is difficult to understand which place corresponds to each
row and column of the matrix. As a partial solution to this problem, Wood et al.
(2010) extend the technique of OD matrix and create representations called OD
maps, in which multiple OD matrices are arranged according to the geographical
positions of the places.
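As a minimal sketch of the data behind an OD matrix, the following Python fragment counts moves between places. It assumes that each trajectory has already been reduced to a sequence of visited place labels and that remaining in the same place does not count as a move; the function name od_matrix and the sample labels are hypothetical.

```python
from collections import Counter

def od_matrix(trajectories, places):
    """Count moves between consecutive, distinct visited places.

    trajectories: iterable of place-label sequences, one per mover.
    places:       ordered list of labels defining the row and column order.
    Rows are origins, columns are destinations.
    """
    index = {place: k for k, place in enumerate(places)}
    counts = Counter()
    for visits in trajectories:
        for origin, destination in zip(visits, visits[1:]):
            if origin != destination:   # assumption: staying put is not a move
                counts[(origin, destination)] += 1
    matrix = [[0] * len(places) for _ in places]
    for (origin, destination), n in counts.items():
        matrix[index[origin]][index[destination]] = n
    return matrix

flows = od_matrix([["A", "B", "C"], ["A", "B", "B", "A"]], ["A", "B", "C"])
```

The order of the list places determines the order of rows and columns, which matters a great deal for the legibility of the resulting display.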
OD matrices with many rows and columns are, in fact, not meant for elemen-
tary tasks such as estimation of the amount of movement between particular posi-
tions but for synoptic tasks, for example, for detecting hubs, that is, places linked
to many other places, and clusters of interlinked places (Guo 2007), or for compar-
ing the overall movement characteristics of different groups of mice in the labo-
ratory study introduced in Sect. 2.10.9. Thus, two matrices in Fig. 4.8 represent
summarized movements of healthy male and female mice. The rows and columns
of the matrices correspond to 27 RFID sensors, and the cells are coloured accord-
ing to the magnitudes of the flows between the sensors. Grey corresponds to zero
values, that is, absence of transitions, and the shades of yellow through orange to
red represent increasing magnitudes. This is a different example of possible value
encoding than in Fig. 1.27, where the magnitudes are represented by sizes of
square symbols in the matrix cells.

Fig. 4.8  Origin–destination matrices summarize the movements of male (left) and female (right)
laboratory mice
In Fig. 4.8, we can see a difference between the matrices for the male (left) and
female (right) animals. In both matrices, high values are clustered along the diag-
onal. However, in the matrix for the males, the values are more strongly concentrated
along the diagonal than in the matrix for the females, where the values are distrib-
uted more widely. To understand what this means, we need to know the principle of
ordering of the matrix rows and columns.
The ordering of the matrix rows and columns plays a decisive role in enabling
the user to see and interpret interesting patterns and gain useful information from
the display. In Fig. 4.8, the rows and columns were manually ordered according
to the spatial positions of the RFID sensors. As the first step, the sensors were
divided into five groups corresponding to different levels or compartments of the
cage. Then, the sensors were ordered according to the spatial distances between
them, that is, the distances between adjacent sensors in the sequence were mini-
mized, while the grouping was preserved. With this ordering, the clusters of high
values along the diagonal in the matrix for males mean that the males predomi-
nantly moved within the compartments and made relatively few moves between
the compartments. This kind of movement behaviour is called territoriality. The
matrix for the female mice shows that they moved more actively between different
compartments.
Manual ordering of matrix rows and columns may be too difficult or even
impossible for a large number of places. Guo and Gahegan (2006) suggest an
automatic ordering method based on complete-linkage hierarchical clustering,
which is done according to the strengths of the links between places (i.e. the mag-
nitudes of the flows in both directions). Although the spatial distances are not
taken into account by the ordering algorithm, the resulting ordering tends to con-
nect spatial neighbours. This is a consequence of the inherent spatial dependence
in spatial phenomena, also known as “the first law of geography” or “Tobler’s first
law”: “everything is related to everything else, but near things are more related
than distant things” (Tobler 1970, p. 236).
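The general idea of ordering by complete-linkage clustering can be illustrated with a small pure-Python sketch. It is a simplified stand-in and not the method of Guo and Gahegan (2006) itself: the link strength is taken as the sum of the flows in both directions, clusters are merged greedily, and merged clusters are simply concatenated without any further optimisation of the order within them. The function name is hypothetical.

```python
def order_by_complete_linkage(flows):
    """Order places so that strongly linked places tend to end up adjacent.

    flows: square matrix, flows[i][j] = magnitude of the flow from place i to j.
    Returns a list of place indices usable as a row/column order.
    """
    n = len(flows)
    if n == 0:
        return []
    # Link strength between two places: flow magnitude in both directions.
    strength = [[flows[i][j] + flows[j][i] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: two clusters are as similar as their weakest link.
                weakest = min(strength[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or weakest > best[0]:
                    best = (weakest, a, b)
        _, a, b = best
        # Simplification: members are concatenated; sub-clusters are never flipped.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters[0]

# Places 0 and 2 exchange many movers, as do places 1 and 3.
sample = [[0, 1, 9, 0],
          [1, 0, 0, 8],
          [9, 0, 0, 1],
          [0, 8, 1, 0]]
order = order_by_complete_linkage(sample)
```

In this example, the strongly linked pairs (0, 2) and (1, 3) become neighbours in the resulting order, so their high values would appear close to the matrix diagonal.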
As we have noted earlier, non-spatial displays representing spatial data must
be used in combination with spatial displays, that is, cartographic maps and
STC. When multiple displays are used to convey different aspects of the same
data, they need to be linked so that the user can mentally integrate related pieces
of information from the different channels. The most common technique for
linking is to simultaneously highlight corresponding display elements when the
user selects an item in one of the displays, for example, by mouse-pointing or click-
ing. This is illustrated in Fig. 4.6, where the mouse cursor points at one of the
bars in the bar chart display of car trajectories in Milan. The trajectory repre-
sented by this bar is highlighted (coloured in white) in the map. Furthermore,
the geographical position corresponding to the position of the mouse within the
bar is marked on the map by the intersection of the horizontal and vertical lines.
Other methods of linking include using the same colours and/or the same order-
ing of graphical elements (whenever appropriate) and simultaneous consistent
reaction of all displays to dynamic filtering of the data, which will be discussed
in the next section.
Boyandin et al. (2011) suggest a special display linking technique for time-variant
flows between places. The overall display consists of two maps and a table between
them. Each row in the table corresponds to one place. Lines representing flows are
drawn not between places within a map but between places in the maps and rows in
the table. The left and right parts of the display show the outgoing and incoming flows,
respectively. The rows of the table contain visual representations of the time series of
the flow magnitudes.
Besides linking, common interaction techniques facilitating visual exploration
of movement data and context data include manipulation of the view (zooming,
shifting, rotation, changing the visibility and rendering order of different infor-
mation layers, changing opacity levels, etc.), manipulation of the data representa-
tion (selection of attributes to represent and visual encoding of their values, for
example, by colouring or line thickness), manipulation of the content (selection of
the objects that will be shown), and interaction with display elements (e.g. access
to detailed information by mouse-pointing, highlighting, selection of objects to
explore in other views, etc.).
In addition to the material of this section, we would like to refer the readers to
the comprehensive survey of visualization and interaction techniques for temporal
data made by Aigner et al. (2011). The survey includes, among others, techniques
suitable for various types of spatio-temporal data: trajectories, spatial events, and
spatial time series. Andrienko and Andrienko (2013) give an overview of various
visualizations specifically suitable for movement data.

4.2 Interactive Filtering

The most evident purpose of data filtering is to select a relevant portion of the data
and ignore the irrelevant part. This can be achieved by a database query, so that
only the relevant part of the data is extracted from the database and used in further
analysis. Another kind of filtering is interactive dynamic data filtering, when parts
of the data are temporarily hidden by the user. It is used for other purposes than
database queries. First, when all information cannot be perceived in a single view
due to the size of the data and/or display problems (clutter, occlusions, insufficient
screen space, etc.), interactive dynamic filtering allows the user to explore the data
by focusing temporarily on data subsets and quickly changing the focus. Second,
interactive dynamic filtering supports exploration of relationships between differ-
ent components of the data: the user sets and dynamically changes a filter based
on some component(s) and examines how this affects the other components. For
example, the user may select movement data from different time intervals and
determine where these data are in space and what values of movement attributes
characterize them. This may help the user see relationships between the temporal,
spatial, and thematic (attributive) components of the data.
To be fit for these purposes, an interactive dynamic filtering system must sat-
isfy certain basic requirements. First, it should be easy for the user to set and
modify filter conditions. Second, the user should be able to combine several kinds
of filters. Third, all visual displays must reflect the current state of the filter and
promptly react to filter changes.
There are different kinds of filtering that may be useful in exploration of move-
ment data and context data:
• Spatial filtering
• Temporal filtering
• Attribute filtering
• Filtering of object classes
• Filtering by direct selection of objects
• Filtering of trajectory points and segments
• Filtering of related object sets
We shall explain these kinds of filtering using particular implementations as
examples, but, generally, filtering tools can be implemented in many different
ways, and many other examples of interactive dynamic filtering can be found in
the literature (e.g. Weaver et al. 2007). Sophisticated filtering of movement data
can also be done by means of queries to moving object databases (Güting and
Schneider 2005; Pelekis et al. 2006; Giannotti et al. 2011).

4.2.1 Spatial, Temporal, and Attribute Filtering

An easy way to set a spatial filter is simply to draw a rectangular frame in a map
display. This may be called a spatial window. Only those geographical objects that
fit in or intersect with the spatial window remain visible on the map. By moving or
resizing the window, the user alters the filter. Figure 4.9 demonstrates the use of a
spatial window for exploring the vessel movement in the North Sea. From left to
right, we have created a small window (red rectangle) for selecting the trajectories
of ships appearing in or at the port of Ijmuiden (close to Amsterdam) and then
moved the window to the port of Den Haag and to the Strait of Dover. The filter-
selected trajectories can be seen in the three screenshots of the map display in the
upper part of Fig. 4.9.
The images in the lower part of Fig. 4.9 show the effect of the spatial filter on
a table display of frequencies of different types of ships. The first column of the
table contains the names of the ship types, the second and the third columns show
the frequencies of the ship types in the whole dataset and among the trajectories
satisfying the filter, and the fourth column shows the ratio between the numbers
in the third and second columns expressed as a percentage. Hence, we can see how
many ships of different types visited Ijmuiden, Den Haag, and the Strait of Dover,
both in absolute numbers and in proportion to the total number of ships of each
type over the whole territory. We can learn, for example, that GDC and chemical
are the most frequent types of vessels at Ijmuiden. They are also frequent at
Den Haag, but the type container has an even higher frequency. The type pass/ferry
clearly dominates in the Strait of Dover, whereas the type “miscellaneous” rarely
appears there. In this way, we can explore what types of ships navigate in different
parts of the North Sea and/or come to different ports.

Fig. 4.9  Upper row ship trajectories are filtered by spatial window. The trajectories are shown
with 10 % opacity. Lower row a display of frequencies of different ship types changes in
response to the changes of the spatial filter
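The spatial window and the linked frequency table can be sketched as follows. The sketch makes a simplifying assumption: a trajectory is considered selected when at least one of its recorded points lies inside the window, so a trajectory that crosses the window between two recorded points would be missed. The record layout (dictionaries with "type" and "points") and all names are hypothetical.

```python
from collections import Counter

def in_window(point, window):
    """window = (xmin, ymin, xmax, ymax); point = (x, y)."""
    x, y = point
    xmin, ymin, xmax, ymax = window
    return xmin <= x <= xmax and ymin <= y <= ymax

def spatial_filter(trajectories, window):
    """Keep trajectories with at least one recorded point inside the window."""
    return [t for t in trajectories if any(in_window(p, window) for p in t["points"])]

def type_frequencies(trajectories, selected):
    """Rows of (type, count in whole set, count in selection, share in percent)."""
    total = Counter(t["type"] for t in trajectories)
    chosen = Counter(t["type"] for t in selected)
    return [(k, total[k], chosen[k], 100.0 * chosen[k] / total[k]) for k in sorted(total)]

ships = [
    {"type": "container", "points": [(0, 0), (5, 5)]},
    {"type": "container", "points": [(20, 20)]},
    {"type": "ferry", "points": [(4, 6)]},
]
selected = spatial_filter(ships, (3, 3, 7, 7))
table = type_frequencies(ships, selected)
```

Moving or resizing the window corresponds to calling spatial_filter with a new rectangle and recomputing the table, which is what the linked table display does in response to the filter.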
Trajectories and, more generally, linear spatial objects can also be spatially fil-
tered by specifying two or more areas that must be visited or intersected. This can
be done, in particular, by selecting areas in an existing map layer consisting of
area objects. Thus, in Fig. 4.10, we use a map layer with tessellation of the terri-
tory into Voronoi polygons (Okabe et al. 2000). We have selected two polygons
by clicking first on the larger one in the south and then on the smaller one to the
north-east of the first polygon. Depending on the chosen mode, the filter will
select the trajectories that visit both areas in any order (Fig. 4.10 left), in the order
in which the areas were selected (centre), or in the opposite order (right). It is also
possible to select the trajectories that visited at least one of the selected areas. The
filter can also be inverted, allowing us to see all trajectories that did not visit any
of the areas.
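The filter modes described above can be expressed compactly if each trajectory is first reduced to the sequence of areas it visits. The following sketch assumes that this reduction has already been done; the mode names, the function names, and the area labels are hypothetical.

```python
def visits_in_order(visited, areas):
    """True if the areas occur in `visited` as a subsequence, in the given order."""
    remaining = iter(visited)
    # `area in remaining` advances the iterator, so later areas must come later.
    return all(area in remaining for area in areas)

def area_filter(visited, areas, mode="any_order", invert=False):
    """Decide whether a trajectory (sequence of visited area ids) passes the filter."""
    if mode == "any_order":
        ok = set(areas) <= set(visited)
    elif mode == "given_order":
        ok = visits_in_order(visited, areas)
    elif mode == "reverse_order":
        ok = visits_in_order(visited, list(reversed(areas)))
    elif mode == "at_least_one":
        ok = bool(set(areas) & set(visited))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ok != invert   # inversion flips the decision

route = ["cell_1", "south", "cell_2", "north_east"]
chosen = ["south", "north_east"]
```

Note that a trajectory visiting the first area, then the second, and then the first again passes both ordered modes in this sketch; whether this is desirable depends on the analysis task.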
For temporal filtering, the user selects a time interval (temporal window) within
the time span of the data. The filter selects the events that existed within this inter-
val and the parts of the trajectories that occurred during this interval. The user
interface for temporal filtering may have the form of a slider bar, as in Fig. 4.11
top left, where the size and position of the slider define the temporal window. The
slider can be dragged along the bar by the user or moved automatically. In both
cases, the map is dynamically re-drawn, which produces an effect of map anima-
tion. Another possibility is selection of the temporal neighbourhood of an event or
trajectory point. For example, in Fig. 4.11, we have right-clicked on a dot on the
map representing a particular spatial event. The time filter has selected the time
window [t1 − 30 min, t2 + 30 min] around the existence time [t1, t2] of this event.
The relative interval boundaries are specified in the user interface of the time filter
(Fig. 4.11 left, lower part). As a result of setting the filter, the map shows only the
events (red circles) and parts of the ship trajectories that fit in the selected time
window.

Fig. 4.10  Trajectories are spatially filtered by selecting areas in an existing map layer. Left tra-
jectories visiting the two selected areas in any order. Centre trajectories visiting the upper area
after the lower area. Right trajectories visiting the upper area before the lower area

Fig. 4.11  Temporal filtering can be done by means of a slider bar (top left) or by clicking on a
temporal object in a map display (right)
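A temporal window of this kind is straightforward to express in code. The sketch below assumes that events are (start, end) pairs, that trajectory points are (t, x, y) tuples, and that an event is selected when its existence time overlaps the window; the stricter reading, in which the event must lie entirely within the window, would need a different comparison. All names and sample times are hypothetical.

```python
from datetime import datetime, timedelta

def neighbourhood(start, end, before=timedelta(minutes=30), after=timedelta(minutes=30)):
    """Temporal window [start - before, end + after] around an existence time."""
    return start - before, end + after

def events_in_window(events, window):
    """Keep events whose existence time (start, end) overlaps the window."""
    lo, hi = window
    return [e for e in events if e[0] <= hi and e[1] >= lo]

def clip_trajectory(points, window):
    """Keep the trajectory points (t, x, y) recorded within the window."""
    lo, hi = window
    return [p for p in points if lo <= p[0] <= hi]

t1, t2 = datetime(2013, 1, 1, 12, 0), datetime(2013, 1, 1, 12, 10)
window = neighbourhood(t1, t2)
track = [(datetime(2013, 1, 1, h, m), 0.0, 0.0) for h, m in [(11, 0), (11, 45), (12, 30), (13, 0)]]
visible = clip_trajectory(track, window)
```

Animating the map then amounts to shifting the window step by step and re-applying the same two selections.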
Dynamic attribute filtering (i.e. filtering by values of thematic attributes) has
become well known and widely used since the first applications of dynamic que-
ries were built at the University of Maryland’s Human–Computer Interaction
Laboratory (Ahlberg et al. 1992; Shneiderman 1994). Attribute filtering can be
applied to any objects characterized by thematic attributes. In particular, trajecto-
ries can be filtered based on their length, average speed, sinuosity, and other attrib-
utes characterizing trajectories as units. Attribute filtering can also be applied to
points and segments of trajectories, but this kind of filtering will be discussed and
illustrated later in Sect. 4.2.3.

4.2.2 Filtering of Object Classes and Individual Objects

Filtering of object classes is a special case of attribute filtering. It can use any
attribute with a nominal value scale. The values can be considered as names or
labels of object classes, clusters, types, categories, groups, etc. For the user’s con-
venience, the filter may have an interface with checkboxes, allowing the user to
switch the classes on and off, as is illustrated in Fig. 4.12. Here, the filtering of
ship trajectories based on ship types allows us to notice differences in the move-
ments of different ship types. Thus, we notice that oil ships (reddish brown) sail
farther away from the coast than container ships (bright red) and that passenger
and ferry ships (orange) and fishing ships (green) mostly move across the traffic
lanes followed by the oil and container ships.
Filtering by direct selection of objects can be used, for instance, when the user
needs to focus on exploring the movement of a particular ship or relative move-
ments of two particular ships. Generally, the user can directly select one or more
objects from some set (e.g. set of trajectories, set of spatial events, set of context
elements of a certain type, etc.) and filter out all other objects belonging to this set.
Hence, selection of trajectories will not affect the set of events and the other way
around. The objects can be selected by interacting with their graphical representa-
tions in any of the available displays, for example, by clicking or dragging a frame
to enclose them. This is convenient when some objects attract the user’s interest, and
the user wants to view them in detail without being distracted by other objects.

Fig. 4.12  Filtering of ship trajectories based on ship types

4.2.3 Filtering of Trajectory Segments

Not only can trajectories as a whole be filtered according to values of thematic
attributes referring to the whole trajectories, but also points and segments of
trajectories can be filtered based on the values of positional attributes, that is,
dynamic attributes referring to the positions within the trajectories. In Sect. 4.1,
we have introduced a temporal bar chart display of trajectories and positional
attribute values. The colour legend on the left of the display is simultaneously
an interactive tool for segment filtering. By clicking on the coloured rectan-
gles, the user switches off and on the corresponding intervals of the attribute
values. This is illustrated in Fig. 4.13. The temporal bar chart display visual-
izes the attribute “length of the bounding rectangle diagonal in time interval
of 1 h”, which has been computed for the positions in the trajectories of the
ships. The name of the attribute is shown in the upper left corner of the display.
As noted in Sect. 3.5, low values of this attribute may indicate stops. In the
upper image, we have switched off all value intervals except for the interval
0–1,000 m, which is represented by the red colour. In the bars representing the
trajectories, the segments where the values belong to the inactive intervals have
become less prominent, and only the red segments remain unaltered. The filter
affects not only the bar chart but also the map display, which now shows only
the trajectory positions satisfying the filter, that is, the supposed positions of
ship stops. Each position is represented by a special glyph showing the move-
ment directions before and after reaching this position. The previous direction
is shown by a T-shaped tail attached to the dot representing the position and the
next direction by a ray emanating from the dot. When two or more consecutive
positions satisfy the filter, the corresponding glyphs are connected by lines. In
the map on the top right of Fig. 4.13, there are many places with lots of over-
lapping glyphs pointing in diverse directions. These are, apparently, the anchor-
ing places of the ships.
For comparison, in the lower part of Fig. 4.13, we have deselected the value
interval 0–1,000 m and selected the value intervals 15,000–30,000 m and over 30,000 m.
Hence, we are now focusing on the trajectory segments corresponding to fast
movements of the ships. These segments are visible in the lower map image.
Unlike the upper map, the lower map contains many long straight lines.
By opening two or more temporal bar chart displays showing different posi-
tional attributes, it is possible to filter trajectory segments by values of two or
more attributes.
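A segment filter based on value intervals can be sketched as a per-point test. The sketch assumes that each trajectory point carries its positional attribute values in a dictionary and that filters on several attributes are combined with a logical AND; the attribute names "diag" and "speed" and the function names are hypothetical.

```python
def satisfies(value, active_intervals):
    """active_intervals: list of (low, high) pairs; high=None means unbounded above."""
    return any(low <= value and (high is None or value < high)
               for low, high in active_intervals)

def segment_filter(points, conditions):
    """Return one boolean per point: True if the point passes all conditions.

    conditions: {attribute_name: active_intervals}; assumption: combined with AND.
    """
    return [all(satisfies(p[name], intervals) for name, intervals in conditions.items())
            for p in points]

positions = [
    {"diag": 500, "speed": 1},
    {"diag": 20000, "speed": 25},
    {"diag": 40000, "speed": 5},
]
fast = segment_filter(positions, {"diag": [(15000, 30000), (30000, None)]})
fast_both = segment_filter(positions, {"diag": [(15000, None)], "speed": [(10, None)]})
```

Switching an interval on or off in the colour legend corresponds to adding or removing a pair in the list of active intervals.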
Fig. 4.13  Filtering of trajectory segments based on values of positional attributes. Left the tem-
poral bar chart display visualizes the attribute “bounding rectangle diagonal in time interval of
1 h”. On the top, the segment filter is set to values below 1,000 m (1 km). On the bottom, the
segments with the values 15 km or more are selected. The states of the map display reflecting the
different conditions of the segment filter are shown on the right. In the lower right corner of each
map, an enlarged fragment of the territory is shown

Filtering of trajectory segments can be used not only for the visual investigation
of the spatial distribution of movement characteristics but also for data transforma-
tions, specifically extraction of movement events (Sect. 3.5) and division of trajec-
tories (Sect. 3.2). In event extraction, spatial events are created from the trajectory
points satisfying the filter conditions. For example, stop events can be made from
the points of the ship trajectories where the bounding rectangle diagonal in a 1-h
time interval is below 1 km (Fig. 4.13 top). The points satisfying the filter are
duplicated, and the resulting spatial events are put together in a new independent
dataset (information layer).
When two or more consecutive points of a trajectory satisfy the constraints of
the segment filter, several strategies are possible:

1. treat all points as independent events;
2. select a representative point from the sequence: the first, the last, the middle point, or
the medoid, that is, the point with the smallest average distance to all other points;
3. construct an average point from the sequence;
4. create a single multi-point event, which is prolonged in time.
The user selects the strategy according to the semantics of the movement events
that need to be extracted. For example, for extracting aircraft take-off and landing
events, it is reasonable to take the first and the last point of a sequence, respec-
tively (Andrienko et al. 2011). Stops and low-speed events may be represented by
multi-point events. Strategy 1 may be invalid if the trajectories have irregular time
intervals between the positions: where the intervals are shorter, there may be more
consecutive points satisfying the constraints, and hence, more events will be gen-
erated. Then, a high number of events in a place may not be meaningful for the
application but only reflect the specifics of data collection. One approach to deal
with this problem is re-sampling of the data (Sect. 3.1) so that the time intervals
between the records become equal. However, this is not needed when strategies
2, 3, or 4 are used because they generate a single event irrespective of the
number of consecutive points satisfying the constraints.
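The four strategies can be compared on a small example with the following sketch. It assumes trajectory points of the form (t, x, y) with numeric times and planar coordinates, and a boolean mask telling which points satisfy the segment filter; the event representation (a dictionary with start, end, and points) and all function names are hypothetical.

```python
from itertools import groupby
from math import dist

def runs(points, mask):
    """Yield lists of consecutive points for which the mask is True."""
    for keep, group in groupby(zip(mask, points), key=lambda pair: pair[0]):
        if keep:
            yield [p for _, p in group]

def medoid(run):
    """Point with the smallest total (hence average) distance to the other points."""
    return min(run, key=lambda p: sum(dist(p[1:], q[1:]) for q in run))

def make_event(pts):
    return {"start": pts[0][0], "end": pts[-1][0], "points": [(x, y) for _, x, y in pts]}

def extract_events(points, mask, strategy="multi"):
    events = []
    for run in runs(points, mask):
        if strategy == "all":                    # strategy 1: independent events
            events.extend(make_event([p]) for p in run)
        elif strategy == "first":                # strategy 2: a representative point
            events.append(make_event([run[0]]))
        elif strategy == "last":
            events.append(make_event([run[-1]]))
        elif strategy == "middle":
            events.append(make_event([run[len(run) // 2]]))
        elif strategy == "medoid":
            events.append(make_event([medoid(run)]))
        elif strategy == "average":              # strategy 3: a constructed mean point
            n = len(run)
            mean = tuple(sum(p[k] for p in run) / n for k in range(3))
            events.append(make_event([mean]))
        elif strategy == "multi":                # strategy 4: one prolonged event
            events.append(make_event(run))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return events

track = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 10, 10), (4, 5, 5), (5, 5, 6)]
passes = [True, True, True, False, True, True]
```

For this sample, the strategy "all" yields five events, whereas every other strategy yields two, one per run of consecutive points, regardless of how densely the positions were recorded.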
For the extracted movement events, a number of thematic attributes are auto-
matically generated: duration, spatial extent, average speed and direction of the
movement, and statistical aggregates (average, minimum, maximum, median, etc.)
of user-selected dynamic attributes.
In trajectory division, new trajectories are made from the parts of the trajecto-
ries satisfying the segment filter. For example, new trajectories can be produced
from the parts of the ship trajectories corresponding to fast movement (Fig. 4.13
bottom). Analogously to the extraction of movement events, the parts of the trajec-
tories satisfying the filter are duplicated, and the new trajectories are put together
in a new independent dataset (information layer).
It may happen that a sequence of trajectory points satisfying a filter is inter-
rupted by an occasional point or two not satisfying the filter. For example, a
sequence of points from a ship trajectory with speed values 15 km/h or more may
be interrupted by a point with a lower value. It may be desirable to disregard small
interruptions and create new trajectories from longer point sequences rather than
split a sequence at each occurrence of a filter-failing point. This may be achieved
in the following way. The user may specify the minimal acceptable time interval
Δt between two new consecutive trajectories extracted from an original trajectory,
that is, between the end of the previous trajectory and the beginning of the next
trajectory. When there is a sequence of consecutive filter-failing points with the
total duration below Δt, it is not used for trajectory splitting. Instead, the points
are included in a new trajectory together with the preceding and following points
satisfying the filter.
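A minimal sketch of this splitting rule is given below. It is not taken from an existing tool: the `Fix` type and the `divide` function are hypothetical names, and the duration of an interruption is assumed to be measured from the last filter-passing point before it to the first filter-passing point after it.

```python
from typing import Callable, List, NamedTuple, Sequence


class Fix(NamedTuple):
    t: float      # timestamp in seconds
    speed: float  # km/h


def divide(traj: Sequence[Fix],
           passes: Callable[[Fix], bool],
           min_gap: float) -> List[List[Fix]]:
    """Split a trajectory into parts satisfying the filter.

    Interruptions shorter than min_gap (the interval called delta-t in
    the text) are absorbed into the current part instead of splitting it.
    """
    parts: List[List[Fix]] = []
    current: List[Fix] = []   # part being built
    pending: List[Fix] = []   # filter-failing points since the last passing point
    for p in traj:
        if passes(p):
            if pending:
                # current[-1] is the last passing point before the interruption
                if p.t - current[-1].t < min_gap:
                    current.extend(pending)   # short interruption: keep going
                else:
                    parts.append(current)     # long interruption: split here
                    current = []
                pending = []
            current.append(p)
        elif current:
            pending.append(p)                 # leading failing points are skipped
    if current:
        parts.append(current)
    return parts
```

With `min_gap=0` the function splits at every filter-failing point, which corresponds to the plain division described earlier.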
It is also possible to use segment filtering for trajectory simplification
(Sect. 3.7). If the points that do not satisfy the filter are deemed to be unimpor-
tant for the intended further analysis, each sequence of such points either can be
reduced by keeping only the first and the last point and removing all intermedi-
ate points, or can be replaced by one representative point. The user can choose
the way in which the representative point is selected or generated: it may be the
first, last, or middle point of the sequence, or the medoid (the point with the smallest
average distance to all other points of the sequence), or a new point with the average
coordinates of all points of the sequence. The lifetime of the representative point (i.e. time interval of its validity)
is set to the time interval of the whole sequence, that is, from the start time of the
first point to the end time of the last point.
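The simplification can be sketched as follows, under the assumption that each trajectory point carries a start and an end time. The `P` type, the `simplify` function, and the mode names are hypothetical and serve only as an illustration; the medoid and middle-point options are omitted for brevity.

```python
from itertools import groupby
from typing import Callable, List, NamedTuple


class P(NamedTuple):
    x: float
    y: float
    t_start: float
    t_end: float


def simplify(traj: List[P], passes: Callable[[P], bool],
             mode: str = "endpoints") -> List[P]:
    """Reduce every run of consecutive filter-failing points."""
    out: List[P] = []
    for ok, group in groupby(traj, key=lambda p: bool(passes(p))):
        run = list(group)
        if ok:
            out.extend(run)                  # filter-passing points stay as they are
        elif mode == "endpoints":
            # keep only the first and the last point of the run
            out.extend(run if len(run) < 3 else [run[0], run[-1]])
        else:
            if mode == "first":
                x, y = run[0].x, run[0].y
            elif mode == "last":
                x, y = run[-1].x, run[-1].y
            elif mode == "average":
                x = sum(p.x for p in run) / len(run)
                y = sum(p.y for p in run) / len(run)
            else:
                raise ValueError(f"unknown mode: {mode}")
            # the representative point is valid for the whole run
            out.append(P(x, y, run[0].t_start, run[-1].t_end))
    return out
```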
Although new datasets created from trajectories using segment filtering or other
methods are independent of the original data and can be analysed and used sepa-
rately, it may also be necessary to analyse several datasets together. In such a case,
the analyst should be able to propagate filtering from one dataset to the other data-
sets that are related to it.

4.2.4 Filtering of Related Object Sets

Two object sets are related when the objects of one of them have references to
objects of the other set. For example, when movement events are extracted from
trajectories (see Sect. 3.5), they may obtain references to the trajectories from
which they have been extracted, that is, the event extraction tool can automatically
attach the references to the events. In the process of data exploration, it may be
useful if the filter of one dataset is propagated to the related dataset(s). For this
purpose, a tool for filtering of related sets is used. We shall demonstrate a possible
operation of such a tool by an example.
For the example, we shall use the Milan car trajectories dataset. Using the seg-
ment filtering tool, we extract events of slow movement, which are constructed
from the trajectory points with the movement speed not more than 10 km/h. The
extracted events are represented in Fig. 4.14a and b by circle symbols on a map
and balls in an STC, respectively. Along with the events, a table with various the-
matic attributes characterizing the events is automatically generated. Among other
information, this table contains an attribute “Trajectory identifier”. The values of
this attribute link each event to the trajectory from which it has been extracted.
These links are used in filtering, as will be shown later.
To make the example more interesting, we apply density-based clustering of the
slow movement events according to the spatio-temporal distances between them.
We obtain 149 dense clusters (Fig. 4.14c). The events that do not belong to any
cluster are classified as “noise”. We filter the “noise” out (by using the class fil-
ter, see Sect. 4.2.2) and build spatio-temporal convex hulls around the clusters. In
Fig. 4.14d, the hulls are represented in an STC as yellow-coloured volumes.
It is highly probable that the dense spatio-temporal clusters of the low-speed events
correspond to traffic congestion: when many such events occur close together in space and
time, this is unlikely to be coincidental and is likely to be caused by an obstruction of the movement.
Hence, the convex hulls represent the spatial and temporal positions and extents
of the traffic jams in the city. To find the most severe congestions, we apply attribute fil-
tering to the set of the hulls and select those having the duration of at least 30 min and a
spatial extent (i.e. the diagonal length of the bounding rectangle of the spatial footprint)
of at least 1 km. The thematic attributes “duration” and “extent” have been automati-
cally generated, among other attributes, when the hulls have been built. The user inter-
face for the attribute filtering may appear as shown in Fig. 4.15a.
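The two attributes and the filter can be illustrated by a short sketch. The `Hull` class and the `severe` function are hypothetical, and the footprint coordinates are assumed to be planar and given in metres; the thresholds are the ones used in the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class Hull:
    hull_id: int
    footprint: List[Tuple[float, float]]  # vertices of the spatial footprint, metres
    t_start: float                        # seconds
    t_end: float                          # seconds

    @property
    def duration_min(self) -> float:
        return (self.t_end - self.t_start) / 60.0

    @property
    def extent_km(self) -> float:
        """Diagonal length of the bounding rectangle of the footprint."""
        xs = [x for x, _ in self.footprint]
        ys = [y for _, y in self.footprint]
        return hypot(max(xs) - min(xs), max(ys) - min(ys)) / 1000.0


def severe(hulls: List[Hull], min_duration_min: float = 30.0,
           min_extent_km: float = 1.0) -> List[Hull]:
    """Attribute filter: a hull must satisfy both conditions to be selected."""
    return [h for h in hulls
            if h.duration_min >= min_duration_min and h.extent_km >= min_extent_km]
```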

Fig. 4.14  a Slow movement events (red circles) have been extracted from car trajectories (blue
lines). b The events are shown in a space–time cube. c Dense spatio-temporal clusters of the
events. d Spatio-temporal convex hulls (yellow) have been built around the spatio-temporal clus-
ters of the events

The filtering by duration and extent selects 15 hulls representing the most
severe traffic jams. Now, we want to select also the movement events that belong
to these traffic jams and the trajectories that were affected by these traffic jams.
Since the hulls have been built based on the clusters of the movement events, the
labels of the clusters have become the identifiers of the hulls. The table with the
thematic attributes of the events contains an attribute representing the result of
the clustering. For each event, the value of this attribute is either a cluster label or
“noise”. The cluster labels link the events to the corresponding convex hulls. This
link can be used for filtering.
Figure  4.15b shows the possible user interface of a filter propagator that fil-
ters objects in one dataset based on the filtering of objects in another dataset. We
use it to filter the events based on the prior filtering of the convex hulls. It selects
only the events belonging to the clusters enclosed by the currently selected convex
hulls. Technically, this means that the value of the attribute representing the clustering
results must coincide with the identifier of some currently selected hull.

Fig. 4.15  a The set of the convex hulls is filtered based on the durations and extents of the
respective spatio-temporal clusters. b The filtering of the set of the hulls is propagated to the set
of events. c The filtering of the set of events is propagated to the set of trajectories. d, e The hulls,
events, and trajectories selected by the combination of the filters

To also propagate the filtering to the set of the trajectories, we utilize the link
between the events and the trajectories. As said before, there is a thematic attribute
of the events that links each event to the trajectory from which it was extracted, that
is, the attribute value is the identifier of the respective trajectory. We create and set
up a filter propagator that filters the trajectories based on the current filtering of the
events (Fig. 4.15c). It selects a trajectory only when at least one event extracted from
it is currently selected by the filter of events. Hence, only the trajectories that had
slow movement events within the currently selected hulls are selected by the filter
propagator. In Fig. 4.15d and e, the convex hulls, events, and trajectories selected by
the combination of filters are shown on a map and in an STC.
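The two propagation steps of this example can be sketched as set operations over the reference attributes. The table layout, the attribute names, and the two function names below are hypothetical; the sketch only illustrates the logic and makes no claim about how an actual filter propagator is implemented.

```python
from typing import Dict, Hashable, Set

# One row per event; the attribute values are references to other object sets.
Table = Dict[str, Dict[str, Hashable]]


def select_referencing(table: Table, attribute: str,
                       selected_targets: Set[Hashable]) -> Set[str]:
    """Select the objects whose reference points to a selected target object."""
    return {oid for oid, row in table.items() if row[attribute] in selected_targets}


def select_referenced(table: Table, attribute: str,
                      selected_objects: Set[str]) -> Set[Hashable]:
    """Select the targets referenced by at least one selected object."""
    return {table[oid][attribute] for oid in selected_objects}


events: Table = {
    "e1": {"cluster": 7, "trajectory": "T1"},
    "e2": {"cluster": 7, "trajectory": "T2"},
    "e3": {"cluster": 9, "trajectory": "T2"},
    "e4": {"cluster": "noise", "trajectory": "T3"},
}

selected_hulls = {7}                                      # result of the attribute filter
selected_events = select_referencing(events, "cluster", selected_hulls)
selected_trajectories = select_referenced(events, "trajectory", selected_events)

# Any other filter on the events is combined by intersection:
# an object is selected only if it satisfies all filters.
morning_events = {"e2", "e3"}                             # hypothetical temporal filter
combined = selected_events & morning_events
```

Calling `select_referenced(events, "cluster", some_events)` instead of `select_referencing` reverses the direction of the first propagator, so that a filter on the events selects the corresponding hulls.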
This example demonstrates the possibility of creating a chain of filter propagators:
set A → set B → set C. It is possible to create not only longer chains of filter propa-
gators but also branching structures, since filtering of one set can be propagated to
two or more related sets. In our example, we could filter the low-speed events based
on their attributes and use the filter propagators shown in Fig. 4.15b and c, to select
the corresponding hulls and trajectories. For this purpose, we would only need to
change the filtering direction for the filter propagator linking the events and the hulls
(Fig. 4.15b).
Propagation of filtering from a related object set can be combined with any
other filters. When several kinds of filters are created for one object set, only the
objects that satisfy all filters are selected.

4.3 Dynamic Aggregation

Visual displays representing individual objects respond to object filtering by hiding
the graphical elements representing the filter-failing objects or by decreasing
the visual prominence of these elements and making them insensitive to mouse
interactions. Displays that represent objects in an aggregated form, for example
histograms, respond to object filtering by applying the aggregation only to the
objects satisfying the filter and representing the new results of the aggregation.
However, object aggregation is not necessarily confined within some visuali-
zation tool, such as a histogram display. Thus, Sect. 3.8 showed that spatial and
spatio-temporal aggregation of movement data can produce new data, in particu-
lar, attributes characterizing places and attributes characterizing flows between the
places. A set of spatial events can also be aggregated by places and, optionally,
time intervals. As a result, summary statistics of the events that occurred in the
place are represented as values of new attributes.
Derived data resulting from aggregation can be visualized on different displays.
In this case, the aggregation is not done locally within a display, but the display
shows externally computed aggregate values regardless of the original data from
which the aggregates have been derived. It may be beneficial for analysis if filter-
ing of the original data could be propagated to the derived data. This means that
the aggregation is re-applied to the subset of the objects satisfying the filter, which
changes the values of the attributes representing the aggregation results. All cur-
rently existing displays showing these attributes are notified that the values have
changed. In response, the displays are updated to reflect the changes. This pro-
cess is called dynamic aggregation. Objects that can re-compute their attributes
resulting from aggregation in response to filtering of the original objects are called
dynamic aggregators. In particular, places and connections between places can be
dynamic aggregators responding to filtering of trajectories.
An example of dynamic aggregation is shown in Fig. 4.16. We have taken a
subset of the Milan car trajectories from one day (Wednesday) divided by time
gaps of at least 30 min, which in this case signifies stops for 30 min or more. We
have spatially aggregated these trajectories by cells of a Voronoi tessellation. The
cells and the connections between the cells are dynamic aggregators. We have created
an additional map display, in which the counts of the moves between the cells
are represented by proportional widths of arrow symbols. The counts of the trajectory
starts and ends in the cells are represented by pie charts, so that the size (area)
of a chart is proportional to the sum of the counts and the sizes of the sectors are
proportional to the counts of starts (yellow) and ends (blue).

Fig. 4.16  Dynamic aggregation of the car trajectories. Different subsets of trajectories are
selected using filtering by visited areas (left). Another map display (right) shows re-computed
values of aggregate attributes associated with places (areas of a Voronoi tessellation) and connections
between the places. The pie charts represent the counts of the trajectory starts and ends in
the places. The widths of the flow symbols represent the counts of the moves between the places

We apply spatial filtering by visited areas to the car trajectories to see where
the cars go after entering the city from different sides. On the top left of Fig. 4.16,
we have selected the trajectories that go through the two highlighted areas (hav-
ing black boundaries) on the north-west in the direction towards the centre. The
aggregate attributes of the places and connections have been re-computed based
on this selection. Among others, the counts of moves, starts, and ends have been
re-computed. The map display visualizing these attributes has received a notifica-
tion that the values of these attributes have changed. In response, the map has been
re-drawn to represent the new values (Fig. 4.16, top right). We have adjusted the
visualization parameters, specifically the maximal arrow width and the maximal
pie size, to enhance the display expressiveness. From the sizes of the pie charts in
different places, we see that many of the cars passing through the two selected areas end their
trips in different places in the city centre, but there are also many cars going to the
north-eastern and south-eastern exits of the city.
On the bottom left of Fig. 4.16, we have selected a different subset of trajecto-
ries: those that go through the two highlighted areas in the north-east in the direc-
tion towards the centre. The dynamic aggregators have reacted to the change in the
trajectory filter by re-computing the aggregate values based on the new selection
of the trajectories. The map showing the aggregates has changed as is shown in
Fig.  4.16, bottom right. The visualization parameters remain the same as previ-
ously. The smaller sizes of the largest symbols in the updated map correspond to
smaller-than-before maximal values of the aggregate attributes. The previous fil-
tering selected 277 trajectories; the maximal sum of the start and end counts was
268 and the maximal move count was 200. The new filtering has selected only 188
trajectories; the maximal sum of the start and end counts is now 187 and the maxi-
mal move count is 170. We see that the cars coming from the north-east behave
differently from the cars coming from the north-west. Only a few cars from the
north-east go to the city centre, while the majority of the cars just use the northern
motorway to pass the city and go farther to the west and north-west.
Dynamic aggregation is a convenient tool for interactive data exploration,
but the process of re-aggregation may sometimes take more time than desired.
Technically, dynamic aggregators keep references to the objects from which the
aggregate attribute values have been computed. Thus, each place has a list of
trajectories that visit it, and each connection keeps a list of trajectories that pass
through it. When the original objects are filtered, each dynamic aggregator checks
which of its objects are currently selected by the filter and then re-computes the
aggregate values based on these currently selected objects. The time needed for
the whole process is proportional both to the number of the original objects and
to the number of dynamic aggregators that have been created. For large datasets
and/or fine aggregations, the process of dynamic re-aggregation may decrease the
responsiveness of visual analytics tools. Therefore, the user should be given an
opportunity to enable dynamic aggregation only when it is really needed.
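The mechanism can be sketched with two small classes. The class and attribute names are hypothetical, the only aggregate computed is a count, and displays are modelled as plain callback functions; the sketch illustrates the notification chain and the cost of re-aggregation rather than any actual implementation.

```python
from typing import Callable, Iterable, List, Set


class DynamicAggregator:
    """A place or a connection that re-computes its aggregate on filtering."""

    def __init__(self, name: str, members: Iterable[str]):
        self.name = name
        self.members: List[str] = list(members)   # references to the trajectories
        self.count: int = len(self.members)       # aggregate attribute value
        self.listeners: List[Callable[["DynamicAggregator"], None]] = []

    def on_filter_changed(self, selected: Set[str]) -> None:
        # Cost is proportional to the number of referenced objects.
        self.count = sum(1 for m in self.members if m in selected)
        for notify in self.listeners:              # tell the displays to redraw
            notify(self)


class TrajectoryFilter:
    """Propagates the current selection to all registered aggregators."""

    def __init__(self) -> None:
        self.aggregators: List[DynamicAggregator] = []
        self.dynamic: bool = True                  # user switch for dynamic aggregation

    def apply(self, selected: Set[str]) -> None:
        if not self.dynamic:
            return                                 # aggregates keep their old values
        for aggregator in self.aggregators:        # one pass per aggregator
            aggregator.on_filter_changed(selected)
```

The nested iteration (all aggregators, and within each all of its referenced objects) makes visible why fine aggregations of large datasets slow down the response, and the `dynamic` flag corresponds to the option of enabling dynamic aggregation only when it is needed.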

4.4 Recap

Cartographic maps and STC are universal types of display for visualizing
various kinds of spatio-temporal objects and data, including trajectories of
moving objects, spatial events, aggregate movements (flows), and time series of
attribute values. Cartographic maps are indispensable in analysing all kinds of
spatial data owing to their capability to convey the spatial context and spatial
relationships among data items. Maps are very good at representing space but
very weak at representing time. An STC employs an additional spatial dimen-
sion for representing time. However, such a representation is usually inef-
fective in showing data over long time periods. Besides, an STC represents a
three-dimensional scene in a two-dimensional projection. Therefore, correct
perception of the information represented in the cube requires user interaction
for looking at the scene from different perspectives. Even with such interac-
tion, it may be difficult to ascertain the absolute and relative spatial and tempo-
ral positions of the objects.
Besides the limitations in representing time, maps and STC also provide only
limited opportunities for representing values of thematic attributes associated with
spatial and temporal objects and positions. Therefore, various additional displays
are used to visualize different aspects of the data. In particular, time graphs, tem-
poral bar charts, and temporal histograms can be utilized to represent temporal and
thematic aspects of data. Displays representing attributes irrespective of space
and time, such as scatter plots, parallel coordinates, and histograms, may also be
useful. Multiple co-existing displays are visually linked by using consistent visual
encodings (e.g. same colours) and exhibit coordinated behaviours by simultaneous
consistent reaction to various user interactions.
Visual clutter and occlusions, which frequently occur in maps, STC, and other
displays, obstruct information perception and analysis. Interactive filtering ena-
bles exploration of the data by focusing on selected subsets, which reduces clut-
ter and occlusions. Filtering is also useful for establishing relationships between
different components of the data and for integration of information represented in
different displays. Interactive filtering can be done according to different aspects
of the data: spatial, temporal, thematic (attributive), or class/group membership.
For complex objects, such as trajectories, filtering can be applied to object compo-
nents. Thus, trajectory points and segments can be filtered according to values of
position-related thematic attributes.
Filtering may not only define the information that will be shown in the visual
displays but also change secondary data that have been derived from the data that
were filtered. In particular, results of data aggregation can be re-computed using
only the data subset selected by the current filter. In response, all displays showing
the aggregated data are updated to reflect the changes.
The visual and interactive techniques described in this chapter can be consid-
ered as components of a generic infrastructure supporting visual exploration of
different types of spatial and spatio-temporal data, including movement data. In
particular, they enable exploration of movement data in connection with the spa-
tio-temporal context of the movement.
The following chapters are dedicated to more specific methods addressing
movement data in their different forms (trajectories, spatial events, local time
series, and spatial distributions) and supporting different types of analytical tasks
focusing on movers, spatial events, space, and time.

References

Ahlberg, C., Williamson, C., & Shneiderman, B. (1992). Dynamic queries for information explo-
ration: An implementation and evaluation. In Proceedings of the ACM conference on human
factors in computing systems (CHI 1992), (pp. 619–626).
Aigner, W., Miksch, S., Schumann, H., & Tominski, C. (2011). Visualization of time-oriented
data. Berlin: Springer.
Andrienko, N., & Andrienko, G. (2013). Visual analytics of movement: An overview of methods,
tools and procedures. Information Visualization, 12(1), 3–24.
Andrienko, N., Andrienko, G., & Gatalsky, P. (2003). Exploratory spatio-temporal visualization:
an analytical review. Journal of Visual Languages and Computing, 14(6), 503–541.
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., & Wrobel, S. (2011). From movement tracks
through events to places: Extracting and characterizing significant places from mobility data. In
Proceedings of IEEE visual analytics science and technology (VAST 2011), (pp. 161–170).
Boyandin, I., Bertini, E., & Lalanne, D. (2010). Visualizing the world’s refugee data with
JFlowMap. Poster at Eurographics/IEEE symposium on visualization EuroVis 2010.
Boyandin, I., Bertini, E., Bak, P., & Lalanne, D. (2011). Flowstrates: An approach for visual
exploration of temporal origin–destination data. Computer Graphics Forum, 30(3), 971–980.
Chang, Q., Wood, J., Slingsby, A., Dykes, J., Kraak, M.-J., Blok, C., & Ahas, R. (2013). Visual
analysis design to support research into movement and use of space in Tallinn: A case study.
Information Visualization. doi:10.1177/1473871613480062
Ersoy, O., Hurter, C., Paulovich, F., Cantareiro, G., & Telea, A. (2011). Skeleton-based edge bun-
dling for graph visualization. IEEE Transactions on Visualization and Computer Graphics,
17(12), 2364–2373.
Giannotti, F., Nanni, M., Pedreschi, D., Pinelli, F., Renso, C., Rinzivillo, S., et al. (2011).
Unveiling the complexity of human mobility by querying and mining massive trajectory data.
The VLDB Journal, 20(5), 695–719.
Guo, D. (2007). Visual analytics of spatial interaction patterns for pandemic decision support.
International Journal of Geographical Information Science, 21(8), 859–877.
Guo, D. (2009). Flow mapping and multivariate visualization of large spatial interaction data.
IEEE Transactions on Visualization and Computer Graphics, 15(6), 1041–1048.
Guo, D., & Gahegan, M. (2006). Spatial ordering and encoding for geographic data mining and
visualization. Journal of Intelligent Information Systems, 27, 243–266.
Güting, R. H., & Schneider, M. (2005). Moving objects databases. Burlington: Morgan
Kaufmann.
Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science
Association, 24, 7–21.
Harrower, M., & Brewer, C. A. (2003). Colorbrewer.org: An online tool for selecting colour
schemes for maps. The Cartographic Journal, 40(1), 27–37.
Heer, J., Kong, N., & Agrawala, M. (2009). Sizing the horizon: The effects of chart size and
layering on the graphical perception of time series visualizations. In Proceedings of the ACM
conference on human factors in computing systems (CHI 2009) (pp. 1303–1312).
Hurter, C., Tissoires, B., & Conversy, S. (2009). FromDaDy: Spreading aircraft trajectories
across views to support iterative queries. IEEE Transactions on Visualization and Computer
Graphics, 15(6), 1017–1024.
Kapler, T., & Wright, W. (2005). GeoTime information visualization. Information Visualization,
4(2), 136–146.
Kincaid, R., & Lam, H. (2006). Line graph explorer: Scalable display of line graphs using
focus + context. In Proceedings of the international working conference on advanced visual
interfaces (AVI 2006), May 2006 (pp. 404–411).
Kraak, M.-J. (2003). The space–time cube revisited from a geovisualization perspective. In
Proceedings of the 21st International Cartographic Conference, Durban, South Africa (pp.
1988–1995).
Okabe, A., Boots, B., Sugihara, K., & Chiu, S. N. (2000). Spatial tessellations—Concepts and
applications of Voronoi diagrams (2nd ed.). New York: Wiley.
Pelekis, N., Theodoridis, Y., Vosinakis, S., & Panayiotopoulos, T. (2006). Hermes—A framework
for location-based data management. In Advances in database technology (EDBT 2006),
Lecture notes in computer science (Vol. 3896, pp. 1130–1134), Berlin: Springer.
Phan, D., Xiao, L., Yeh, R., Hanrahan, P., & Winograd, T. (2005). Flow map layout. In Proceedings
of the IEEE symposium on information visualization (InfoVis 2005), Minneapolis, Minnesota,
USA (pp. 219–224).
Saito, T., Miyamura, H. N., Yamamoto, M., Saito, H., Hoshiya, Y., & Kaseda, T. (2005). Two-tone
pseudo colouring: Compact visualization for one-dimensional data. In Proceedings of
the IEEE symposium on information visualization (InfoVis 2005), (pp. 173–180).
Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11(6),
70–77.
Slocum, T. A., McMaster, R. B., Kessler, F. C., & Howard, H. H. (2009). Thematic cartography
and geovisualization (3rd ed.). NJ: Pearson Prentice Hall.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region.
Economic Geography, 46(2), 234–240.
Tobler, W. (1981). A model of geographic movement. Geographical Analysis, 13(1), 1–20.
Tobler, W. (1987). Experiments in migration mapping by computer. The American Cartographer,
14(2), 155–163.
Vasiliev, I. R. (1997). Mapping time. Cartographica, 34(2), 1–51.
Verbeek, K., Buchin, K., & Speckmann, B. (2011). Flow map layout via spiral trees. IEEE
Transactions on Visualization and Computer Graphics, 17(12), 2536–2544.
Ware, C., Arsenault, R., Plumlee, M., & Wiley, D. (2006). Visualizing the underwater behaviour
of humpback whales. IEEE Computer Graphics and Applications, 26(4), 14–18.
Weaver, C., Fyfe, D., Robinson, A., Holdsworth, D., Peuquet, D., & MacEachren, A. M. (2007).
Visual exploration and analysis of historic hotel visits. Information Visualization, 6(1),
89–103.
Wood, J., Dykes, J., & Slingsby, A. (2010). Visualisation of origins, destinations and flows with
OD maps. The Cartographic Journal, 47(2), 117–129.
Wood, J., Slingsby, A., & Dykes, J. (2011). Visualizing the dynamics of London’s bicycle hire
scheme. Cartographica, 46(4), 239–251.
Chapter 5
Visual Analytics Focusing on Movers

[Fig. 5.1 diagram. Labels: Movers, Locations, Times; Movement data, Spatial event data; Trajectories, Spatial events, Local time series, Spatial time series, Spatial distributions]

Fig. 5.1  This chapter addresses analysis tasks focusing on characteristics of movers and their
relations to the context. Characteristics of movers are represented by movement data in the form
of trajectories (cf. Fig. 3.13)

Abstract  In this chapter, we present visualization and analysis methods that can
support analysis tasks focusing on characteristics of movers and their relations to
the context (Fig. 5.1). These methods deal with movement data in the form of tra-
jectories of moving objects. For gaining an overview of a set of trajectories, a flow
map is built based on a territory tessellation reflecting the spatial distribution of
characteristic points of trajectories. The method for trajectory summarization can
be applied to a whole set of trajectories and to subsets, in particular to groups of
similar trajectories resulting from clustering. Density-based clustering algorithms
in combination with trajectory-specific distance functions (i.e. methods for assess-
ing the dissimilarity of trajectories) are more suitable for trajectories than parti-
tion-based clustering algorithms that take distances in a multidimensional space of
features (attribute values) as measures of object dissimilarity. We argue for the use
of a library of relatively simple distance functions addressing different properties
of trajectories. We describe several distance functions that are useful in analysing
trajectories and present an analytical procedure of progressive clustering, in which
cluster analysis is done in a sequence of steps. In each step, one distance function
from a library is applied to the whole set of trajectories or to one or several
clusters discovered earlier. In this way, the simple distance functions are combined
for enabling sophisticated analyses.

G. Andrienko et al., Visual Analytics of Movement,
DOI: 10.1007/978-3-642-37583-5_5, © Springer-Verlag Berlin Heidelberg 2013

Visual exploration of positional attributes in groups (clusters) of spatially
similar trajectories is supported by a three-dimensional trajectory wall display.
Combinations of multiple positional attributes can be analysed using multi-attrib-
ute clustering of trajectory segments. For analysing relations among movers and
between movers and elements of the spatio-temporal context, we suggest several
methods and approaches. One method detects encounters of movers, when two
movers come close to each other. Another method builds a central trajectory of a
group of objects moving together and computes a set of positional attributes ena-
bling identification of relations among the movers in the group, such as leadership,
centrality, and relative spatial arrangement. An approach to analysing relations
between movers and static spatial objects, movers of another kind, and spatial
events is based on computing positional attributes expressing spatial and temporal
distances between trajectory positions and elements of the spatio-temporal context.

5.1 Characteristics

Trajectories characterize moving objects in terms of their spatial positions and,
possibly, values of dynamic thematic attributes in different time units. As noted in
Sect. 2.3, trajectories themselves are spatio-temporal objects (spatial events), which
have properties characterizing them as units: positions of the trajectories in space
and time, shapes, path lengths, etc. Trajectories as complex spatio-temporal objects
also have complex characteristics composed of the properties of their components,
that is, the spatial events the trajectories consist of (see Sect. 2.5). These include the
spatial position in each time unit and the values of positional thematic attributes,
such as speed, direction, acceleration, etc. Analysis of trajectories includes inves-
tigation of both the overall characteristics and the internal characteristics, that is,
variation in the positions and positional attributes over space and time.
We have already described or presented by examples the basic techniques sup-
porting visual exploration of trajectories. Representing trajectories as lines on a
map and in a space–time cube supports exploration of their overall characteristics.
A temporal bar chart shows the positions of trajectories in time. To show values of
positional attributes, trajectories can be represented by segmented bars in a tem-
poral bar chart and also by segmented bands on a map or in a space–time cube.
The attribute values are represented by colouring or shading of the segments of the
bars or bands. Interactive filtering, including filtering of trajectory segments, ena-
bles portion-wise exploration of a large set of trajectories and understanding what
is there in situations of visual clutter and occlusions, which are usually unavoid-
able since trajectories are typically not disjoint in space.
Here, we present several more sophisticated methods, which either significantly
rely on computational/algorithmic processing of trajectories or employ novel visu-
alization and interaction techniques.

5.1.1 Spatial Summarization of Trajectories

Whenever possible, data exploration and analysis should begin with getting an
overview of the data (Shneiderman 1996). For trajectories, we need an overview
map showing the distribution of the movement over space. As we have shown in
Chap. 1, representation of many trajectories by lines on a map may be ineffec-
tive since trajectories usually overlap and cross in space. Using semitransparent
drawing, as in Fig. 1.21 (right) and 3.6a, can reveal the topology of the underlying
space, such as the street network or the lanes of vessel movement. It also approx-
imately conveys the relative movement densities in different places. However, it
does not convey the number of trajectories and the movement directions.
Flow maps, as in Figs. 1.9 and 3.6d, can provide a good spatial overview of
multiple trajectories. Flow maps are based on discrete spatial aggregation of move-
ment data (Sect. 3.8), which uses a finite set of places and represents trajectories
as sequences of moves between places. To make a flow map adequately convey
the geography and topology of the movement, it is necessary to define appropriate
places. In Sect. 3.8, we have briefly introduced our method for territory tessellation
according to the spatial distribution of points, in particular characteristic points of tra-
jectories (Andrienko and Andrienko 2011). The method extracts characteristic points
from the trajectories, groups the extracted points by spatial proximity, finds the cen-
tres of the groups and uses them as generating points (seeds) for Voronoi tessellation
(Okabe et al. 2000), and then uses the resulting Voronoi cells as places for spatial or
spatio-temporal aggregation of the trajectories. Here, we shall present the method in
more detail, including the algorithms used for the extraction of characteristic points
of trajectories and spatial clustering of the points. We shall also demonstrate how the
level of abstraction can be regulated through the parameters of the method.
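The aggregation step itself can be illustrated with a short Python sketch. It assumes that a point belongs to the Voronoi cell of its nearest seed (which holds by definition of the Voronoi tessellation) and that trajectories are plain lists of (x, y) points; the function names are hypothetical. As a simplification, a jump between non-adjacent cells caused by sparse sampling is counted as a single move.

```python
import math
from collections import Counter

def nearest_seed(point, seeds):
    """Index of the seed closest to the point, i.e. of its Voronoi cell."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))

def aggregate_flows(trajectories, seeds):
    """Count moves between cells; each trajectory is a list of (x, y) points."""
    flows = Counter()
    for track in trajectories:
        cells = [nearest_seed(p, seeds) for p in track]
        # Collapse runs of consecutive points that stay in the same cell.
        visits = [c for i, c in enumerate(cells) if i == 0 or c != cells[i - 1]]
        # Each pair of successive visits is one move (origin cell, destination cell).
        flows.update(zip(visits, visits[1:]))
    return flows

seeds = [(0, 0), (10, 0), (20, 0)]
trajectories = [
    [(0, 0), (1, 0), (9, 0), (11, 0), (19, 0)],
    [(20, 1), (12, 0), (2, 0)],
    [(1, 1), (8, 1)],
]
flows = aggregate_flows(trajectories, seeds)
```

The resulting counts are the flow magnitudes; minor flows can then be hidden by keeping only the entries whose count reaches a chosen threshold.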
Characteristic points of trajectories include their start and end points, the points
of significant turns and the points of significant stops (pauses in the movement). If
a trajectory has long straight segments, it is also necessary to take representative
points from these segments. Otherwise, straight segments will not be taken into
account in choosing seeds for Voronoi cells and, as a result, may be inadequately
represented by flows (i.e. the flows representing these segments may deviate too
much from the directions of the segments). We use the following algorithm to
extract characteristic points from a trajectory.
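As a rough illustration of the idea (not a reproduction of Algorithm 5.1), the following Python sketch selects start and end points, stops, significant turns, and representative points of long straight segments. The four parameters mirror MinAngle, MinStopDuration, MinDistance, and MaxDistance; the concrete rules for detecting stops and turns, and the choice of the first point of a stop as its representative, are simplifying assumptions.

```python
import math

def _dist(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def _turn_angle(a, b, c):
    """Angle in degrees between the directions a->b and b->c."""
    v1 = (b[0] - a[0], b[1] - a[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def characteristic_points(track, min_angle=30.0, min_stop=300.0,
                          min_dist=100.0, max_dist=500.0):
    """track: time-ordered list of (x, y, t) in metres and seconds."""
    n = len(track)
    if n <= 2:
        return list(track)
    result = [track[0]]                      # the start point is always kept
    i = 1
    while i < n - 1:
        p, last = track[i], result[-1]
        # Stop: the following points stay within min_dist of p long enough.
        j = i
        while j + 1 < n and _dist(p, track[j + 1]) < min_dist:
            j += 1
        if track[j][2] - p[2] >= min_stop:
            result.append(p)
            i = j + 1
            continue
        d = _dist(last, p)
        if d >= max_dist:                    # representative of a long segment
            result.append(p)
        elif d >= min_dist and _turn_angle(last, p, track[i + 1]) >= min_angle:
            result.append(p)                 # significant turn
        i += 1
    result.append(track[-1])                 # the end point is always kept
    return result
```

For an L-shaped track sampled every 200 m, this sketch keeps the two ends, the corner, and one intermediate point on each 1,000 m leg.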
134 5  Visual Analytics Focusing on Movers

The computational complexity of Algorithm 5.1 is linear with respect to the num-
ber of points in a trajectory. The upper limit of the computation time for a trajectory
with N points is proportional to M * N, where M is the maximum number of con-
secutive trajectory points fitting in a circle with the diameter MinDistance.
The next step after extracting the characteristic points from all trajectories is to
group the points in space so that the spatial extents of the groups approximate the
desired sizes of the space compartments (places) to be later used for the aggrega-
tion. This is done using Algorithm 5.2. Note that the algorithm is applicable to
arbitrary points and not only to characteristic points of trajectories. Hence, it can
be used not only for summarization of trajectories but also for other purposes. For
the sake of efficiency, the algorithm uses a spatial index in the form of a regular
grid with square cells covering the bounding rectangle of the set of points (see
statement 2 of the algorithm description). The side lengths of the grid cells are
equal to the desired spatial size of the space compartments to be obtained in the
result. As point groups are built, their centroids (average points) are put in the grid
cells according to their coordinates.

The computational complexity of this algorithm is linear with respect to the num-
ber of points. To place a point in the right group, it is necessary to compute its dis-
tances to the group centroids located in at most nine cells of the grid: the cell in which
the point coordinates fit and the eight neighbouring cells around it (see procedure
get_closest_centroid). If K is the maximum number of centroids fitting in a grid cell,
distances to at most K * 9 centroids need to be computed. Since the sizes of the grid
cells are determined by the value MaxRadius, which is also the maximum radius of a
group, a cell may contain at most four group centroids (this may happen in a particu-
lar case when the coordinates of the centroids coincide with the corners of the cell).
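The role of the grid index can be illustrated with the following Python sketch. It is a simplified single-pass grouping under stated assumptions, not the listing of Algorithm 5.2: the data structures and the name group_points are hypothetical, and any later redistribution of points among groups is omitted. Only the centroids registered in the cell of the point and in its eight neighbours are examined.

```python
import math
from collections import defaultdict

def group_points(points, max_radius):
    """Group (x, y) points so that each point lies within max_radius
    of the centroid of its group at the moment it is added."""
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)

    def cell_of(x, y):
        # Grid cells are squares with side length max_radius.
        return (int((x - min_x) // max_radius), int((y - min_y) // max_radius))

    groups = []                # each group: {"members": [...], "cx": ..., "cy": ...}
    grid = defaultdict(set)    # cell -> indices of groups whose centroid lies in it

    for x, y in points:
        col, row = cell_of(x, y)
        best, best_d = None, None
        for dc in (-1, 0, 1):
            for dr in (-1, 0, 1):
                for g in grid.get((col + dc, row + dr), ()):
                    d = math.hypot(x - groups[g]["cx"], y - groups[g]["cy"])
                    if best_d is None or d < best_d:
                        best, best_d = g, d
        if best is None or best_d > max_radius:
            # No suitable group nearby: the point starts a new group.
            groups.append({"members": [(x, y)], "cx": x, "cy": y})
            grid[(col, row)].add(len(groups) - 1)
        else:
            grp = groups[best]
            old_cell = cell_of(grp["cx"], grp["cy"])
            grp["members"].append((x, y))
            k = len(grp["members"])
            grp["cx"] += (x - grp["cx"]) / k     # incremental mean update
            grp["cy"] += (y - grp["cy"]) / k
            new_cell = cell_of(grp["cx"], grp["cy"])
            if new_cell != old_cell:             # keep the index consistent
                grid[old_cell].discard(best)
                grid[new_cell].add(best)
    return groups
```

Because each point is compared with a bounded number of centroids, the work per point does not grow with the total number of points.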
To decrease the sensitivity of the results to the order in which the points are
processed and to improve the correspondence between the generated groups of
points and the “natural” clusters, that is, dense concentrations of points, we have
devised a method that optimizes the groups generated by Algorithm 5.2. The idea
is to regroup the points around the centres of dense regions.

The estimation of the point density in a group (step 1 of Algorithm 5.3)
requires explanation. Point density could be computed as the number of points
divided by the spatial extent of the group, which could be approximated by the
area of the circumferential circle or bounding rectangle. However, if a group con-
sists of a compact dense cloud of points plus one or a few outliers located far from
this cloud, the computed density may be rather low due to the large size of the
enclosing shape. Therefore, we estimate the point density in a different way, which
is based on the following reasoning. If a group contains a dense cloud comprising
the bulk of the points, the point medXY, whose x- and y-coordinates are the medi-
ans of the x- and y-coordinates of the group members, is likely to be located inside
this cloud. The mean distance of the points to medXY can be taken as an estimate
of the size of the dense cloud. Hence, to estimate the density, we divide the count
of points in the group by the squared mean distance to medXY. This allows us to
give proper attention to groups where points are densely concentrated irrespective
of occasional outliers. In step 5, we find the group member that is the closest to
medXY and use it as a seed for a new group, which will replace the current group.
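The density estimate and the choice of the new seed can be written down directly from this description. The Python sketch below is an illustration only; the function name is hypothetical, and returning an infinite density when all members coincide is an assumption made to avoid division by zero.

```python
import math
from statistics import median

def density_and_seed(members):
    """members: list of (x, y) points of one group."""
    # medXY: coordinate-wise medians of the group members.
    med_x = median([x for x, _ in members])
    med_y = median([y for _, y in members])
    dists = [math.hypot(x - med_x, y - med_y) for x, y in members]
    mean_dist = sum(dists) / len(dists)
    # Density: count of points divided by the squared mean distance to medXY.
    density = len(members) / mean_dist ** 2 if mean_dist > 0 else math.inf
    # New seed: the group member closest to medXY.
    seed = min(members, key=lambda p: math.hypot(p[0] - med_x, p[1] - med_y))
    return density, seed
```

For a group with a far outlier, medXY, and therefore the new seed, remains inside the dense cloud, whereas the centre of a bounding rectangle would be pulled towards the outlier.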
The computational complexity of the optimization phase (Algorithm 5.3) is the
same as for Algorithm 5.2. In fact, this is a re-application of Algorithm 5.2 after
some preparatory operations (steps 1–5), which do not depend on the number of
points but only on the number of groups. In our experiments, we found that the
optimization phase takes less time than the initial grouping. The reason is that the
points in the optimization phase do not come in a random order but are taken from
existing groups, and there are fewer groups to check in order to find an appropriate
group for each point.
The centroids of the obtained point groups are used as the generating seeds for
dividing the territory into Voronoi polygons, or Voronoi cells. We also introduce
additional seeds around the boundaries of the territory and in the areas where there
are no characteristic points from the trajectories. This allows us to obtain cells of
more even sizes and shapes. The additional seeds are distributed over the territory
in a regular manner. A new seed is added only if it is sufficiently far from all group
centroids, which means that the distance is more than the doubled MaxRadius.
The use of additional seeds is not absolutely necessary, but it can improve the
appearance of the resulting maps, especially in cases when the trajectories do not
cover the whole territory. Figure 5.2 demonstrates the impact of the use of addi-
tional seeds on the tessellation.
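The rule for adding seeds can be sketched in Python as follows. The regular arrangement is taken here to be a square grid over the bounding rectangle of the territory, and the default spacing of twice MaxRadius is an assumption; the text fixes only the acceptance criterion, namely that a new seed must be farther than the doubled MaxRadius from every group centroid.

```python
import math

def additional_seeds(centroids, bounds, max_radius, step=None):
    """Regularly spaced extra seeds that are far from all group centroids.

    bounds = (min_x, min_y, max_x, max_y); step is the grid spacing
    (hypothetical parameter, defaults to 2 * max_radius).
    """
    min_x, min_y, max_x, max_y = bounds
    step = step if step is not None else 2 * max_radius
    nx = int((max_x - min_x) // step) + 1
    ny = int((max_y - min_y) // step) + 1
    seeds = []
    for row in range(ny):
        for col in range(nx):
            x, y = min_x + col * step, min_y + row * step
            # Accept only candidates farther than 2 * MaxRadius from all centroids.
            if all(math.hypot(x - cx, y - cy) > 2 * max_radius
                   for cx, cy in centroids):
                seeds.append((x, y))
    return seeds
```

The union of the group centroids and the accepted extra seeds would then be passed to any Voronoi tessellation routine.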
As we have mentioned, the abstraction level of the overview map can be regu-
lated through the parameters of the territory division method. The key parameter
is MaxRadius in Algorithm 5.2 (point grouping), which determines the spatial
extents of the point groups and, hence, the sizes of the cells. The larger the cells
are, the higher is the degree of spatial generalization and abstraction. The param-
eters in Algorithm 5.1 determine what points will be selected from the trajecto-
ries as characteristic points. They have no impact on the abstraction level but are
responsible for the geographical and geometrical correspondence of the general-
ized representation to the original data.
Figure 5.3 demonstrates how the same set of Milan car trajectories from
one day (Wednesday) can be spatially summarized at different abstraction lev-
els. We have built four spatial summaries based on the same set of characteris-
tic points extracted by Algorithm 5.1 using the following parameter settings:
MinAngle = 30°; MinStopDuration = 300 s (5 min); MinDistance = 100 m;
MaxDistance = 500 m. The four different tessellations have been created using

Fig. 5.2  The impact of using additional seeds for territory tessellation. Left the tessellation
(boundary lines in violet) is built using only the centroids of point clusters (orange dots). Centre
the tessellation (boundary lines in green) is built using the same centroids and additional regu-
larly arranged seeds in the areas where there are no centroids. Right the two tessellations are
overlaid on the same map for comparison

Algorithm 5.2 with the following values of the parameter MaxRadius: 1,000 m
(top left), 1,500 m (top right), 3,000 m (bottom left), and 5,000 m (bottom right).
The flow symbols are drawn based on the positions of the cell seeds. They are ori-
ented along the lines connecting the seeds but are shorter than the lines, to reduce
symbol overlap. To make the maps more readable, minor flows have been hid-
den. Specifically, the upper two maps show only the flows with magnitudes 100 or
more, and the maps at the bottom left and right show only the flows with minimal
magnitudes 120 and 150, respectively.
The map corresponding to the smallest radius (top left) is the most detailed,
and the map corresponding to the largest radius (bottom right) is the most abstract
and schematic. The first three maps convey very well the geometry of Milan’s belt
motorways, where the car traffic is the most intensive. This is because many char-
acteristic points lie on the motorways and thus create dense point clusters. When
these points are grouped together with relatively few points lying on other streets
(this happens when the allowed group radius is not very large), the centroids of the
point groups also lie on the motorways. As a result, the flow symbols are oriented
similarly to the underlying motorway segments. In the maps built using small
group radii, the flow symbols also follow the shapes of smaller streets. Thus, in the
upper left map, we can see flows along the radial streets and along the major circu-
lar streets in the city centre. The shapes and positions of the main radial streets are
mostly preserved also on the top right, but the circular topology of the centre is not
evident any more. The larger the point group radius is, the more points from dif-
ferent streets are grouped together, and consequently, the higher are the deviations
of the group centroids from the main traffic thoroughfares. Still, even the most

Fig. 5.3  Spatial summaries of the Milan car trajectories at different levels of abstraction

schematic map on the bottom right gives a useful summary of the traffic in Milan.
It shows intensive flows on the belt motorways around the city and more intensive
traffic on the east of the inner city than on the west.
The amount of distortion resulting from the generalization can be measured and
controlled. The paper by Andrienko and Andrienko (2011) introduces local and
global numeric measures of the quality of the generalization and techniques sup-
porting local adjustments of the quality and abstraction level in selected parts of
the territory where this is deemed necessary by the user.

The method for spatial summarization of trajectories presented in this section
can be applied not only to all trajectories together but also to clusters of similar
trajectories, that is, separately to each cluster. When trajectory clustering is used
for exploration and analysis of a set of trajectories, a big problem is how to visu-
alize clusters of trajectories so as to have an overview of all clusters and to be
able to compare different clusters. This is a problem because trajectories are not
disjoint in space. They intersect and partly overlap and so do the clusters. Hence,
there is no way to show all clusters in one map in a comprehensible way. A suit-
able approach is to generate multiple small maps each presenting a single clus-
ter. Since the maps have to be small, the clusters need to be shown in a highly
generalized manner, which can be achieved using the summarization method just
described.

5.1.2 Clustering of Trajectories

Clustering is a generic technique used to explore and analyse various kinds of data
(Kaufman and Rousseeuw 1990), in particular geographical data (Han et al. 2009).
Clustering enables the discovery and interpretation of groups of objects having
similar properties and/or behaviours. Spatial clustering builds groups (clusters)
from objects that are spatially close and/or have similar spatial properties (shapes,
spatial relations among components, etc.).
There are three major classes of clustering methods: partition-based, hierar-
chical-based, and density-based. The partition-based methods aim at dividing the
dataset into partitions, such that the objective function for each partition is maxi-
mized. They often start with an arbitrary partitioning and then refine the partitions
in an iterative way until the result converges to a stable solution. This approach is
used, in particular, in the popular k-means method (Ng and Han 1994): given an
input parameter k, the method chooses k random objects from the dataset as cluster
seeds and assigns all the other objects to the nearest seed. Then, the algorithm
refines the clusters by moving objects from one cluster to another until a stable
configuration is reached. The self-organizing map algorithm (Kohonen 2001), or
SOM, works using a similar principle. First, a two-dimensional matrix of proto-
type vectors is built either randomly or by applying a principal component analy-
sis to the data (at the end, each vector will represent a cluster). Then, for each
object, the closest prototype vector is found. The vector and its neighbours in the
matrix are adjusted to this object using the technique of neural network training.
This operation is done iteratively, with the duration of the training specified as a
parameter. The partition-based methods produce convex clusters as a result of add-
ing objects to the nearest clusters.
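The iterative refinement in k-means can be illustrated with a minimal Python sketch. It uses the common formulation in which each cluster is represented by the mean of its members and objects are reassigned until the assignment stops changing; the optional init argument for supplying the initial seeds explicitly is a hypothetical convenience, not part of the method as described.

```python
import math
import random

def kmeans(vectors, k, init=None, max_iter=100, rng=None):
    """Partition equal-length numeric tuples into k clusters.

    Returns (labels, centres). If init is None, k random objects
    from the dataset are taken as the initial cluster seeds.
    """
    rng = rng or random.Random()
    start = init if init is not None else rng.sample(vectors, k)
    centres = [tuple(c) for c in start]
    labels = None
    for _ in range(max_iter):
        # Assign every object to the nearest centre (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(v, centres[j]))
                      for v in vectors]
        if new_labels == labels:      # stable configuration reached
            break
        labels = new_labels
        # Move each centre to the mean of the objects assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centres
```

Note that every object receives a label: there is no notion of noise in this scheme.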
The hierarchical approaches work through an iterative hierarchical decompo-
sition of the dataset, represented by means of a dendrogram. This is a tree-like
representation of the dataset where each internal node represents a cluster and the
leaves contain the individual objects of the dataset. A dendrogram may be created either
from the bottom (objects are grouped together) or from the top (the dataset is split
into parts until a termination condition is satisfied).
The density-based clustering methods rely on the concept of density in iden-
tifying a cluster: a point is inside a cluster if its neighbourhood of a given radius
contains at least a given minimum number of points, that is, the density of the
cluster has to be not less than the density threshold. Density-based algorithms are
naturally robust to such problems as noise and outliers since these problems usu-
ally do not affect the overall density of the data. The most popular density-based
clustering algorithm is DBSCAN (Ester et al. 1996). In this book, we use one of
its modifications called OPTICS (Ankerst et al. 1999).

5.1.2.1 Partition-Based Clustering According to Trajectory Features

The generic clustering algorithms typically assume that the objects subject to clus-
tering are represented by vectors (points) in a multidimensional space of features,
that is, attributes. The Euclidean or, more generally, Minkowski distance between
two vectors is typically taken as the measure of the dissimilarity between the
objects. For trajectories, various thematic attributes can be used as features in clus-
tering, including attributes derived computationally from the sequences of position
records, as described in Sect. 3.4.
An example of clustering of trajectories based on their features (attributes)
using the k-means method is shown in Fig. 5.4. In order to identify clusters of spa-
tially similar trajectories, we have used the following attributes: the x- and y-coor-
dinates of the start point, the end point, and the point in the middle of the path, the
mean x- and y-coordinates, and the distances between the start and end points in
the x- and y-dimensions (10 attributes in total). A difficult problem in using the
k-means clustering is the choice of the value for the parameter k, that is, the num-
ber of clusters, since it usually is not known in advance. A suitable strategy is to
try several values and choose the value that gives the best result in terms of clus-
ter interpretability. Figure 5.4 shows the result of k-means clustering for k = 25.
The clusters are represented in small multiple maps in a summarized form, that is,
by flow maps, which have been built using the spatial summarization method pre-
sented in the previous section. The maps are ordered according to the sizes of the
clusters, that is, the numbers of the trajectories in them.
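For illustration, the ten attributes listed above can be derived from a trajectory as in the following Python sketch. Two details are assumptions: the point in the middle of the path is taken as the recorded point with the middle index (rather than the point at half of the path length), and the distances between the start and end points are taken as absolute coordinate differences.

```python
def spatial_features(track):
    """track: list of (x, y) positions in temporal order.

    Returns a 10-element feature vector: start x, y; end x, y;
    middle x, y; mean x, y; start-to-end distance in x and in y.
    """
    xs = [p[0] for p in track]
    ys = [p[1] for p in track]
    sx, sy = track[0]
    ex, ey = track[-1]
    mx, my = track[len(track) // 2]      # assumption: middle by index
    return (sx, sy, ex, ey, mx, my,
            sum(xs) / len(xs), sum(ys) / len(ys),
            abs(ex - sx), abs(ey - sy))  # assumption: unsigned distances
```

Such vectors can be passed to any clustering method that works with Euclidean distances, ideally after bringing the attributes to comparable value ranges.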
The first noticeable difference between the clusters is the parts of the city in
which they occur. There are clusters located in the south-west, in the north-east,
in the centre, etc. Another noticeable difference is the movement directions. For
almost any cluster, there is a corresponding cluster covering approximately
the same area but having the opposite movement directions. The most obvious
matches are 11 and 14, 4 and 8, 1 and 3, and 5 and 6 (the cluster labels are shown
above the small maps). These clusters, where certain flow directions prevail over
others, are easy to interpret. They tell us about frequently followed paths, which
mostly go along the belt motorways. It is also possible to figure out that cluster 21
is opposite to 16, 13 to 15, 18 to 2, 17 to 10, and 9 to 22. These clusters are also

Fig. 5.4  Clustering of the Milan trajectories based on their features (attributes) using the
k-means clustering method; k = 25

relatively easy to interpret owing to the prevalence of certain flow directions over
others. The clusters with mixed flow directions, like 24, 25, 19, 7, 20, 23, and 12,
are difficult to interpret in terms of the followed routes, but they at least tell us
how many trips were made in different parts of the city.
A possible reason for some of the clusters being hard to interpret may be that
the chosen number of clusters (25) is insufficient for good separation between
trajectories based on their spatial properties. To check whether increasing the
number of clusters may improve the results in terms of interpretability, we subdi-
vide the largest cluster (cluster 24, in the upper left corner of Fig. 5.4) by apply-
ing the k-means algorithm only to its members. We tried different subdivisions by
setting the parameter k to 2, 3, 4, and 5; the result for k = 4 is shown in Fig. 5.5.
However, none of the attempts gave us good results in terms of interpretability,
that is, understanding the properties of trajectories in each cluster and the differ-
ences between the clusters.
In fact, there are two reasons why we are not satisfied with the results of the
k-means clustering. First, many trajectories in a given set of trajectories can be
dissimilar to all others. According to common-sense logic, such trajectories should
not be included in any cluster. However, partition-based methods, like k-means,
put each object in some cluster. Being limited in the number of clusters, the algo-
rithm may put an object together with other objects that are not very similar to it.
Hence, the variability among objects within a cluster may be high. If our goal is to
discover groups of similar trajectories and disregard those that are dissimilar to all
others, we need to apply density-based clustering methods, which label dissimilar
objects as “noise” and do not include them in any cluster.
The second reason for our dissatisfaction with the results of k-means may be
that the selected attributes do not fully correspond to our notion of similar trajec-
tories. We would like to have groups of trajectories following similar routes, but
the chosen attributes do not adequately represent the routes. The other attributes
that we can compute also do not capture the routes well enough. Hence, we need
to measure the dissimilarity between trajectories in a special way, which will be
introduced later on.
It should not be concluded that partition-based clustering of trajectories accord-
ing to their features is absolutely useless. The usefulness depends on the properties
of the data and the analysis target. For example, in analysing a set of trajectories
that follow the same route, it may be quite reasonable to group them based on their
numeric attributes such as duration, number of stops, and average speed. When
speed characteristics are the target of the analysis, the trajectories can be clus-
tered based on their speed statistics (minimum, maximum, mean, quartiles, etc.).

Fig. 5.5  Cluster 24 from the previous clustering (Fig. 5.4) has been refined by subdividing into
four smaller clusters

Furthermore, if the spatial and geometrical properties of trajectories can be
adequately represented by equal-size samples of trajectory points, the sequences of
coordinates of these points can be used as feature vectors in partition-based clus-
tering. Thus, Schreck et al. (2009) cluster such feature vectors by means of a self-
organizing map (Kohonen 2001).
In clustering trajectories (like any other objects) based on values of their attrib-
utes, it should be remembered that dissimilarity between objects is measured as
the distance between vectors of attribute values (features) in the feature space, that
is, the multidimensional space of all possible value combinations. In this regard,
special attention needs to be paid to the ranges and distributions of the values of
the attributes. Very often different attributes have very different value ranges. If no
transformation is applied to them, the distances in the feature space will be more
affected by attributes with larger value ranges, while attributes with small ranges
will have no effect at all. Therefore, different attributes are usually “standardised”
by transforming the original values into relative positions between the minimum
and maximum. Very often this is done using a simple linear transformation, which
is only good when the distribution of the attribute values within the value range is
close to uniform and there are no outliers. For attributes with other value distribu-
tions, it may be necessary to apply specific transformations. In particular, some
attributes in movement data, such as trajectory length and duration, often have val-
ues distributed according to the so-called power law (González et al. 2008), with
a large number of small values and a small number of high values, some of which
may be very far from the bulk of the values. A linear transformation of the values of
such an attribute into relative positions between the minimum and maximum would
make most of the values close to zero. In this case, logarithmic transformation is
more appropriate. Thus, before using clustering, it is necessary to look at the value
distribution of each attribute (e.g. on a frequency histogram) and apply a suitable
data transformation if the distribution is far from uniform.
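The effect of the two transformations can be seen in a small Python sketch. The function names are hypothetical; the logarithmic variant assumes strictly positive values (for attributes that may be zero, a shifted logarithm would be needed).

```python
import math

def min_max(values):
    """Linear transformation into relative positions between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_min_max(values):
    """Logarithmic transformation followed by min-max scaling.
    Assumes all values are strictly positive."""
    return min_max([math.log(v) for v in values])

lengths = [1, 10, 100, 10000]       # illustrative trajectory lengths
linear = min_max(lengths)           # small values are squeezed towards zero
logarithmic = log_min_max(lengths)  # values are spread over the range
```

With the linear transformation, the first three values fall below 0.01, so they become practically indistinguishable in the feature space; after the logarithmic transformation they are spread over the lower half of the unit interval.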

5.1.2.2 Density-Based Clustering Using Special Distance Functions

As described by Rinzivillo et al. (2008), a density-based clustering algorithm (e.g.
OPTICS) can be implemented in such a way that the process of finding clusters
is separated from the process of assessing the dissimilarity between objects. The
dissimilarity is assessed by an external algorithm, called distance function, which
can be tailored to the specifics of a given data type and to the analysis goals. The
measure of object dissimilarity is commonly called distance.
For example, Fig. 5.6 shows the 15 largest clusters obtained by means of the
density-based clustering algorithm OPTICS using a specific distance function that
assesses the dissimilarity of trajectories in terms of the followed routes. This dis-
tance function is called “route similarity”; the algorithm description will follow.
The clustering algorithm has two parameters: the distance threshold D and the
minimal number of neighbours N. These parameters have the following roles. Two
objects are regarded as neighbours if the distance between them (assessed by the
distance function) is not more than D. An object having at least N neighbours is
considered as a core object of a cluster.
Using these parameters, the clustering is done as follows. The algorithm finds
an object having at least N neighbours in its D-neighbourhood and being not yet
included in any cluster. This object is taken as a seed for a new cluster. All its
neighbours are included in this cluster, then all neighbours of the neighbours, and
so on. The process of expanding the cluster terminates when there are no more
neighbours of the cluster members that are not yet in the cluster. The algorithm
then tries to find another object that is not yet in any cluster and has at least N
neighbours. When no such objects are left, the algorithm terminates.
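The procedure, with the distance function supplied from outside, can be illustrated by the following Python sketch. It is a simplified DBSCAN-style implementation rather than OPTICS; as an assumption, a cluster is expanded only through core objects, so that objects with fewer than N neighbours can join a cluster but do not extend it. For brevity, all pairwise distances are computed, which is quadratic in the number of objects.

```python
from collections import deque

def density_cluster(objects, distance, d_max, n_min):
    """Return a list of cluster labels (0, 1, ...) or None for noise.

    distance: any function of two objects returning a dissimilarity;
    d_max and n_min play the roles of the parameters D and N.
    """
    n = len(objects)
    neighbours = [[j for j in range(n)
                   if j != i and distance(objects[i], objects[j]) <= d_max]
                  for i in range(n)]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        # A seed must be a core object that is not yet in any cluster.
        if labels[i] is not None or len(neighbours[i]) < n_min:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            cur = queue.popleft()
            if len(neighbours[cur]) < n_min:
                continue              # non-core member: do not expand further
            for j in neighbours[cur]:
                if labels[j] is None:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return labels
```

Because the clustering code never inspects the objects themselves, the same function works for points, trajectories, or any other data type for which a distance function is available.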
The example in Fig. 5.6 has been obtained using the parameters D = 800 m and
N = 5. The distance (dissimilarity) between trajectories is measured in metres and
can be approximately interpreted as the average spatial distance between the tra-
jectories. The clustering of 8,206 trajectories resulted in 60 clusters, which include
in total 1,744 trajectories. 6,462 trajectories were labelled as “noise”, which means
that none of them had at least five neighbours to form a cluster. The largest clus-
ters, which are shown in Fig. 5.6, are the most interesting since they represent the
frequently followed routes. All clusters are easy to interpret. We can see that the
most frequent routes use the belt motorways going around the city.
The algorithm of the distance function “route similarity” is described below.
The original version of the algorithm was published by Andrienko et al. (2007).

Fig. 5.6  Fifteen largest clusters of Milan trajectories discovered by density-based clustering
with the distance function “route similarity”

Since then, the algorithm has been slightly modified, based on our increas-
ing experience with various examples of movement data. In brief, the algorithm
repeatedly searches for the next pair of closest positions from two trajectories and
computes the mean distance between the positions of this pair plus a penalty for
unmatched positions, that is, positions that have been skipped as insufficiently
close to positions in the other trajectory. The penalty is computed as the sum of
the deviations of the unmatched points from the matching parts of the trajectories
normalized by the length of the matching parts (the length of the common route).

5.1.2.3 Progressive Clustering
Similarity between trajectories is not limited to the similarity of the routes.
Trajectories are complex spatio-temporal objects with heterogeneous properties,
including the geometric shape of the path, its positions in space and in time, and
the dynamics of changes in the spatial location, speed, direction, and other move-
ment attributes over time. Rinzivillo et al. (2008) suggest that a library of distance
functions should be used for trajectories such that each function addresses a par-
ticular property. Creating a single distance function that would account for all
properties is very difficult and, moreover, not reasonable. On the one hand, not all
properties may be simultaneously relevant in practical analysis tasks. On the other
hand, clusters obtained by means of a universal function covering all properties
would be very difficult to interpret. A more reasonable approach is to give the ana-
lyst a set of relatively simple and easily understandable distance functions dealing
with different properties of trajectories.
Given a library of distance functions, cluster analysis of a set of trajectories can
be done in a sequence of steps. In each step, clustering with a single distance func-
tion is applied either to the whole set of trajectories or to one or more of the clus-
ters obtained in the preceding steps. The clusters obtained in each step are easy
to interpret by tracking the history of their derivation. Step by step, the analyst
progressively refines his/her understanding of the data. New analytical questions
arise as an outcome of the previous analysis and determine the further steps. The
whole process is called “progressive clustering”. It needs to be supported by vis-
ual and interactive tools so that the analyst can conveniently view the clustering
results and select data subsets for further analysis.
A good property of progressive clustering is that a simple distance function
with a clear meaning is applied at each step, which leads to easily interpretable
outcomes. Despite the simplicity of each distance function taken separately, suc-
cessive application of several different functions enables sophisticated analyses
through gradual refinement of earlier obtained results. Besides the advantages for
the interpretation, progressive clustering provides a convenient mechanism for
user control over the work of the computational tools as the user can selectively
direct the computational power to potentially interesting portions of data instead
of processing all data in a uniform way. In particular, the analyst may use “expen-
sive” (in terms of required computer resources) distance functions for relatively
small potentially interesting subsets obtained by means of “cheap” functions,
which need little time to produce results.
To give an example of progressive clustering, we first group the Milan trajec-
tories based on the spatial proximity of their end points. This means that we use
the distance function which computes the spatial distance between the end points
as the measure of the dissimilarity between the trajectories. Hence, two trajecto-
ries are considered to be similar if they have close destinations. Using the den-
sity-based clustering parameters D = 500 m (distance threshold) and N = 20
(minimal number of neighbours), we obtain 35 clusters with sizes ranging from 21
to 1,362; 1,904 trajectories remain beyond the clusters and are labelled as “noise”.
Figure 5.7 shows graphical summaries of the 10 largest clusters.
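A distance function of this kind is very simple. The Python sketch below shows how a small library of such functions might look, assuming that a trajectory is a list of (x, y) positions in metres; the function names and the dictionary are hypothetical, and the function for start points is included only as an analogous second entry.

```python
import math

def end_point_distance(a, b):
    """Spatial distance between the end points of two trajectories."""
    return math.dist(a[-1], b[-1])

def start_point_distance(a, b):
    """Spatial distance between the start points of two trajectories."""
    return math.dist(a[0], b[0])

# A minimal "library": the analyst picks one function per clustering step.
DISTANCE_FUNCTIONS = {
    "ends": end_point_distance,
    "starts": start_point_distance,
}

t1 = [(0, 0), (100, 0)]
t2 = [(500, 0), (100, 300)]
d_ends = DISTANCE_FUNCTIONS["ends"](t1, t2)
d_starts = DISTANCE_FUNCTIONS["starts"](t1, t2)
```

With the threshold D = 500 m, these two trajectories would count as neighbours with respect to both their end points (300 m apart) and their start points (exactly 500 m apart).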

Fig. 5.7  Ten largest clusters of the Milan trajectories based on the proximity of their end points

Fig. 5.8  The same clusters as in Fig. 5.7 are summarized using only the start and end points of
the trajectories

For groups of trajectories with close ends, another way of summarization may
be appropriate. Figure 5.8 demonstrates an alternative summarization of the same
clusters as in Fig. 5.7. The difference is that the summarization algorithm
(Sect. 5.1.1) has been applied to the trajectories represented only by their start
and end points, disregarding all other points. The resulting images of the clusters
are more expressive in conveying movements to common destinations than the images
in Fig. 5.7. The largest cluster (cluster 4, in yellow) consists of trajectories
ending in the central part of the city. The next five largest clusters consist
mostly of trajectories going out of the city in different directions.
On the next stage of our exploration, we select the largest cluster (cluster 4)
with 1,362 trajectories ending in the centre and apply the clustering algorithm
with the distance function that measures the dissimilarity in terms of the distances
between the starting points of the trajectories. With the parameters D = 500 m and
N = 5 neighbours, we obtain 28 subclusters of cluster 4 with sizes ranging from
5 to 127 and 461 trajectories labelled as “noise”. The 10 largest subclusters are
shown in Fig. 5.9. Again, the clusters have been summarized based only on the
start and end points of the trajectories. The images expressively convey the ori-
gins and destinations of the trajectories in the clusters. Clusters 3 (blue), 15 (dark
magenta), 5 (light magenta), and 6 (orange) tell us that the most frequent origins
of the trajectories ending in the centre are on the north-west and north-east of the
city, at different exits of the belt motorways. Most of the other clusters also origi-
nate at motorway exits. There are relatively few trajectories that start and end in
the centre (cluster 7, the second image in the lower row in Fig. 5.9).
To see what routes are taken by the cars coming into the city centre, we apply
clustering with the “route similarity” distance function to the same subset of tra-
jectories ending in the centre (cluster 4). With D = 1,000 m and N = 3 neigh-
bours, we obtain 21 clusters with sizes from 3 to 27; 995 trajectories are labelled
as “noise”. The 10 largest subclusters are shown in Fig. 5.10. This time, since
we are interested in the routes, we have applied the summarization to the whole
trajectories rather than only the starts and ends. The result of the clustering, in

Fig. 5.9  Ten largest subclusters of cluster 4 based on the proximity of the start points

Fig. 5.10  Ten largest subclusters of cluster 4 based on the route similarity

particular the high proportion of the “noise” and small sizes of the clusters, tells
us that there is high diversity of the routes used for coming into the city. The most
frequent routes use the radial motorway going to the centre from the north-west.
It is interesting that some drivers coming from the north-east also use a part
of this motorway: in cluster 8 (greenish yellow, the last image in the upper row),
the trajectories starting on the north-east go first westwards along the northern
motorway to the intersection with the radial motorway and then make a sharp turn
towards the centre. In cluster 3 (dark blue, the third image in the upper row), the
trajectories also start on the north-east but use the nearest radial street for going
to the centre. To understand why some drivers take the direct route (cluster 3) and
others go around (cluster 8), we take a closer look at these clusters using other
exploratory tools.
On the left of Fig. 5.11, there is a map showing the trajectories of these two
clusters without aggregation. We see that the two groups of trajectories mostly end
in different parts of the city centre, although a few intersections occur. The trajec-
tories of cluster 8 go to the western part of the centre, which justifies the use of the
western radial road. Still, the drivers could use other radial streets. Evidently, they
expect that they can move with higher speeds by using the motorways. On the right
of Fig. 5.11, the average speeds of the trajectories in the two clusters are compared
by means of a scatterplot where the horizontal dimension represents the time of the
day when the trajectories started and the vertical dimension represents the average
speeds. The trajectories of the selected clusters are represented by coloured dots
while the grey dots represent all other trajectories, which are currently filtered out
(inactive). The scatterplot shows us that the average speeds that are reached in clus-
ter 8 can, indeed, be higher than in cluster 3, but only in the early morning (before
5:30) and in the evening (after 17:00). During the rest of the day, the average speeds
in the two clusters stay within about the same, rather low, range. Hence, during the
day, the use of the motorways does not give any benefit in terms of speed.

Fig. 5.11  Comparison between two “route similarity” clusters of trajectories ending in the city
centre. The map (left) shows the routes, and the scatterplot (right) shows the average speeds (ver-
tical dimension) against the start times of the trajectories (horizontal dimension)

This example investigation of two selected clusters demonstrates that the analy-
sis procedure called “progressive clustering” is not limited to mere applications of
the clustering tools to various subsets of the data but also includes investigation
of the clustering results by means of all kinds of appropriate tools and techniques.
The goal of progressive clustering is not to make clusters per se but to gain under-
standing of the data.

5.1.2.4 Variety of Distance Functions for Trajectories

In our examples, we have used the distance function “route similarity” and two
very simple distance functions that just compute the distances between the starts
or between the ends of the trajectories. Technically, we have a single distance
function which computes four measures: distance between the starts, distance
between the ends, difference between the path lengths, and difference between
the durations. When using the function, we select the measures that need to be
computed. Any combination of the measures may be chosen. If two or more measures
are selected, the function combines them into a single measure.
The distances between the starts/ends and the difference between the path
lengths are all spatial distances, which are comparable and therefore combinable.
However, the difference between the durations is a temporal distance, which dif-
fers by nature from the spatial distances and therefore cannot be directly combined
with them. To make it combinable, we transform the temporal distance into an
equivalent distance in space. For this purpose, we ask the user to specify a thresh-
old T for the temporal distance that will have the same effect as the spatial dis-
tance threshold D. Then, the difference between the durations is transformed into
an equivalent spatial distance by multiplying by the ratio D/T. This approach is
also used in other distance functions that combine spatial and temporal distances.
The combination works as follows. First, the distances between the starts and
between the ends are combined by taking their average. Then, the result is com-
bined with the other measures using the formula of Euclidean distance. If only one
or none of the distances between the starts and the ends is chosen, all measures are
combined through the formula of Euclidean distance.
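The combination rule described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function and parameter names (`combined_distance`, `d_start`, `d_end`, `d_length`, `d_duration`) are hypothetical, and it is assumed that the individual measures have already been computed and that unselected measures are passed as `None`.

```python
import math
from typing import Optional


def temporal_to_spatial(dt: float, D: float, T: float) -> float:
    """Scale a temporal difference by D/T, so that a difference of T maps to D."""
    return dt * D / T


def combined_distance(
    d_start: Optional[float] = None,     # distance between the starts
    d_end: Optional[float] = None,       # distance between the ends
    d_length: Optional[float] = None,    # difference between the path lengths
    d_duration: Optional[float] = None,  # difference between the durations
    D: Optional[float] = None,           # spatial distance threshold
    T: Optional[float] = None,           # temporal distance threshold
) -> float:
    """Combine the selected measures (hypothetical helper, names are assumptions)."""
    components = []
    if d_start is not None and d_end is not None:
        # Both start and end distances selected: they are averaged first.
        components.append((d_start + d_end) / 2.0)
    else:
        # Only one (or none) of them selected: use it as is.
        components.extend(v for v in (d_start, d_end) if v is not None)
    if d_length is not None:
        components.append(d_length)
    if d_duration is not None:
        if D is None or T is None:
            raise ValueError("D and T are required to convert a duration difference")
        components.append(temporal_to_spatial(d_duration, D, T))
    if not components:
        raise ValueError("select at least one measure")
    # Formula of Euclidean distance over the selected components.
    return math.sqrt(sum(c * c for c in components))
```

For instance, with D = 500 m and T = 60 s, a duration difference of 30 s is treated as equivalent to a spatial distance of 250 m.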
The function “route similarity” is a quite complex and computationally expen-
sive function for assessing the similarity of the paths. It has been designed to
tolerate incomplete trajectories (i.e. where some parts in the beginning and/or at
the end are missing), significant positioning errors, and unequal time intervals
between records. When the data quality is good, it may be sufficient to use a sim-
pler and cheaper approach: to take the starting and ending points of two trajecto-
ries plus several intermediate checkpoints from each trajectory and compute the
average from the spatial distances between the corresponding points. There are
different ways for selecting the intermediate points:
• k points, where k is a user-specified number, are chosen so as to keep the num-
ber of intermediate points between them constant;
• k points are chosen so as to keep the path lengths between them constant;
• given a time step Δt, the points are selected so that the time intervals between
them are close to Δt;
• given a distance step Δd, the points are selected so that the path lengths
between them are close to Δd.
Our library of distance functions includes functions for all these variants of
checkpoint selection. The selection based on a time step addresses not only the
similarity of the followed paths but also the similarity of the dynamics: the dis-
tance between trajectories will be small only if close points are reached at close
relative times with respect to the start times of the trajectories.
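As an illustration of the checkpoint-based approach, the following sketch implements only the first variant of checkpoint selection (k intermediate points evenly spaced by their index in the trajectory). It assumes, for simplicity, that a trajectory is a list of planar (x, y) coordinates in metres; the function names are hypothetical and the code is not taken from the library mentioned in the text.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pick_checkpoints(traj: Sequence[Point], k: int) -> List[Point]:
    """Return the start, k intermediate points evenly spaced by index, and the end."""
    n = len(traj)
    if n < 2:
        raise ValueError("a trajectory needs at least two points")
    # k + 2 positions spread over the index range 0 .. n-1.
    indices = [round(j * (n - 1) / (k + 1)) for j in range(k + 2)]
    return [traj[i] for i in indices]


def checkpoint_distance(a: Sequence[Point], b: Sequence[Point], k: int = 3) -> float:
    """Average spatial distance between corresponding checkpoints of two trajectories."""
    pa, pb = pick_checkpoints(a, k), pick_checkpoints(b, k)
    return sum(math.dist(p, q) for p, q in zip(pa, pb)) / len(pa)
```

The other variants differ only in how the indices are chosen (by accumulated path length, by time step, or by distance step); the averaging step stays the same.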
The distance function “route similarity + dynamics” extends the function
“route similarity” by taking into account the movement dynamics. Like “route
similarity” and unlike the functions checking the starts, ends, and selected inter-
mediate points, it can deal with incomplete trajectories and with trajectories whose
starts and/or ends diverge, while the remaining parts of the paths are similar. The
extension works as follows. Along with the average spatial distance between the
matching points of two trajectories, the function computes the average difference
between the two trajectories in the time taken to get from one pair of matching
points to the next. Let P and Q be two trajectories, P_i and Q_j a pair of matching
points from these trajectories, and P_k and Q_l the next pair of matching points
selected by Algorithm 5.4 (Sect. 5.1.2.2). Let Δt(P_i, P_k) be the temporal distance
between P_i and P_k: Δt(P_i, P_k) = P_k.time − P_i.time. Likewise,
Δt(Q_j, Q_l) = Q_l.time − Q_j.time. The relative time difference in reaching P_k and
Q_l is computed as abs(Δt(P_i, P_k) − Δt(Q_j, Q_l)). This
time difference is accumulated as the algorithm scans the trajectories and then is
divided by the number of matching points, thus giving the average temporal dis-
tance. The latter is transformed into an equivalent spatial distance by multiplying
by the ratio D/T, where D is the spatial distance threshold and T is the temporal
distance threshold. This is the same approach as is used for the transformation of
the differences in the trajectory durations. Then, the transformed average tempo-
ral distance is combined with the average spatial distance using the formula of
Euclidean distance.
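The temporal part of this computation can be sketched as follows. The sketch assumes that the pairs of matching points have already been found (the matching procedure itself is not reproduced here) and that only the times at which the matching points are reached are passed in; the function names and the list-based input format are assumptions made for illustration.

```python
import math
from typing import Sequence


def average_relative_time_difference(
    times_p: Sequence[float], times_q: Sequence[float]
) -> float:
    """times_p[m] and times_q[m] are the times of the m-th pair of matching points."""
    if len(times_p) != len(times_q) or len(times_p) < 2:
        raise ValueError("need two equally long sequences with at least two pairs")
    total = 0.0
    for m in range(1, len(times_p)):
        dt_p = times_p[m] - times_p[m - 1]  # time P needs to reach its next matching point
        dt_q = times_q[m] - times_q[m - 1]  # time Q needs to reach its next matching point
        total += abs(dt_p - dt_q)
    # The accumulated difference is divided by the number of matching points.
    return total / len(times_p)


def route_dynamics_distance(
    avg_spatial: float,
    times_p: Sequence[float],
    times_q: Sequence[float],
    D: float,
    T: float,
) -> float:
    """Combine the average spatial and (transformed) temporal distances."""
    avg_temporal = average_relative_time_difference(times_p, times_q)
    # Temporal distance is converted to space by the ratio D/T, then the two
    # components are combined by the formula of Euclidean distance.
    return math.hypot(avg_spatial, avg_temporal * D / T)
```

Because only time differences between consecutive matching points are used, two trajectories that start at different times of the day but move with the same dynamics obtain a temporal component of zero.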
The operation of the distance function “route similarity + dynamics” is demonstrated
in Fig. 5.12. We first clustered the Milan trajectories using the
distance function “route similarity” and parameters D = 800 m and N = 5 neigh-
bours. The results of this clustering have been shown in Fig. 5.6 (Sect. 5.1.2.2).
According to the idea of progressive clustering, we have selected cluster 16 (the
third image in the upper row in Fig. 5.6) with 118 trajectories going from the
north-west of the city to the south-east along the belt motorway on the west and
south. The trajectories are shown on a map in Fig. 5.12 (left). We have applied the
density-based clustering with the distance function “route similarity + dynamics”
and parameters D = 800 m, N = 3 neighbours, and T = 60 s to the selected trajec-
tories. The clustering algorithm has found only one dense cluster with 59 trajecto-
ries and has labelled the remaining 59 trajectories as “noise”.
On the right of Fig. 5.12, the trajectories are compared in a space–time cube.
The trajectories belonging to the cluster are coloured in red and the “noise” in
grey. To enable the comparison, the trajectories have been aligned in time to a
common start time (see Sect. 3.3). We can see that the red lines representing the
trajectories of the cluster are parallel and make a tight bundle in the transformed
time. The slopes of the red lines are nearly constant along their lengths. These
observations mean that the trajectories had nearly equal speeds, and moreover,
the speeds did not vary much along the route. The grey lines, which represent the
“noise”, are quite widely spread in the transformed temporal dimension, and their
slopes significantly vary within and between the trajectories. Since steep segments

Fig. 5.12  Clustering of a subset of trajectories shown on the left using the distance function
“route similarity + dynamics”. In the space–time cube on the right, the trajectories belonging to
the discovered single cluster are shown in red and the “noise” in grey. The time references in the
trajectories have been aligned to a common start time

indicate slow movement and vertical segments indicate stops, we can conclude
that in the trajectories labelled as “noise”, the movement was impeded, most prob-
ably, by unfavourable traffic conditions.
Figure  5.13 (left) presents a space–time cube where the original time refer-
ences, that is, specific times of the day, have been restored. We see that during the
day, there were alternating time intervals when either the red or the grey trajecto-
ries prevailed. The scatterplot on the right shows the durations of the trajectories
against the times of their starts. The dots representing the trajectories are coloured
according to the cluster membership. We see that there was a long time interval in
the morning (from about 4:30 till about 10:00) when there were no red trajectories,
and the durations of the trajectories were much longer than at other times of
the day. Hence, in this interval, the cars could not move fast enough. A shorter
interval when mostly grey trajectories occurred (with one exception) is between
15:20 and 17:00. The durations of the trajectories were not as long as in the
morning, but still the cars were not able to move uniformly.
Applying the clustering with the distance function “route similarity + dynamics”
to the other clusters obtained by route similarity gives similar results:
each time we obtain one cluster of trajectories with uniform movement and the
“noise” including various deviations from the uniform movement. Due to the variety
of the deviations, the trajectories with non-uniform movement cannot be
grouped into clusters by dynamics.
The function “route similarity” can also be extended in another way. To
find groups of objects moving together, the function can check whether the
matching trajectory points are reached at about the same absolute times. The
extension is similar to the “route similarity + dynamics”, but instead of the dif-
ferences between the relative times, the differences between the absolute times are
computed.

Fig. 5.13  Left: the results of the clustering by “route similarity + dynamics” are shown in a
space–time cube according to their original time references. Right: the scatterplot of the trajectory
durations (vertical dimension) against their start times (horizontal dimension) shows how the
trajectories and their durations are distributed in time over the day

Besides the distance functions described here, a variety of other distance func-
tions have been proposed for trajectories, including the basic Euclidean distance
(assuming that trajectories are represented by vectors of fixed length), the spatial
Euclidean distance averaged over time (Nanni and Pedreschi 2006), direction-oriented
distances (Vlachos et al. 2002; Pelekis et al. 2012), and adaptations
of the distance functions originally developed for time series analysis, such as
dynamic time warping (Berndt and Clifford 1994).

5.1.2.5 Clustering of Very Large Sets of Trajectories

Most of the existing implementations of clustering algorithms can work only with
objects loaded in computer RAM, which is a serious limitation in terms of the size
of the data that can be analysed. Out-of-memory implementations are technically
possible but extremely time-consuming, especially when it is necessary to com-
pare complex objects, such as trajectories, using specialized distance functions.
This might not be a very big problem if the resulting clusters were exactly what
the analyst needs, but this is usually not the case. All clustering techniques involve
parameters, and different parameter settings lead to diverse results, which may
be more or less meaningful to a human or may provide different complementary
meanings. Hence, the analyst needs to run clustering several times, or even many
times, with different settings, which requires short response times.
The following approach suggested by Andrienko et al. (2009) enables interac-
tive cluster analysis of large numbers of structurally complex objects. First, the
analyst takes a manageable subset of the objects and applies clustering to this
subset. In this process, the analyst experiments with the clustering parameters
to obtain results that are meaningful with respect to the analysis goals. Then, the
analyst builds a classifier, which can be used for attaching new objects to the existing
clusters. The analyst may also modify the clusters to achieve better understandability
and/or conformance to the goals. The resulting classifier is then applied to
the whole dataset. Each object is either attached to one of the clusters or remains
unclassified if it does not fit in any cluster. When necessary, the analyst may
repeat the procedure (take a subset, cluster, build a classifier, classify) for the
unclassified objects.
In order to attach new objects to previously discovered clusters, one or several
prototype objects, or prototypes, are selected in each cluster such that the
distance of any other cluster member to one of these objects is below a certain
threshold. The distance is measured by the same distance function as has been used
for the clustering. The prototypes of the clusters, the respective distance thresholds
(which may be prototype-specific), and the distance function together constitute a
classifier.
Attaching new objects to the so-defined clusters is done by comparing the
objects to the cluster prototypes, that is, finding the distances by means of the dis-
tance function. An object is attached to a cluster if its distance to one of the proto-
types is below the respective threshold. If an object is close to prototypes of two or
more clusters, the closest prototype is chosen. If an object is not sufficiently close
to any of the prototypes, it remains unclassified.
The whole procedure can be formalized as the following Algorithm 5.5.

The computational time required for the classification (step 5) depends linearly
on the number of objects in D: each object is compared with a constant number
of cluster prototypes (unlike clustering, where each object needs to be compared
with all others). Hence, the algorithm is quite scalable with respect to the database
size. Although step 5 may take minutes or even hours for a very big dataset, it does
not require the involvement of a human analyst. It is assumed that the analyst has
obtained meaningful, goal-oriented clusters by running the clustering method with
different settings at step 2 and interactively refining the outcomes at step 4. If this
is the case, the results of the subsequent cluster-based classification will also be
meaningful and conform to the goals of the analysis.
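The classification rule described above (attach an object to the cluster of the closest prototype whose threshold is not exceeded, otherwise leave it unclassified) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Prototype` record and the `classify` function are hypothetical names, and the distance function is passed in as an ordinary callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class Prototype:
    cluster_id: int    # cluster that this prototype represents
    obj: Any           # the prototype object itself (e.g. a trajectory)
    threshold: float   # prototype-specific distance threshold


def classify(
    obj: Any,
    prototypes: Iterable[Prototype],
    dist: Callable[[Any, Any], float],
) -> Optional[int]:
    """Return the cluster of the closest admissible prototype, or None."""
    best_cluster: Optional[int] = None
    best_distance = float("inf")
    for p in prototypes:
        d = dist(obj, p.obj)
        # Admissible only if the distance is below this prototype's threshold;
        # among admissible prototypes the closest one wins.
        if d < p.threshold and d < best_distance:
            best_cluster, best_distance = p.cluster_id, d
    return best_cluster
```

Each object is compared only with the prototypes, so classifying a whole database requires a number of distance computations proportional to the number of objects times the number of prototypes, and the objects can be processed in portions.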
Algorithm 5.5 starts with a selection of a subset of the original dataset. The
subset should have a manageable size and at the same time be representative of
the dataset as a whole. An ideal sampling strategy must preserve the actual distri-
bution of the objects in the original dataset. Uniform sampling from the database
is a reasonable strategy when a density-based clustering algorithm is used: dense
regions in the original dataset remain (relatively) dense in the sample and hence
can be discovered by the algorithm. In a case when a dense region becomes too
sparse in the sample, it can still be detected in a subsequent iteration of the
process. Specifics of the data and/or goals of the analysis may justify a specific
way of selecting the subset. For example, for the Milan trajectories,
it is reasonable to select a subset of trajectories from one working day since high
similarity of distributions in different working days can be expected.
Selection of prototypes from density-based clusters is a non-trivial problem. In
a density-based cluster, each object is close to a user-chosen minimum number of
other objects (neighbours), which is a parameter of the algorithm. However, two
arbitrary cluster members may be quite distant from each other; therefore, a clus-
ter may have rather high internal variation. Multiple prototypes need to be taken
from different parts of such a cluster in order to represent the cluster adequately.
To find appropriate prototypes in a density-based cluster, we divide it into
“round” subclusters. A round (sub)cluster is a set of objects S = {o_1, o_2, …, o_k} for
which there is a special object o′ and distance ε such that d(o_i, o′) < ε, 1 ≤ i ≤ k,
and for any other object o ∉ S, d(o, o′) ≥ ε. The object o′ (real or theoretical)
is called the centre of the (sub)cluster S. The maximum distance among d(o_i, o′),
1 ≤ i ≤ k, is called the radius of the (sub)cluster S. In a case when the objects
are points and the distance function d is Euclidean distance, the notions of centre,
radius, and round cluster can be understood literally. In a case of structurally com-
plex objects, such as trajectories, and arbitrary d (e.g. the “route similarity” func-
tion), these notions need to be understood metaphorically.
For complex objects and distance functions, finding the true centre of a round
(sub)cluster is a complex problem, not only computationally but also conceptually.
However, for the purposes of building a classifier, the true centres are not really
needed. They can be substituted by medoids. A medoid is a member of a subclus-
ter having the smallest mean distance to all other members. Medoids may be used
as cluster prototypes.
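The medoid definition translates directly into code. The sketch below assumes that the members are given as a list and the distance function as a callable; the name `medoid` is chosen here for illustration only.

```python
from typing import Any, Callable, Sequence


def medoid(members: Sequence[Any], dist: Callable[[Any, Any], float]) -> Any:
    """Return the member with the smallest mean distance to all other members."""
    if not members:
        raise ValueError("cannot take the medoid of an empty set")

    def mean_distance_to_others(i: int) -> float:
        others = len(members) - 1
        if others == 0:
            return 0.0  # a single member is trivially its own medoid
        total = sum(dist(members[i], members[j])
                    for j in range(len(members)) if j != i)
        return total / others

    best_index = min(range(len(members)), key=mean_distance_to_others)
    return members[best_index]
```

Unlike a true centre, the medoid is always an actual member of the set, so it needs nothing but pairwise distances and works for trajectories as well as for points.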
Formally, the problem of selecting cluster prototypes may be stated as follows:
given a cluster C, a distance function d, and a maximum distance threshold ε_max,
divide C into subclusters {S_1, S_2, …, S_n} such that for every S_i there exist a
medoid m_i ∈ S_i and a distance ε_i ≤ ε_max for which d(o, m_i) < ε_i for all o ∈ S_i.
For solving the problem, we suggest the algorithm described below (Algorithm 5.6).
At each stage, the status of the algorithm is represented by a list L where each
entry consists of a subcluster and its corresponding medoid <S_i, m_i>.

At the end, the list L represents a partitioning of the cluster C into round sub-
clusters. The medoids of the subclusters become the prototypes of the original
clusters. The maximum distance from a medoid to the members of its subcluster
is taken as the distance threshold for this prototype. Although the computational
complexity of Algorithm 5.6 is O(n²), where n is the number of objects in cluster
C, this is not critical due to the relatively small sizes of density-based clusters that
can usually be discovered in a moderately sized subset D′ of the database D. Besides, the
distances between the objects, which are needed for Algorithm 5.6, are computed
at the stage of density-based clustering (step 2 in Algorithm 5.5) and can be later
reused, which substantially reduces the computation time.
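To make the problem statement more tangible, the following sketch shows one possible way of dividing a cluster into round subclusters with medoids. It is explicitly not a reproduction of Algorithm 5.6: it is a simple k-medoids-style loop written for illustration, under the assumption that an object lying at distance ε_max or more from every current medoid seeds a new subcluster. All names are hypothetical, and distances are recomputed rather than reused from the clustering step.

```python
from typing import Any, Callable, List, Sequence, Tuple


def medoid(members: Sequence[Any], dist: Callable[[Any, Any], float]) -> Any:
    """Member with the smallest total (hence mean) distance to the other members."""
    return min(members, key=lambda m: sum(dist(m, o) for o in members))


def split_into_round_subclusters(
    cluster: Sequence[Any],
    dist: Callable[[Any, Any], float],
    eps_max: float,
    max_iter: int = 1000,
) -> List[Tuple[List[Any], Any, float]]:
    """Return a list of (members, medoid, radius) with every radius below eps_max."""
    medoids = [medoid(cluster, dist)]
    for _ in range(max_iter):
        groups: List[List[Any]] = [[] for _m in medoids]
        seed = None
        for o in cluster:
            # Nearest current medoid and the distance to it.
            d, i = min((dist(o, m), i) for i, m in enumerate(medoids))
            if d >= eps_max:
                seed = o  # too far from all medoids: start a new subcluster
                break
            groups[i].append(o)
        if seed is not None:
            medoids.append(seed)
            continue
        groups = [g for g in groups if g]
        new_medoids = [medoid(g, dist) for g in groups]
        if new_medoids == medoids:
            # Stable: the radius is the largest medoid-to-member distance.
            return [(g, m, max(dist(o, m) for o in g))
                    for g, m in zip(groups, medoids)]
        medoids = new_medoids
    raise RuntimeError("no stable partition found within max_iter iterations")
```

In this sketch, the returned radius of each subcluster plays the role of the distance threshold of the corresponding prototype.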
Although the output of Algorithm 5.6 could be immediately used as a classi-
fier, there are at least two reasons why it should be visually inspected and, pos-
sibly, modified by the analyst. First, density-based clusters may have high internal
variation and therefore may be difficult to understand. The analyst may wish to
refine them by dividing into parts with smaller internal variation and/or by remov-
ing some of the members. Second, the analyst may wish to tune the selection of
cluster prototypes and distance thresholds to his/her understanding of the distinc-
tive properties of the clusters.
Furthermore, the analyst needs to make sure that the classifier will correctly
assign new objects to the defined clusters. This can be tested by applying the clas-
sifier to D′. Since the assignment of objects to clusters is done in different ways
in the classification and in the density-based clustering, the outcomes of the clas-
sification may differ from the original clusters. Some of the original members of
a cluster may not be there any more (such objects will be called false negatives),
and/or some new objects may be put in the cluster (such objects will be called
false positives). This discrepancy is not necessarily disadvantageous. It may hap-
pen that a false negative is too dissimilar to the other objects in the cluster and
should not be there, and it may also happen that a false positive is sufficiently
similar to the core objects of the cluster and should be there. Hence, each case
of divergence between the two assignments of the objects to the clusters needs
to be inspected by the analyst. If the analyst is not satisfied with the new assign-
ment, he/she should be able to refine the part of the classifier responsible for the
misclassification.
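The comparison between the original cluster membership and the outcome of the classifier test can be expressed with simple set operations. The sketch assumes that both assignments are available as dictionaries mapping an object identifier to a cluster identifier (or `None` for “noise” or unclassified objects); this data layout and the function name are assumptions made for illustration.

```python
from typing import Dict, Hashable, Optional, Set


def compare_assignments(
    original: Dict[Hashable, Optional[int]],
    classified: Dict[Hashable, Optional[int]],
) -> Dict[int, Dict[str, Set[Hashable]]]:
    """For each cluster, list its false negatives and false positives."""
    cluster_ids = {c for c in original.values() if c is not None}
    cluster_ids |= {c for c in classified.values() if c is not None}
    report: Dict[int, Dict[str, Set[Hashable]]] = {}
    for c in cluster_ids:
        before = {o for o, k in original.items() if k == c}
        after = {o for o, k in classified.items() if k == c}
        report[c] = {
            "false_negatives": before - after,  # original members lost by the classifier
            "false_positives": after - before,  # objects newly put into the cluster
        }
    return report
```

Clusters with non-empty sets in this report are the ones that the analyst would inspect visually before deciding whether the classifier needs refinement.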
To enable the revision and refinement of the classifier, the following editing
operations can be used:
1. Exclude one or several subclusters from a cluster and perform one of the fol-
lowing actions:
(a) make a new cluster as a union of these subclusters;
(b) turn each subcluster into a new cluster;
(c) discard the subclusters, that is, treat their members as not belonging to
any cluster.
2. Divide a subcluster into two or more smaller subclusters.
3. Merge two or more subclusters into a single larger subcluster.
4. “Dissolve” one or more subclusters, that is, distribute their members among the
remaining subclusters.
5. Change the distance threshold of a selected prototype.
Operations 2, 3, and 4 involve automatic re-computing of the medoids of
the subclusters. For dividing a subcluster into smaller subclusters (operation 2),
we use the k-medoids method, which is modified so that the analyst can select the
initial seeds for the new subclusters.
After any operation, the analyst visually inspects the results and, possibly, runs
the test of the classifier on D′. If the results are not satisfactory, the analyst may
revert to the previous state.
Let us show by examples when it may be reasonable to edit a classifier and how
this can be done. We use the results of the previous density-based clustering with
the distance function “route similarity” (the 15 largest clusters out of the 60 clusters
obtained are shown in Fig. 5.6). Algorithm 5.6 is used to divide the density-based
clusters into “round” subclusters and select cluster prototypes. This gives us a draft
classifier. We use interactive visual techniques to review and edit the classifier. It
would be very tedious and time-consuming to review each of the 60 clusters; how-
ever, not all clusters necessarily require the analyst’s attention. The analyst mainly
needs to look at clusters having many prototypes. A large number of prototypes
signifies high internal variation within the cluster. Possibly, such a cluster should be
refined by dividing it into two or more clusters that have lower internal variation
and/or by removing members that differ considerably from the rest.
Thus, cluster 1 shown in Fig. 5.14a has been automatically divided into 17
round subclusters, that is, 17 prototypes are needed to represent this cluster in
the classifier. The prototypes are coloured in red in Fig. 5.14b. The large number
of prototypes is caused by high internal variation in the routes in the
cluster: it includes trajectories going from east to west and trajectories that, in
the last one-third of the route, turn to the north-west. It is reasonable to separate
these two groups of trajectories by dividing cluster 1 into two clusters. Hence, we
interactively select the subclusters consisting of the trajectories going to the west
(Fig. 5.14c) and move them from cluster 1 to a new cluster (cluster 61). Then, we
look at the trajectories that remain in cluster 1 (Fig. 5.14d) and notice that one
trajectory deviates from the north-western direction. It has been selected as a pro-
totype, but the respective subcluster consists of only this trajectory (Fig. 5.14e).
We decide to clean cluster 1 by removing this trajectory, that is, we discard the
selected subcluster consisting of a single trajectory. The final content of cluster 1
is shown in Fig. 5.14f. It is represented by 7 prototypes.
Another example of cluster refinement is presented in Fig. 5.15. In this case, we
take cluster 3, which has been initially represented by 19 prototypes (a). We first

Fig. 5.14  Interactive editing of a classifier for one cluster (cluster 1). a The original cluster. b
The trajectories selected as the prototypes are coloured in red. c The trajectories going to the
west are selected for moving to a new cluster. d Only the trajectories that turn to the north-
west remain in the current cluster. e A subcluster consisting of a single trajectory is selected for
removing from the cluster. f The cluster and its prototypes after the editing

Fig. 5.15  In the process of classifier editing, cluster 3 (a) is being refined by dividing into three
clusters (b, c, d)

select seven subclusters with shorter trajectories turning to the east of the motorway
(b) and move them to a new cluster 62. Then, we select one subcluster with
shorter trajectories turning to the west of the motorway, towards the city centre,
and move it to a new cluster 63 (c). The remaining cluster 3 (d) consists of trajectories
going farther to the south and is represented by 11 prototypes.
After running the test of the classifier, it is also necessary to look at the clusters
for which there were classification errors, that is, false positives and/or false nega-
tives. As we explained, such errors are not necessarily disadvantageous; still, there
may be cases requiring improvement in the classifier. For example, Fig. 5.16 shows
a cluster for which the classifier test found one false negative and one false positive;
they are highlighted in black in the images a and b, respectively. Initially, the clus-
ter has been represented by two prototypes. To make the classifier better recognize
the cluster members, we first merge the two subclusters into one and then divide
this one subcluster into three subclusters. For these three subclusters, we interactively
select the initial seeds. One of the seeds is the trajectory that has not been
recognized by the classifier. In this way, we ensure that this trajectory will be
recognized by the refined classifier. The result of the refinement is shown in Fig. 5.16c.
After this selective inspection and editing of the classifier, we apply it to the
whole dataset, which consists of 153,292 trajectories. The process of the portion-
wise loading of the data in RAM, classification, and storing of the results in the
database takes about 21 min; however, this does not require our involvement. We
look at the final results when they are ready. A total of 9,646 trajectories have
been classified as belonging to one of our 63 clusters (60 original + 3 new), and
143,646 trajectories have been labelled as “noise”. Figure 5.17 presents graphical
summaries of the 15 largest clusters resulting from the classification. Note that both
parts of the original cluster 1 are among the most frequent routes: after the classifi-
cation, cluster 1 includes 636 trajectories (the fourth image in the upper row), and
cluster 61 includes 536 trajectories (the first image in the second row). Figure 5.18
shows the classification results for the original cluster 3, which was divided into
clusters 3, 62, and 63 (Fig. 5.15).
Selected clusters of trajectories can be loaded from the database to RAM for
a more detailed analysis. For example, we have earlier analysed the movement
dynamics in the trajectories of cluster 16 from the one-day data (Sect. 5.1.2.4,
Figs. 5.12 and 5.13). Now, we load cluster 16 resulting from the classification; the
summary image is the third in the upper row in Fig. 5.17. The cluster consists of
638 trajectories distributed over the whole week. We apply to it clustering with
the distance function “route similarity + dynamics” and parameters D = 800 m,
T = 60 s, and N = 5 neighbours (we have increased N from 3 to 5 in comparison
with the previous application since the cluster size has increased, and hence, a core
object of a cluster may have more neighbours). The clustering results are similar to
what we had for the cluster with the one-day trajectories: there is one dense cluster
consisting of 409 trajectories with nearly uniform movement dynamics, while the

Fig. 5.16  Classifier testing has revealed a false negative (a) and a false positive (b) in one of the
clusters. The classifier of the cluster has been refined by increasing the number of prototypes (c)

Fig. 5.17  Fifteen largest trajectory clusters discovered by applying the classifier to the whole
database consisting of 153,292 trajectories

Fig. 5.18  Classification results for the three clusters resulting from the refinement of cluster 3

remaining 229 trajectories with various deviations from the uniform movement are
labelled as “noise”. In Fig. 5.19, the cluster members are coloured in red and the
“noise” is grey. The scatterplot shows the distribution of the cluster members and
“noise” over the day and the respective durations of the trajectories. The overall
pattern is similar to what is seen in Fig. 5.13 except that there are no temporal
gaps in the distribution of the red dots. The frequency histogram on the right of

Fig. 5.19  Further clustering of a selected cluster (cluster 16) using the distance function “route
similarity + dynamics”

Fig. 5.19 shows the distribution of the trajectories over the days of the week. The
red bar segments represent the cluster members, and the dark grey segments repre-
sent the “noise”. We see that there were fewer trajectories and much lower propor-
tions of the “noise” on Saturday and Sunday (days 6 and 7) than on the working
days. Among the working days, Thursday (day 4) has an especially high proportion
of non-uniform movements: there were 78 trajectories with non-uniform move-
ment and only 37 trajectories with uniform movement.
Returning to the whole set of trajectories, the cluster analysis is continued
by loading a subset of the unclassified trajectories (“noise”) to RAM, applying
clustering to it, building a new classifier, and applying the classifier to the whole
set of unclassified trajectories. Our experience shows that with each new iteration
step, the number and/or the sizes of discovered clusters substantially decrease in
comparison with the previous step. For example, Fig. 5.20 shows the 15 largest
clusters resulting from the “route similarity” clustering of the trajectories from
Monday that have not been classified in the first iteration step. The number of dis-
covered clusters is 60, as in the first step, but the largest cluster consists of only
98 trajectories, while in the first step, the maximal cluster size was 179 and the
sizes of six clusters exceeded 100. All clusters in the second step contain in total
1,041 trajectories out of 11,873 (less than 9 %), while in the first step, the clusters
contained 1,744 trajectories out of 8,206 (more than 21 %). After four or five steps
of the procedure, only very small clusters can be discovered. Unfortunately, there
is no formal criterion for ending the procedure but only an empirical criterion: the
procedure is ended when the next iteration step does not generate new interesting
clusters. What is “interesting” depends on the application and analysis goals.
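
The iterative procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions: the function names are hypothetical, a toy greedy grouping stands in for the density-based clustering, the classifier is reduced to one prototype per cluster with a single distance threshold, and the empirical stopping criterion ("no new interesting clusters") is approximated by "no clusters found in the sample".

```python
import random

def greedy_clusters(items, dist, radius, min_size):
    """Toy stand-in for density-based clustering (an assumption, not the method of the book)."""
    clusters, used = [], set()
    for i, centre in enumerate(items):
        if i in used:
            continue
        members = [j for j, other in enumerate(items)
                   if j not in used and dist(centre, other) <= radius]
        if len(members) >= min_size:
            used.update(members)
            clusters.append([items[j] for j in members])
    return clusters

def iterative_clustering(data, dist, radius, min_size, sample_size, max_steps=5, seed=0):
    """Cluster a sample, classify everything still unclassified, repeat on the remaining noise."""
    rng = random.Random(seed)
    unclassified = list(data)
    result = []  # list of (prototype, members) pairs
    for _ in range(max_steps):
        if not unclassified:
            break
        sample = rng.sample(unclassified, min(sample_size, len(unclassified)))
        found = greedy_clusters(sample, dist, radius, min_size)
        if not found:  # simplified stopping rule: nothing new was discovered
            break
        prototypes = [cluster[0] for cluster in found]  # one prototype per cluster
        members = [[] for _ in prototypes]
        noise = []
        for obj in unclassified:
            d, k = min((dist(obj, p), k) for k, p in enumerate(prototypes))
            (members[k] if d <= radius else noise).append(obj)
        result.extend(zip(prototypes, members))
        unclassified = noise
    return result, unclassified

# Toy usage with numbers instead of trajectories
data = [0.0, 0.1, 0.2, 0.15, 5.0, 5.1, 5.05, 9.0]
clusters, noise = iterative_clustering(data, lambda a, b: abs(a - b),
                                       radius=0.5, min_size=2, sample_size=8)
```

In a real setting, the numbers would be replaced by trajectories, `dist` by a distance function such as "route similarity", and the automatic stopping rule by the analyst's judgement.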
Not only do the number and sizes of discovered clusters decrease with each
iteration step of the procedure, but the analyst’s effort needed for editing of the
classifier also decreases. The editing effort is high for big clusters with high
internal variation, but when a cluster is originally coherent, little or no editing is
needed. Our experience shows that large and incoherent clusters mostly appear
in the first iteration step. In the following steps, the discovered clusters tend to
become smaller due to the decreasing density of the data. The internal variation

Fig. 5.20  Fifteen largest “route similarity” clusters discovered in a subset of unclassified
trajectories from Monday

in the clusters is also small. The number of discovered clusters also decreases.
Hence, the editing effort significantly lessens with each step. Thus, in our experi-
ments, we usually spent 30–45 min reviewing and editing the first classifier and
only 5–10 min for the following classifiers (mainly for reviewing; almost no edit-
ing was required).
In principle, the suggested approach for extending clustering to very large data-
sets is generic, that is, applicable not only to trajectories but also to other types of
complex objects requiring special distance functions for assessing their dissimi-
larity. To make the approach work for a particular type of objects, certain type-
specific components are necessary: (1) a database representation of the objects;
(2) a distance function; (3) a visual representation of the objects; (4) optionally, a
method for graphical summarization of clusters.

5.1.3 Visualization of Positional Attributes

There is no entirely satisfactory method for representing positional attributes of
trajectories on a map. While it is technically possible to encode attribute values by
visual properties of line segments, that is, by colouring or line thickness, this can
be effective only when there are no overlapping segments belonging to different
trajectories or to the same complex trajectory. In Sect. 4.1, we have said that posi-
tional attributes can be visualized in temporal displays, such as the time graph and
temporal bar chart. However, like a map, a time graph may suffer from occlusions
and display clutter since lines representing different trajectories often intersect or
overlap. In contrast, in a temporal bar chart, a stacking layout eliminates occlu-
sions and intersections (see Figs. 4.6 and 4.7). Each trajectory gets its individual
portion of the display height.
Temporal displays show the temporal variation in positional attributes sepa-
rately from the spatial context. Establishing a link to the spatial context is done
on the level of a single trajectory (Fig. 4.6), but even two trajectories are hard to
compare, for example, to see whether similar values of a positional attribute occur
in the same or close locations. Filtering of trajectory segments (Sect. 4.2.3, Fig.
4.13) allows the user to see the spatial distribution of selected attribute values, but
the user cannot see what values are nearby.
It appears logical to eliminate occlusions from spatial displays (maps) in the
same way as for temporal displays, that is, by using a stacking layout. In a tem-
poral bar chart, one dimension represents time and another dimension is used for
stacking visual elements (segmented bars) representing trajectories. For repre-
senting two-dimensional space, we need two display dimensions. We can add one
more display dimension and use it for stacking visual elements representing tra-
jectories. These may be segmented bands such that their shapes and positions with
respect to the two spatial dimensions of the display correspond to the spatial prop-
erties of the trajectories. As in a temporal bar chart, each trajectory receives its
individual portion of the third display dimension. The bands representing different
trajectories are stacked one upon another; hence, the bands of different trajecto-
ries do not overlap. This approach, however, is effective not for arbitrary trajecto-
ries but only for groups of trajectories having similar shapes and spatial positions.
Such groups can be selected by means of spatial filtering (Sect. 4.2.1, Fig. 4.10) or
through clustering by route similarity (Sect. 5.1.2.2). A stack of bands representing
spatially similar trajectories looks like a wall; therefore, we call this display the
“trajectory wall”. An example is shown in Fig. 5.21.
In this example, we use the 118 trajectories of cluster 16 resulting from the
clustering of the Milan trajectories from Wednesday by route similarity (Sect.
5.1.2.2, Fig. 5.6). We have previously used this cluster to demonstrate the explora-
tion of the movement dynamics by means of the distance function “route similar-
ity + dynamics”. We have found that the cluster contains trajectories with uniform
and non-uniform movements. The trajectory wall display offers us another way
to look at the movement dynamics. The display is oriented so that the north-west
is on the left and the south-east on the right. The bands are ordered from bottom
to top according to the start times of the respective trajectories. The colouring of
the band segments encodes the values of the positional attribute “speed”. High
speed values are represented by shades of green and low values by orange and
red; yellow corresponds to the speeds between 15 and 30 km/h. The bands where
all or almost all segments have the same colour (green) represent trajectories with

Fig. 5.21  A trajectory wall represents one route similarity cluster of the Milan trajectories from
Wednesday. The trajectories are represented by segmented bands. The colouring of the segments
encodes the values of the attribute “speed”. The bands are stacked one on top of another

uniform high-speed movement. The intrusions of yellow, orange, and red colours
indicate that the movement slowed down.
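
The two design decisions of the trajectory wall described above, namely the stacking order and the colour coding of the speeds, can be illustrated by a small sketch. The function names are hypothetical. Only the yellow interval (15 to 30 km/h) is stated in the text; the treatment of the interval boundaries and the merging of orange and red into one class are our assumptions.

```python
def stack_levels(start_times):
    """Assign each trajectory a level in the stack; earlier start times get lower levels.

    start_times: dict mapping a trajectory identifier to its start time (any sortable value).
    """
    order = sorted(start_times, key=start_times.get)
    return {trajectory_id: level for level, trajectory_id in enumerate(order)}

def speed_colour(speed_kmh, low=15.0, high=30.0):
    """Map a speed value to a colour class; boundary handling is an assumption."""
    if speed_kmh >= high:
        return "green"       # high speeds
    if speed_kmh >= low:
        return "yellow"      # between 15 and 30 km/h
    return "orange-red"      # low speeds

levels = stack_levels({"a": 30600, "b": 21000, "c": 61200})
```

Each band is then drawn at the height given by its level, and each of its segments is painted with the colour class of the segment's speed.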
We have not only an elementary view of the speeds in each individual trajec-
tory but also an overall view of the distribution of the speeds over the space and
across the multiple trajectories. We see the places where many cars slowed down.
Low speeds mostly occur in neighbouring trajectories in the stack. Since the tra-
jectories are ordered according to their start times, the vertical dimension of the
display partly conveys the temporal component of the data. Hence, closeness of
segments representing low speed values in the display may mean that these values
are clustered in both space and time. The large spot of reduced speeds in the lower
left part of the display may signify a prolonged traffic jam in the morning, which
would correspond to our earlier observation that there was a long time interval in
the morning when the durations of the trajectories greatly increased and there were
no trajectories with uniform movement (Fig. 5.13). The temporal positions of the
low speed values can be determined by utilizing the dynamic link between the trajectory
wall and the temporal bar chart: when the mouse cursor points at a trajectory
segment in the wall, the corresponding segment is marked in the temporal
bar chart and its time reference is shown in a special area of the display. Detailed
information about the currently pointed trajectory and segment, including the tem-
poral references, can also be seen in an optional information display overlaid on
the trajectory wall (it is not shown in the figure, for preserving the readability).
This helps us to locate traffic jams in time. We find out that the congestion on the
north-west happened in the morning from about 5:40 till about 10:00. There was
also a smaller congestion on the south-east from about 6:20 till about 7:20 in the
morning. We also make a more general observation that in the morning, the speeds were
quite low on almost the whole length of the route. In the afternoon, a short period
of obstructed traffic occurred from about 15:30 till about 17:00. This is also con-
sistent with our previous observations (Fig. 5.13).
Pointing with the mouse gives us temporal information only on the elementary
level of a single trajectory segment. To enable a higher level of temporal analysis,
the display includes an element called the time lens, which is visible in the lower
right corner of Fig. 5.21. The time lens shows temporally aggregated information
for an interactively defined spatial query area (a circle of a chosen radius around
the mouse cursor position, located in the lower left corner of Fig. 5.21). The inte-
rior of the time lens shows the relative spatial positions of the trajectory points
within the selected area. The points are represented by dots coloured according
to the attribute values. The ring of the time lens represents one of the temporal
cycles: 4 quarters of a year, 12 months of a year, 7 days of a week, or 24 hours
of a day. In our example, the latter cycle is chosen. The ring is divided into bins
corresponding to the units of the chosen cycle (hours in our case). The fill levels
of the time bins visualize temporally aggregated information about the trajectories
that intersect with the query area. The possible aggregates are the count of the tra-
jectories, the total time spent in the query area (i.e. the sum of the times from all
trajectories), and the average time (i.e. the total time divided by the count of the
trajectories). In our example, the time lens shows the average times. The division
of the bin contents into coloured segments shows the proportions of attribute val-
ues from different value intervals within the aggregates.
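
The three aggregates of the time lens can be sketched as follows for the daily cycle. This is a simplified illustration, not the implementation of the tool: the function name is hypothetical, times are assumed to be seconds since midnight, the time spent in the query area is approximated by the difference between the last and the first position inside the circle, and each trajectory is assigned to the hour in which it first appears in the area.

```python
from collections import defaultdict
from math import hypot

def time_lens_by_hour(trajectories, centre, radius):
    """Return {hour: (count, total_time_s, average_time_s)} for a circular query area.

    trajectories: list of trajectories, each a chronological list of (x, y, t_seconds).
    """
    cx, cy = centre
    count = defaultdict(int)
    total = defaultdict(float)
    for points in trajectories:
        inside = [t for x, y, t in points if hypot(x - cx, y - cy) <= radius]
        if not inside:
            continue  # the trajectory does not intersect the query area
        hour = int(inside[0] // 3600) % 24
        count[hour] += 1
        total[hour] += inside[-1] - inside[0]
    return {h: (count[h], total[h], total[h] / count[h]) for h in count}

trajs = [
    [(0, 0, 21600), (1, 0, 21900), (2, 0, 22200), (50, 0, 22500)],
    [(0, 1, 23400), (1, 1, 24600)],
    [(0, 0, 43200), (1, 0, 43320)],
    [(80, 80, 21600)],
]
lens = time_lens_by_hour(trajs, centre=(0, 0), radius=5)
```

Other cycles (days of the week, months) would differ only in how the bin index is derived from the time reference.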
With the mouse, we have selected a query area in the place on the north-west
where there are many trajectory segments with low speed values. The time lens
shows us that such values mostly occurred in the hours from 5 till 9 in the morn-
ing (i.e. the intervals from 5 to 6 o’clock till 9 to 10 o’clock). Furthermore, we
see great differences in the average times spent in the query area in different time
intervals of the day. In the morning hours from 5 to 7, it took from 16 to 25 min on
average for a car to move through the query area. In the next 2 hours, the average
times decreased to 9 and 8 min, while during the rest of the day, the average times
were 2–3 min. Hence, during the traffic congestion, the car drivers lost on average
from 6 to 23 min of their time in comparison with normal driving.
We can apply the trajectory wall tool also to a larger set of similar trajectories.
Thus, Fig. 5.22 shows the 638 trajectories from the whole week that have been
assigned to cluster 16 by the classifier created in Sect. 5.1.2.5. As previously, the
colour coding represents the values of the positional attribute “speed” using the
same colour scheme and the same value intervals. We can clearly see the yellow–
orange–red spots signifying traffic congestions. We can use different modes of the
time lens to explore the distribution of the low speed values over the days of the
week and times of the day. In Fig. 5.22, the mouse position approximately cor-
responds to the position in Fig. 5.21, that is, to the Wednesday morning and the
north-western part of the route. The radius of the spatial query is approximately
the same as in Fig. 5.21 for Wednesday, but the temporal extent of the query spans
over the whole week. To the right of the wall, the time lens shows the distribution

Fig. 5.22  A trajectory wall display shows a cluster of trajectories in Milan from the whole week
(cluster 16 in Fig. 5.17)

of the trajectory counts over the days of the week. We see that the largest number
of trajectories (112) crossed the query area on Thursday and there was also the
largest proportion of low speed values. On the right of Fig. 5.22, there are images
of the time lens in other modes. On the top right, the time lens shows the aver-
age duration of traversing the query area by the days of the week. We see that
the longest average duration was on Thursday (11 min) and the second longest on
Wednesday (7 min). This aggregation was done over the entire days. On the bot-
tom right, the time lens shows the average time of driving through the query area
by hours of the day, irrespective of the days of the week. The most problematic
hours are 6 (14 min), 7 (10 min), 8 (8 min), and 5 (7 min). The durations decrease
after hour 9 (i.e. after the interval 9–10 o’clock) but then increase in hours 15–17,
but to much lower values than in the morning.
The time lens is limited to showing only one temporal cycle. It alone does not
support the exploration of the value distribution with respect to two cycles, such as
daily and weekly. However, this can be done with the help of interactive temporal
filtering (Sect. 4.2.1). We can select trajectories from each day of the week and see
the distribution of the trajectories, speed values, and movement durations over the
hours of the day. In this way, we find out, in particular, that the longest average
delays occurred on Wednesday, but the longest overall duration of the traffic con-
gestions was on Thursday. Thus, in the north-west, the movement was slow from
hour 5 till hour 17 with only a short break in hour 14.
The trajectory wall tool is described in more detail by Tominski et al. (2012).

5.1.4 Analysis of Multiple Positional Attributes

For a more sophisticated analysis of movement behaviour, it may be necessary to
take into account multiple positional attributes. Unfortunately, direct visualization
of values of multiple attributes in the spatial or temporal context is hardly possible.
The way to deal with multiple attributes is to cluster the trajectory points or seg-
ments according to values of these attributes. The results of the clustering can then
be represented in a temporal bar chart, trajectory wall, and/or space–time cube,
to support the exploration of the spatio-temporal distribution of the clusters. The
clusters themselves can be interpreted with the help of multi-attribute displays,
such as the parallel coordinate plot (PCP) and scatterplot matrix.
To provide an example of analysing multiple positional attributes by means of
clustering, we shall use the group walk data described in Sect. 2.10.5. Our goal is
to analyse the movement behaviour of the group. We want to characterize the move-
ment of the group in terms of compactness (i.e. whether the group members keep
close to each other), directionality (i.e. whether the people go in a particular direction
or wander), and stops/moves. Using a tool for computing derived thematic attributes
from position records (Sect. 3.4), we produce the following positional attributes:
• Speed (km/h);
• Length of the bounding rectangle diagonal in a 30-s time window;
• Sinuosity in a 30-s time window;
• Distance to nearest neighbour (m);
• Count of neighbours within 5 m radius and 10-s time window.
The first two attributes are meant to separate stops from moves. The sinuosity
can be an indicator of the directionality of the movement: high sinuosity means
wandering rather than going in a particular direction. The last two attributes char-
acterize the movement in terms of compactness.
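
As an illustration of how such attributes may be derived from position records, the following sketch computes three of them for the positions of one mover within one time window. The function name is hypothetical, the coordinates are assumed to be in metres and the times in seconds, and sinuosity is taken here as the ratio of the path length to the straight-line distance between the first and last positions, which is one common definition and not necessarily the one used by the tool.

```python
from math import hypot

def window_attributes(points):
    """points: chronological list of (x_m, y_m, t_s) of one mover within one time window."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    path = sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1, _), (x2, y2, _) in zip(points, points[1:]))
    straight = hypot(xs[-1] - xs[0], ys[-1] - ys[0])
    duration = points[-1][2] - points[0][2]
    return {
        "speed_kmh": 3.6 * path / duration if duration > 0 else 0.0,
        "bbox_diagonal_m": hypot(max(xs) - min(xs), max(ys) - min(ys)),
        "sinuosity": path / straight if straight > 0 else float("inf"),
    }

attrs = window_attributes([(0, 0, 0), (3, 4, 10), (6, 0, 20)])
```

A stop yields a low speed and a short diagonal, while wandering yields a high sinuosity.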
Based on the computed attributes, we perform clustering of the segments of the
trajectories. It is not essential what clustering algorithm is used; however, parti-
tion-based or hierarchical clustering is preferred over density-based clustering
since we want to divide the trajectories into parts corresponding to different types
of behaviour rather than to separate dense clusters from noise. In this example, we
use k-means clustering. The results of the clustering are immediately projected on
a PCP, where the lines representing the attribute value combinations for the trajec-
tory segments are coloured according to the cluster membership. This allows us to
interpret the clusters in terms of the values of the attributes. We run the clustering
tool with different values of the parameter k (number of clusters) and try to inter-
pret the clusters by looking at the PCP. In this way, we find that the best results in
terms of clarity and interpretability are obtained with k = 6. In Fig. 5.23, the clus-
ters are presented in two screenshots of the PCP display, for better legibility. The
screenshot on the left represents clusters 1, 2, and 3; the screenshot on the right
shows the remaining three clusters. The assignment of the colours to the clusters is
visible above each screenshot.
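
For readers who wish to experiment, a bare-bones k-means can be sketched as below. It is not the clustering tool used for the figures. The deterministic farthest-point initialization is our choice for reproducibility, and we assume that the attribute values have already been rescaled to comparable ranges, since otherwise attributes with large numeric ranges dominate the Euclidean distance.

```python
def kmeans(rows, k, max_iter=100):
    """Minimal k-means; rows is a list of equal-length numeric tuples (one per segment)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic initialization: first row, then repeatedly the row farthest from all centres
    centres = [tuple(rows[0])]
    while len(centres) < k:
        centres.append(tuple(max(rows, key=lambda r: min(sq_dist(r, c) for c in centres))))

    labels = []
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centre
        labels = [min(range(k), key=lambda c: sq_dist(r, centres[c])) for r in rows]
        # Update step: each centre moves to the mean of its members
        updated = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return labels, centres

rows = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 5.0)]
labels, centres = kmeans(rows, k=2)
```

Running the function with several values of k and inspecting the resulting groups in a multi-attribute display corresponds to the exploratory choice of k described above.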

Fig. 5.23  A parallel coordinate plot shows the results of clustering of trajectory segments
according to values of multiple positional attributes. The colours of the lines represent cluster
membership. Two screenshots of the display show two subsets of the clusters obtained

Clusters 1 (red) and 2 (yellow), evidently, correspond to stops: they are charac-
terized by low speeds and small lengths of the bounding rectangle diagonal. The
sinuosity varies from low to very high values. The latter mean that the people made
small movements in different directions, probably, for looking around. Clusters 1
and 2 differ in the counts of neighbours: red corresponds to stopping in compact
groups, when the number of neighbours is high, and yellow corresponds to looser
arrangements with fewer neighbours. Cluster 3 (green) is characterized mostly by
medium values of the speeds, bounding rectangle diagonals, and neighbour counts.
It can be interpreted as slow walking in a subgroup of 4–7 people. Clusters 4 (cyan),
5 (blue), and 6 (magenta) correspond to faster walking. Cluster 5 is the fastest. It
has low to medium counts of neighbours; evidently, fast movement is easier in small
groups. Clusters 4 and 6 differ in the neighbour counts: cluster 4 corresponds to
movement in isolation or in a small group and cluster 6 to movement in a big group.
With these interpretations in mind, we analyse the spatial, temporal, and spatio-
temporal distributions of the different types of movement behaviours using various
visual displays. A temporal bar chart (Fig. 5.24a) allows us to see how the differ-
ent types of movement behaviour represented by the clusters are distributed over
time and the group members. We see, in particular, that there were several time
intervals of different lengths when all or many group members stopped (clusters
1 and 2—red and yellow). During the last and longest joint stop, which occurred
around 17:40, the people first kept close together (cluster 1) and then dispersed
(cluster 2), possibly, for observing the environment. Some people, whose trajec-
tory bars are at the bottom of the display, preferred not to keep very close to the
bulk of the group (clusters 4 and 2—cyan and yellow—clearly prevail), and the
others tended to move in relatively large compact groups, especially in the mid-
dle of the group trip. It is interesting that in a number of trajectories, fast and slow
movements (clusters 5 and 3—blue and green) alternate.

Fig. 5.24  The clustering results are represented on a temporal bar chart (a) and on a map
(b all clusters, c clusters 1–3, d clusters 4–6)

In a map (Fig. 5.24b), we can see the spatial distribution of the different types
of behaviour. As in Fig. 5.23, we have made two different screenshots for clus-
ters 1–3 (Fig. 5.24c) and 4–6 (Fig. 5.24d). We see that the segments belonging
to clusters 1–2 are short and occur in several places along the route, which are,
evidently, the places where the group stopped. The space–time cube in Fig. 5.25a
confirms that red and yellow colours (clusters 1 and 2) signify stops: the segments
are aligned vertically, that is, the spatial positions did not significantly change dur-
ing some time. In the trajectory wall (Fig. 5.25b), we see where and how many
people walked in tight or loose subgroups.
Hence, clustering of trajectory segments according to multiple positional attrib-
utes allowed us to identify several types of movement behaviour of people moving
in a group and relate these types to space and time. It is also possible to investigate
the movement behaviour of each individual.

5.2 Relations

Relations among movers and between movers and other elements of the spatio-
temporal context (Sect. 2.7) are an essential aspect of the individual, collective,
and general movement behaviours of the objects. It is typically assumed that
relations of movers to the context (including other movers) are reflected in their

Fig. 5.25  The clusters are represented in a space–time cube (a) and on a trajectory wall (b)

trajectories, so that analysing relations of the trajectories to other trajectories
and to context data can reconstruct the relations of the movers, which cannot be
directly observed. In this section, we focus on relations of movers’ trajectories as
proxies of relations of the movers. Particularly interesting are interactions, that
is, such relations between two or more spatial objects when the objects are suf-
ficiently close in space, so that they can (potentially) exert mutual or reciprocal
action or influence upon one another. It may be also interesting to learn that mov-
ers of a given type or from a given group never approach certain spatial objects.
However, this kind of relation can be regarded as an absence of interactions.
Therefore, analysis of relations of movers to other spatial objects typically focuses
on detecting and analysing their interactions. In this section, we consider detection
and analysis methods oriented to different types of interactions.

5.2.1 Encounters Between Moving Objects

The term “encounter” refers to a situation when two or more moving objects come
close to one another in space and time. In other words, this is an occurrence of the
spatial relation “close” between movers. Laube et al. (2005) introduce encounters
as one of the types of relative motion patterns. Encounters can further be classified
into specific subtypes based on attributes of the moving objects, such as direction
and speed. Several types of encounters are considered by Orellana et al. (2009)
and Orellana and Renso (2010).
The notion of what can be classified as an encounter is not absolute. Rather,
it is highly dependent on the domain, the data, and the specific kind of relative
movement of two objects that a domain expert may look for. For example, danger-
ous situations in which ships in a harbour get too close to one another are defined
in terms of different spatial and temporal distances than meetings of people in a
mall. Therefore, a one-size-fits-all encounter analysis cannot be hard-coded in
an algorithmic computation. Domain experts need a highly interactive and visual
environment, in which they can choose suitable parameters for encounter detec-
tion, such as the windows for the spatial (ΔS) and temporal distance (ΔT) within
which an encounter occurs. In the process of the analysis, there may be a need to
refine the originally chosen parameter values and/or to modify them for testing the
parameter sensitivity of the results.
To meet these requirements, an encounter detection algorithm needs to scale up
with the number of moving objects, location points, and the temporal and spatial
resolution of the data. The algorithm must not only cope with the growing size of
movement data, but must also be fast enough to enable the interactive analysis.
We have designed an algorithm that rearranges the data in a special structure, such
that only one sweep over the data is needed to build this data structure and simul-
taneously detect all the encounters. This algorithm is followed by a post-process-
ing step, in which the list of the detected encounters can be filtered according to
various types of user-specified features, such as the directions and velocity of the
encountering objects. The method is embedded in the visual analytics infrastruc-
ture, where the results are graphically displayed, and the user can interactively
change the parameters for obtaining the desired insights into the data.
The input of our method is movement data in the form of position records.
Each record consists of a unique identifier of the moving object, coordinates of the
location, and a timestamp. The position records need to be arranged in a chrono-
logical order irrespective of their grouping into trajectories. In other words, the
algorithm works on a single chronological sequence composed of position records
of different objects.
The encounter detector receives the data and the user-chosen values for the
spatial and temporal windows ΔS and ΔT that define an encounter. Its task is
to find pairs of time-referenced positions of different objects such that the spa-
tial and temporal distances between them are below ΔS and ΔT, respectively. A
brute force encounter detector would check all possible pairs of position records.
However, such a solution is not scalable and therefore not applicable to large
datasets. Instead, our proposed method rearranges the data in a set of sorted lists
(SSL), such that only a minimal number of positions need to be compared.
Clearly, comparing only the locations of objects with time difference shorter
than ΔT is sufficient when looking for encounters. To limit the search to only
such objects, we arrange the data into an SSL. Since the dataset is chronologically
ordered, our algorithm performs only one sweep over it for arranging the records
into an SSL and simultaneously finding the encounters. We define an integer num-
ber N such that any duration shorter than the ratio dt = ΔT/N is regarded as neg-
ligible. Here, dt is the desired temporal resolution of the encounter search. For
example, if ΔT is one minute, choosing N  = 60 gives the temporal resolution of
one second. Each sorted list in the SSL corresponds to one time interval with dura-
tion dt (e.g. to one second). The lists are arranged chronologically, that is, the first
list corresponds to the interval [0, dt), the second list to [dt, 2dt), and so on. Hence,
any N sequential lists correspond to one time interval of the length ΔT. Therefore,
when looking for encounters between a particular position record p and other

Fig. 5.26  The designated data structure is based on the sliding window principle. Each window
of the length ΔT is made up from a set of sorted lists (SSL)

position records, it is sufficient to check 2N lists. More specifically, if the record
p belongs to the list $L_i$, it needs to be compared with the records of the lists from
$L_{i-N}$ to $L_{i+N-1}$. Furthermore, if the position records are processed in the
chronological order, each record p fitting in the list $L_i$ needs to be compared only with the
other records of the list $L_i$ and the records of the preceding N lists $L_{i-N}$, $L_{i-N+1}$,
$L_{i-N+2}$, …, $L_{i-1}$. When it comes to the following records fitting in the lists $L_i$, $L_{i+1}$,
$L_{i+2}$, …, $L_{i+N}$, they will be compared, among others, with the record p.
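
The single-sweep idea can be sketched in Python as follows. This is a simplified illustration and not the full algorithm: the function name is hypothetical, the lists are plain unsorted buckets indexed by the time interval, interpolation is omitted, and we add an explicit check of the temporal distance, which the bucket arrangement only guarantees up to the resolution dt.

```python
from collections import defaultdict
from math import hypot

def detect_encounters(records, delta_s, delta_t, n):
    """Find pairs of positions of different objects that are close in space and time.

    records: chronologically ordered tuples (object_id, x, y, t).
    """
    dt = delta_t / n                # temporal resolution of the search
    lists = defaultdict(list)       # index of the time interval -> records in it
    encounters = []
    for rec in records:
        oid, x, y, t = rec
        i = int(t // dt)
        # Compare only with the current list and the N preceding lists
        for j in range(i - n, i + 1):
            for other in lists.get(j, ()):
                oid2, x2, y2, t2 = other
                if (oid2 != oid and t - t2 <= delta_t
                        and hypot(x - x2, y - y2) <= delta_s):
                    encounters.append((other, rec))
        lists[i].append(rec)
        # Lists that have left the sliding window are no longer needed
        for old in [j for j in lists if j < i - n]:
            del lists[old]
    return encounters

records = [("a", 0, 0, 0.0), ("b", 3, 4, 10.0), ("c", 100, 100, 20.0), ("a", 1, 0, 100.0)]
found = detect_encounters(records, delta_s=10, delta_t=60, n=60)
```

Keeping the records within each list spatially sorted, or replacing the lists by spatial index structures, would further reduce the number of distance computations.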
In general, the SSL may be composed not only of chronologically ordered posi-
tion records but of any type of ordered structures that contain points with coordi-
nates, for instance, KD trees or range trees (Samet 2006). Furthermore, the method
can be applied to objects in any space: two- or three-dimensional, geographical
or non-geographical, network, etc., provided that the specifics of the space are
appropriately reflected in the distance function determining the spatial distances
between the objects.
The parameters of the SSL are schematically represented in Fig. 5.26.
One problem with this approach is that some encounter points could be missed
due to the discrete character of the data, where the position records exist not for
any time moment but only for a sample of time moments when the measurements
have been taken. For example, an encounter of two objects would be missed if
there is no position record of at least one of the objects that is close enough to the
encounter point (according to the thresholds ΔS and ΔT). In order to avoid such
cases, we perform an interpolation of the data when the sampling is too sparse.
We regard the position records of a trajectory as too sparse when the spatial distance
between two sequential positions is greater than ΔS, or when the temporal
distance between them is greater than ΔT. As explained below, during the process
of adding new positions to the SSL, the algorithm checks the sampling rate and
performs interpolation when necessary.
We refer to an encounter between two positions of the same object as a self-
encounter. An absence of self-encounters is a signal that the interpolation needs
to be done. The algorithm performs a linear interpolation between the new posi-
tion and the last position of the same object. It adds a new position exactly in the
middle between the two positions in both time and space. If there are still no self-
encounters between the new position and the existing ones, then a new position is
added between every two sequential positions. The algorithm keeps adding posi-
tions in this fashion until self-encounters occur between every two sequential posi-
tions. In summary, self-encounters are used to determine whether interpolation
needs to be performed, but they are not counted as encounters between objects and
are not included in the result list.
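The interpolation rule described above can be illustrated with a minimal sketch. The names `Position`, `is_encounter`, and `densify` are hypothetical and introduced only for this illustration; the sketch assumes planar coordinates, a Euclidean distance function, and strictly positive thresholds ΔS and ΔT.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Position:
    t: float  # time, e.g. in seconds
    x: float  # planar coordinates, e.g. in metres
    y: float


def is_encounter(a, b, d_s, d_t):
    """True if two positions are within the spatial and temporal thresholds."""
    return abs(a.t - b.t) <= d_t and hypot(a.x - b.x, a.y - b.y) <= d_s


def densify(prev, new, d_s, d_t):
    """Insert midpoints between prev and new until every two sequential
    positions form a self-encounter (assumed: linear interpolation)."""
    if d_s <= 0 or d_t <= 0:
        raise ValueError("thresholds must be positive")
    points = [prev, new]
    # Repeat while at least one pair of sequential positions is too far apart.
    while any(not is_encounter(a, b, d_s, d_t) for a, b in zip(points, points[1:])):
        refined = [points[0]]
        for a, b in zip(points, points[1:]):
            # New position exactly in the middle in both time and space.
            refined.append(Position((a.t + b.t) / 2, (a.x + b.x) / 2, (a.y + b.y) / 2))
            refined.append(b)
        points = refined
    return points
```

Each round halves both the spatial and the temporal gaps, so the loop terminates as soon as the larger of the two relative gaps has dropped below its threshold.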
The encounter detection algorithm as a whole can be described as follows.
The idea behind the algorithm is that if the time of the next position record is
t, it is compared only to positions with the times not earlier than t − ΔT. Under
this condition, it is not necessary to compute the temporal distances between the
position records, but it is sufficient to compute the spatial distances. However, in
the case of interpolation, the chronological order of the processing of the position
records is disrupted since the times ti′ and ti″ of the new position records pi′ and
pi″ created in the step 2.9.2 may be earlier than the times of the records that have
been already processed. Therefore, in addition to checking the records with
times not earlier than ti′ − ΔT, it is also necessary to check the previously processed
records with times not later than ti′ + ΔT, and the same for ti″. This is reflected in step 2.5.
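The core idea of the time window can be sketched as follows. This is a simplified stand-in and not the algorithm referred to by the step numbers above: it assumes that all records, including interpolated ones, have already been sorted by time, so the backward check needed for out-of-order records is not modelled, and it uses a single queue instead of the SSL. The function name `find_encounters` and the record layout `(object_id, t, x, y)` are assumptions made for illustration.

```python
from collections import deque
from math import hypot


def find_encounters(records, d_s, d_t):
    """records: iterable of (object_id, t, x, y), sorted by t.
    Returns a list of (earlier_record, later_record) pairs."""
    window = deque()  # records with times not earlier than t - d_t
    encounters = []
    for rec in records:
        obj, t, x, y = rec
        # Drop records that are too old to be temporally close to rec.
        while window and window[0][1] < t - d_t:
            window.popleft()
        # All remaining records are temporally close, so only the
        # spatial distance needs to be computed.
        for other in window:
            if other[0] != obj and hypot(x - other[2], y - other[3]) <= d_s:
                encounters.append((other, rec))
        window.append(rec)
    return encounters
```

Pairs of positions of the same object are skipped here, which corresponds to excluding self-encounters from the result list.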

Fig. 5.27  Examples of higher-level encounter patterns emerging from single encounters,
showing trajectories as discrete positions and their direction of movement. Encounters are
represented by connecting lines between positions of movement. a Parallel encounter.
b Cross encounter. c Head-on encounter. d Parking encounter

After finding all encounters, we can execute a post-processing step with the
purpose of classifying the encounters and/or filtering them. For example, the user
may be interested only in encounters of objects with certain velocities and relative
movement directions. The user can set three parameters: the velocity, the angle
between the movement vectors, and the duration of the encounter sequence. The
user can set a range of possible values for each parameter individually. For example,
all the encounter types in Fig. 5.27 can be identified using these three parameters.
A parallel encounter has long duration and an angle of approximately 0°.
A cross encounter has an angle between 0° and 180° and short duration. A head-on
encounter has an angle of approximately 180°. A parking encounter, that is, an
encounter occurring in the course of parking or anchoring, has long duration and
low speed. Obviously, other parameters can be added when necessary.
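The classification rules above can be expressed as a small sketch. The function names and all threshold values (`angle_tol`, `long_duration_s`, `low_speed`) are assumptions chosen only for illustration; in practice the user would set the value ranges interactively.

```python
from math import atan2, degrees


def angle_between(v1, v2):
    """Unsigned angle in degrees (0..180) between two 2-D movement vectors."""
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(degrees(atan2(cross, dot)))


def classify_encounter(angle_deg, duration_s, speed, *,
                       angle_tol=20.0, long_duration_s=600.0, low_speed=0.5):
    """Assign an encounter to one of the types of Fig. 5.27.
    All thresholds are hypothetical defaults."""
    long_lasting = duration_s >= long_duration_s
    if long_lasting and speed <= low_speed:
        return "parking"
    if long_lasting and angle_deg <= angle_tol:
        return "parallel"
    if angle_deg >= 180.0 - angle_tol:
        return "head-on"
    if not long_lasting and angle_deg > angle_tol:
        return "cross"
    return "unclassified"
```

Encounters that match none of the rules, such as a short encounter with nearly equal directions, are left unclassified in this sketch rather than forced into a type.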
We shall demonstrate the work of the encounter detection method using the ship
trajectory dataset from the North Sea (Sect. 2.10.3). First, we define the spatial and
temporal thresholds for encounters. Literature suggests that the spatial distance
between vessels should not go below 0.2 nautical miles (approximately 370 m). Since
we conduct interpolation between position records if the distance between them
exceeds the spatial threshold (by expecting self-encounters), the temporal threshold
could be set to a fairly low value (1 min). The sampling interval of the existing position
records is about 10 min on average. As explained earlier, the interpolation allows
us not to lose any of the encounters even when a small temporal threshold is used.
The results of the first experiment are shown in Fig. 5.28. Among the more than
6,000 ships, we have found about 1,500 elementary encounters, that is, pairs of
positions. The encounters are represented on the map by short lines connecting the
positions; on a small-scale map (one covering a large area), the lines appear as dots. The main purpose of this
investigation was to identify the spatial distribution of the encounters. From the
results, it is evident that encounters mainly occur close to the harbour areas where
a natural spatial bottleneck is formed as the entry to the harbour terminals. It is
also interesting to observe that many of the encounters appear in the areas of route
intersections or junctions. In addition, we observe several interesting encounter
patterns that occur in the open sea and deserve further analysis.
A further, more detailed analysis was conducted on a subset of the trajectories
within a small area. The spatial and temporal thresholds were kept from the previ-
ous run. The results are shown in Fig. 5.29. As a part of the post-processing step,
we have grouped the elementary encounter events into composite spatial events
(Sect. 2.5) by joining consecutive elementary encounters of the same objects. In
this way, we have reconstructed about 12 composite encounter events from more
than 400 elementary encounter events. The composite encounter events have been
classified into four different types based on the angles between the movement vectors:
head-on, parallel, passing, and following.
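Joining consecutive elementary encounters of the same objects can be sketched as follows. The function name `group_encounters`, the input layout `(id_a, id_b, t)`, and the parameter `max_gap` (the largest time gap still considered "consecutive") are assumptions made for this illustration.

```python
from collections import defaultdict


def group_encounters(elementary, max_gap):
    """elementary: iterable of (id_a, id_b, t).
    Returns a list of composite events (pair, t_start, t_end, count)."""
    by_pair = defaultdict(list)
    for id_a, id_b, t in elementary:
        # The pair is unordered: (A, B) and (B, A) are the same objects.
        by_pair[tuple(sorted((id_a, id_b)))].append(t)
    composites = []
    for pair, times in by_pair.items():
        times.sort()
        start = prev = times[0]
        count = 1
        for t in times[1:]:
            if t - prev > max_gap:
                # Gap too large: close the current composite event.
                composites.append((pair, start, prev, count))
                start, count = t, 0
            prev = t
            count += 1
        composites.append((pair, start, prev, count))
    return composites
```

The duration of a composite event, `t_end - t_start`, is one of the inputs needed to distinguish long-lasting patterns such as following from single-occurrence patterns such as head-on encounters.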
Different types of encounters have different visual appearances on a map. A
head-on encounter, which means two moving objects passing by each other at
close distance, is identified as a single occurrence of an elementary encounter
and appears on a map as a single line connecting positions from two trajectories.
Parallel movement of two or more objects typically appears as a sequence of parallel
lines along the trajectories of the objects involved. However, sometimes there
may be only one such line. In such a case, parallel movement may be confused
with a head-on encounter if the relative movement directions of the objects are
not taken into account. An interesting encounter type was found on two occasions,
indicating that one object is moving significantly faster than the other and passing
it. This is a subclass of parallel encounters, but differs in appearance, since
the encounter lines are not parallel anymore; rather, they are like a fan spreading
out in one direction. There was also one encounter pattern in the form of a long
sequence of consecutive encounter lines indicating a following pattern. In this particular
case, one ship was following the other one at close distance for 5 h 40 min.
Fig. 5.28  Encounters found in ship trajectories using 0.2 nautical miles spatial and 1 min tem-
poral distance thresholds. The spatial distribution indicates that most encounters occur close to
harbour areas and at intersections of the sea traffic lanes

Algorithms for extraction of other types of spatio-temporal relations and interactions
between moving objects can be found in data mining literature, for example
Laube et al. (2004, 2005) and Gudmundsson et al. (2007). Overviews of the
existing methods are given by Laube (2009), Gudmundsson et al. (2012), and
Parent et al. (2013).

Fig. 5.29  Encounters found in ship trajectory data and classified into higher-level patterns:
head-on, parallel, passing, and following patterns

5.2.2 Relations in a Group of Movers

In Sect. 5.1.4, we provided an example of analysing the movement behaviours of
objects moving jointly as a group. We characterized the behaviours in terms of
movement characteristics, such as speed and sinuosity, and also attributes reflecting
some relations between the movers, such as distance to nearest neighbour and
number of neighbours within a given distance.

Fig. 5.30  The map (left) and space–time cube (right) show the central trajectory of a group of
movers (red) computed from the individual trajectories of the group members (light blue)

In analysis of group movement, not only relations of each mover to the nearest
neighbours are of interest but also relations to the whole group. An analyst may
wish to find answers to the following questions, which are specific to group movement:
• What trajectory can represent the movement of the whole group?
• How compact or dispersed is the group in space and how does this change over
space and time?
• How coherent are the movements of the group members? Are there individu-
als deviating from the course of the group? When and where did the deviations
occur and how far did the individuals deviate?
• Which individuals move in the forefront of the group? Which individuals tend
to keep in the rear? Are the relative spatial positions of the movers stable over
time?
For answering any of these questions, it is necessary to know the position and
course of the group core or centre in each time unit. Then, the compactness/disper-
sion can be characterized in terms of distances from the group centre, the coher-
ence/deviations in terms of angles between the courses of the individual group
members and the course of the group, and the arrangement can be determined
from the relative positions of the members along the group movement vector.
Hence, the first thing to do is to construct a trajectory that will represent the move-
ment of the group centre/core, that is, its position and course in each time unit.
Figure  5.30 shows the trajectories of the members of a group walk (the data-
set was introduced in Sect. 2.10.5 and used for illustrations in Sect. 5.1.4) in light
blue. The central trajectory of the group, which has been constructed according to
the following Algorithm 5.8, is shown in red. Simultaneously with the construc-
tion of the central trajectory, the algorithm computes thematic attributes character-
izing the group movement: distances of group members to the group centre and to
the group movement vectors, deviations of the movement directions, etc.
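To make the notion of a central trajectory concrete, the following sketch uses a deliberately simple stand-in: the group centre in each time unit is taken as the arithmetic mean of the member positions. This is an assumption made for illustration and is not a reproduction of Algorithm 5.8; the function names and the input layout `{t: {member_id: (x, y)}}` are likewise hypothetical.

```python
from math import hypot
from statistics import mean


def central_trajectory(positions_by_time):
    """positions_by_time: {t: {member_id: (x, y)}}.
    Returns [(t, cx, cy), ...] sorted by time, with the centre assumed
    to be the mean of the member positions in each time unit."""
    centre = []
    for t in sorted(positions_by_time):
        pts = list(positions_by_time[t].values())
        centre.append((t, mean(p[0] for p in pts), mean(p[1] for p in pts)))
    return centre


def distances_to_centre(positions_by_time, centre):
    """Distance of every member to the group centre in each time unit."""
    result = {}
    for t, cx, cy in centre:
        result[t] = {m: hypot(x - cx, y - cy)
                     for m, (x, y) in positions_by_time[t].items()}
    return result
```

A mean is sensitive to outlying group members; a more robust choice of centre, such as a median, may be preferable when some individuals deviate strongly from the group.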

After constructing the central trajectory of the group and computing the posi-
tional attributes characterizing the movement of the group, we can analyse these
attributes using interactive visual displays. Both the attributes attached to the
positions of the central trajectory and those attached to the positions of the group
members’ trajectories need to be analysed. Thus, Figs. 5.31 and 5.32 represent dif-
ferent positional attributes of the central trajectory by line width. This visualiza-
tion technique is applied on a map and in a space–time cube. Figure 5.31 shows
the variation in the third quartile of the distances of the group members to the cen-
tre along the route. We can see where the people walked compactly (this is where
the line is thin) and where the group was more dispersed (the line is thick). In
a similar way, Fig. 5.32 shows the variation in the third quartile of the direction
deviations from the overall group movement vector. The thick line segments cor-
respond to high deviations.
It can be observed that the locations of many of the thick segments coincide
with the locations of the movement behaviour types represented by clusters 1 and
2 in Sect. 5.1.4. These behaviours are characterized by low speed and high sinuos-
ity (which is consistent with the high direction deviations from the group move-
ment vector); we have interpreted them as stops. The small inset on the top left
of Fig. 5.32 shows the positions of the stops identified by setting a segment fil-
ter (Sect. 4.2.3) for the original trajectories according to the length of the bound-
ing rectangle diagonal within 30 s: only the segments with values below 5 m are
shown by the glyphs in blue, which are drawn with 25 % opacity. By comparing
the two maps in Fig. 5.32, we notice that the segments with high direction devia-
tions occur not only at the positions of the stops but also on significant turns of
the route. Furthermore, we notice that the movements of the group members were
more coherent (i.e. the deviations were lower) in the first half of the trip than in the
second half (the group moved clockwise starting from the place in the eastern part
of the territory where the direction deviations are the highest).
We can also investigate the relations of the individual movers to the movement
of the whole group by visualizing the positional attributes that have been attached
to the individuals’ original trajectories. The most convenient tools for this are the
temporal bar chart and trajectory wall, since the trajectories can be shown without
overlap. In Fig. 5.33, we have visualized the computed positional attribute
“distance from the group centre along the group movement vector”, which shows
which of the group members walked in the forefront and which in the rear, and how
far ahead of or behind the group centre a group member was at different times
and in different parts of the route. In both displays, shades of red are used for positive
relative distances corresponding to positions in front of the group centre and
shades of blue for negative relative distances, when the movers were behind the
group centre. The light yellow colour represents positions close to the centre. The
values of the attribute are not defined for all position records. The value is undefined
when the group movement vector is undefined (during a stop) or when the
movement direction of an individual deviates from the movement direction of the
group by more than the chosen threshold (we use 60°).

Fig. 5.31  A map (left) and a space–time cube (right) show the third quartiles of the distances of
the group members to the group centre along the route

Fig. 5.32  A map (left) and a space–time cube (right) show the third quartile of the deviations of
the movement directions of the group members from the group movement vector along the route.
An inset on the top left shows the positions of the stops (in blue)

Fig. 5.33  The values of the computed positional attribute “distance from the group centre along
the group movement vector” in the individual trajectories are visualized on a trajectory wall (top)
and a temporal bar chart (bottom)

We see that there were people who always or almost always tended to be in
front of the others or at least in the centre. Some other people tended more to the
rear positions. The widest range of the relative distances is in the middle of the
trip, around the position of the yellow vertical cursor on the temporal bar chart.
At these times, both the largest positive and the largest negative values occurred,
which means that the group stretched over quite a long distance. In the trajectory
wall display, we see that this happened in the eastern part of the route (the wall is
turned so that the north is on the right). The dynamic link between the displays
allows us to relate the times in the temporal bar chart to the spatial positions in
the trajectory wall. In Fig. 5.33, the mouse cursor points at a trajectory bar in the
temporal bar chart. The bar is marked by a white frame; the corresponding band
in the trajectory wall is marked with a black frame. The position in the wall cor-
responding to the mouse position in the bar chart is marked by a black dot, from
which a black projection line to the horizontal plane is drawn. Hence, we can see
where this mover (STEFAN) and two others were far behind the group centre (the
exact value of the relative distance is shown on the top of the temporal bar chart).
By moving the mouse up and down, we can see where the other group members
were at the same time.
Another possible view of the relative positions of individuals in a group can
be obtained by transforming the geographical space to an abstract two-dimensional
“group space”, where the group centre is taken as the origin of the coordinate
system, that is, as the point (0, 0), the group movement vector is taken as
the Y-axis, and the X-axis is perpendicular to the Y-axis. The positional attributes
“signed distance to group movement vector” and “distance to group centre along
the group movement vector” (computed in steps 4.9 and 4.10 of Algorithm
5.8) define the X- and Y-positions of the trajectory points in the group space. The
space transformation to “group space” was discussed earlier in Sect. 3.3.
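The transformation to the group space amounts to projecting the offset of a position from the group centre onto the movement direction and onto its perpendicular. The sketch below assumes planar coordinates; the function name `to_group_space` and the sign convention (positive X to the right of the movement direction) are assumptions made for illustration.

```python
from math import hypot


def to_group_space(point, centre, movement_vector):
    """Return (x, y) of a position in the group space, or None when the
    group movement vector is undefined (zero length, e.g. during a stop)."""
    vx, vy = movement_vector
    norm = hypot(vx, vy)
    if norm == 0:
        return None
    ux, uy = vx / norm, vy / norm  # unit vector of the group movement
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    along = dx * ux + dy * uy      # Y: distance along the movement vector
    across = dx * uy - dy * ux     # X: signed distance, positive to the right
    return across, along
```

For a group moving east, a member located 3 m ahead of the centre and 2 m to the south is mapped to the point (2, 3) of the group space, that is, in front and on the right.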
To summarize the positions of each individual in this new space, a regular grid
(e.g. with square cells) is constructed. For each cell, the positions of each individual
fitting within the cell are counted. To facilitate comparisons between the individuals,
the absolute counts can be transformed to relative (percentages) with respect to the
total number of trajectory points of the respective individuals. The values associated
with the grid cells can be visualized on maps representing the group space. A suit-
able technique is small multiple maps where each small map shows the distribution
of the positions of each individual. The display may also include one map showing
the distribution of the positions of all group members taken together.
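The grid-based summarization can be sketched in a few lines. The function name `grid_percentages` and the choice of square cells indexed by integer column and row are assumptions; the input is the list of group-space positions of one individual (or of the whole group).

```python
from collections import Counter
from math import floor


def grid_percentages(points, cell_size):
    """points: iterable of (x, y) in the group space.
    Returns {(col, row): percentage of the points falling in the cell}."""
    counts = Counter((floor(x / cell_size), floor(y / cell_size)) for x, y in points)
    total = sum(counts.values())
    # Relative counts make individuals with different numbers of points comparable.
    return {cell: 100.0 * n / total for cell, n in counts.items()}
```

Cells that are absent from the returned dictionary contain no positions of the respective individual.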
An example is shown in Fig. 5.34. The small multiple maps show the distribu-
tions of the relative positions of the 12 group members in the group space; the
last map shows the distribution for the whole group. The colour darkness in each
cell is proportional to the count of the positions in this cell as a percentage of the
total count of the positions of the respective individual (in the last map, the whole
group). The white colour corresponds to zero values, and the grey colour is used
for cells containing no positions of any group member. Each small map contains
the X- and Y-axes showing the position of the group centre and the direction of the
group movement vector (to the north). Individual differences between the group
members can be easily seen. Thus, BAVOSL and MJK tended to the front or cen-
tral positions in the group, BIRGIT and STEFAN tended to the central or rear
positions, and GENNADY was mostly on the right side of the group. The posi-
tions of some individuals are more dispersed than the positions of the others.

Fig. 5.34  Small multiple maps show the distribution of the positions of each group member
in the group space. The Y-axis corresponds to the group movement vector. The crossing of the
X-axis and Y-axis corresponds to the position of the group centre. The positions of the individuals
are summarized by cells of a regular grid. The colour darkness in each cell is proportional
to the count of the positions in this cell as a percentage of the total count of the positions of the
respective individual (in the last map, the whole group is shown)

We have previously observed on other displays that the relative positions of the
group members changed over time. During the trip, there were two common stops,
which are manifested as grey gaps in the temporal bar chart in Fig. 5.33. We
divide the time span of the data into three intervals: before the first stop, between
the first and the second stop, and after the second stop. Then, we summarize the
relative positions of the group members in the group space in each time interval.
Figure 5.35 shows the small multiple maps of the position distributions for each
of the three intervals. The lower right image shows the maps for the whole time
span (the same as in Fig. 5.34). We can observe that in the second interval (upper
right), the group members were more spatially dispersed than in the first interval
(upper left). BIRGIT, NICO, and STEFAN moved from the centre to the rear,
MJK from the centre to the front, and IRMA moved from the rear to the front. In
the third interval, the spatial dispersion decreased. MJK moved farther to the front,
NICO moved from the rear to the front, STEFAN from the rear to the centre, while
DAVID moved to the rear.
These examples demonstrate that the transformation of the positions of the
group members from the geographical space to the abstract group space and the
visualization of the position distribution of each individual in the group space sup-
port the investigation of the individual behaviours of the group members over time
and comparisons between the individuals.

Fig. 5.35  Comparison between the positions of the group members in the group space in three
different time intervals and for the whole time. The lower right image is the same as in Fig. 5.34

Fig. 5.36  A temporal bar chart visualizes the positional attribute “relative position along the
movement vector”. The shades of green correspond to the first three positions and the shades of
purple to the last three positions

We can also analyse the ordering of the group members along the group movement
vector. The temporal bar chart in Fig. 5.36 visualizes the computed positional
attribute “relative position along the movement vector” with values ranging
from 1 to 12 (where 12 is the number of the group members). The value 1 means
that the person was in front of all others, and the value 12 means that the person
was the last. The shades of green in the display correspond to the first three
positions (the darkest green represents the first position), and the shades of purple
represent the last three positions (the darkest purple means the very last position).
White corresponds to the six positions in the middle. We see that there was no
permanent leader or leading clique in the group during the whole trip. The three
bars at the top of the display reveal three group members who most frequently had
the leading positions. The first two (DAVID and BAVOSL) were leading during
about the first two-thirds of the trip time, and the third one (MJK) joined them in
the middle of the trip and then was leading in the second half of the trip. The fifth
trajectory bar from the top, labelled JUKKA, shows that the corresponding person
was the leader or one of the leaders in the first half of the trip and then shifted to
the centre and sometimes to the end. The leading position of JUKKA was, evidently,
taken by MJK. At the bottom, there are three bars of the persons who were
very frequently at the end of the group.
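The attribute “relative position along the movement vector” can be derived by ranking the group members in each time unit by their distance along the group movement vector. The sketch below is a minimal illustration; the function names are hypothetical, and ties are resolved arbitrarily by the order of the input.

```python
def relative_positions(along_by_member):
    """along_by_member: {member_id: distance along the movement vector}.
    Returns {member_id: rank}, where 1 is the foremost member."""
    ordered = sorted(along_by_member, key=along_by_member.get, reverse=True)
    return {member: rank for rank, member in enumerate(ordered, start=1)}


def leading_share(ranks_over_time, member):
    """Percentage of the member's records in which the member has rank 1."""
    own = [ranks[member] for ranks in ranks_over_time if member in ranks]
    return 100.0 * sum(r == 1 for r in own) / len(own) if own else 0.0
```

Time units in which a member has no defined value are simply skipped when computing the share, so the percentage refers to the position records of this member only.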
By computing various statistics of the positional attributes, we can numerically
characterize the tendency of the individual group members to be in the forefront or
in the rear of the group. Some of the computed statistics are shown in a table lens
display in Fig. 5.37. The rows are ordered according to the values of the aggregate
attribute “median of relative positions along the group movement vector”. We see
that the group members with the identifiers DAVID, BAVOSL, and MJK had the
lowest median and mean values of the relative position and also the lowest modes
of the order. MJK had the leading (first) position in the group in 28.61 % of all
position records of this person; the next most frequent leaders are DAVID (20.89 %),
followed by JUKKA (19.35 %).

Fig. 5.37  A table lens display shows the statistics of the distances and relative positions of the
group members along the group movement vector

Hence, the construction of the central trajectory of a group enables the compu-
tation of a number of positional attributes expressing spatial and spatio-temporal
relations among the group members: distances from the group centre and move-
ment vector, spatial arrangement along and across the route line, and spatio-tem-
poral order. Earlier we have mentioned positional attributes expressing relations of
spatio-temporal neighbourhood (distance to the nearest neighbour and number of
neighbours within a spatio-temporal window; see Sect. 5.1.4), which are also very
relevant to studying the behaviours of movers in a group. Using a tool for filter-
ing of trajectory segments (Sect. 4.2.3), we can investigate two or more positional
attributes expressing movers’ relations in combination.
Thus, after we have identified the group leaders, an interesting question is
whether the leader usually moves in the group forefront alone or together with
someone else. We compute the positional attribute “count of neighbours within
2 m and 5 s” and create two instances of the temporal bar chart display, one show-
ing the relative positions along the group movement vector and the other showing
the neighbour counts. Then, we use the interactive tools of the first bar chart for
setting a trajectory segment filter so that only the trajectory segments correspond-
ing to the first position in the group are visible (Fig. 5.38 top). The filter affects the
other bar chart (Fig. 5.38 bottom) so that we see only the neighbour counts for the
leaders. The prevalence of bar segments coloured in yellow and red means that the
leaders usually had at least one neighbour (represented by yellow) and often two
or more (represented by shades of red). The cases of walking without neighbours
(blue) are quite rare. More precisely, 12.5 % of all points in the dataset correspond
to having the leading position in the group and only 2.3 % of all points to being
the leader without neighbours (the statistics concerning the points satisfying the
filter conditions are shown below the bar charts; to obtain the second figure, we
have additionally set a segment filter in the second bar chart to select only the val-
ues below 1, i.e., equal to zero). Hence, we can conclude that being in front of the
group correlates with having company.
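The combination of the two attributes can be reproduced numerically with a short sketch. It assumes that the neighbour count is the number of distinct other movers having at least one position within the given spatial and temporal distances; this definition, the function names, and the record layout `(member_id, t, x, y)` are assumptions made for illustration. The quadratic search is kept for clarity and would need a spatial or temporal index for large datasets.

```python
from math import hypot


def neighbour_counts(records, d_s=2.0, d_t=5.0):
    """records: list of (member_id, t, x, y).
    Returns, for each record, the number of distinct other movers
    with a position within d_s (metres) and d_t (seconds)."""
    counts = []
    for m, t, x, y in records:
        near = {m2 for m2, t2, x2, y2 in records
                if m2 != m and abs(t2 - t) <= d_t and hypot(x2 - x, y2 - y) <= d_s}
        counts.append(len(near))
    return counts


def share(mask):
    """Percentage of records satisfying a (combined) filter."""
    return 100.0 * sum(mask) / len(mask)
```

Given a list `ranks` of relative positions aligned with `records`, the combined filter “leader and alone” corresponds to `share([r == 1 and c == 0 for r, c in zip(ranks, counts)])`.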
When we set the segment filter in the first bar chart so as to see only the last
three positions (Fig. 5.39 top) and look at the second chart (Fig. 5.39 bottom), we
observe the opposite relationship: being at the end of the group correlates with
being alone. This is manifested by the prevalence of dark blue bar segments in the
second bar chart. More precisely, 13.8 % of all points correspond to the last three
positions in the group and 11 % of all points to being in one of the last three posi-
tions and having no neighbours.

Fig. 5.38  Two temporal bar charts and a tool for trajectory segment filtering are used to investigate
the neighbourhood relations of the group leaders. The upper chart shows the relative positions
along the group movement vector. The values above 1 are filtered out, that is, only the
segments when a person was in front of the group are shown. The lower chart shows the counts
of neighbours within 2 m and 5 s. The segment filter set in the upper chart affects also the lower
chart, that is, only the values corresponding to the leaders are visible. We see that the leader usually
had one or several neighbours, as signified by the yellow and red colours

Since the cases when a group leader walked alone are so rare, we are curious
where they happened. We create a combined segment filter by setting two segment
filters in the two bar charts. In the first one, we select the value 1 (i.e. first
position in the group) and in the second one, the value 0 (i.e. no near neighbours).
The points satisfying the combined filter are shown on a map in Fig. 5.40 (left); an
enlarged map fragment is shown on the right. The points are represented by glyphs
in blue, showing the spatial positions of the points and the movement directions
before and after them. We see that the cases when a group leader had no near neighbours
happened mostly in the northern part of the territory, that is, in the second
half of the walk. A closer look reveals that the lone leaders were often off to the side of the
central trajectory of the group (red line). We check this observation by computing
the statistics of the values of the positional attribute “distance to the group movement
vector” (it is computed in step 4.9 of Algorithm 5.8) for two combined filters
“leader and alone” and “leader and not alone” as well as for all trajectory points
without filtering. As can be seen from the table in Fig. 5.41, the median distances to
the group movement vector are higher when people are the lone leaders than when
they are the leaders with company and also higher than the general median distances
(with only one exception).

Fig. 5.39  The upper chart shows the relative positions along the group movement vector. Only
the segments with the values 10 or more are selected by a segment filter, that is, when a person
was in one of the last three positions in the group. The filtering affects also the lower chart
showing the counts of neighbours within 2 m and 5 s: only the values corresponding to the three
last positions in the group are visible. We see that the people at the rear of the group often walked
alone, as signified by the prevalence of the blue colour

Fig. 5.40  A two-attribute segment filter has selected the trajectory points corresponding to the
leading positions in the group with the absence of near neighbours. On the right, a fragment of
the map shown on the left is enlarged

Fig. 5.41  The table lens display shows the statistics of the relative positions of the group members,
of the cases when they were the group leaders without and with near neighbours, and of the
distances to the group movement vector

This section shows that many kinds of relations between movers can be
expressed through positional attributes, which are computed based on distances in
space and time. A similar approach to analysing group movement is applied by
von Landesberger et al. (2013), who also compute a number of attributes characterizing
the position and movement of each group member in relation to the group:
distances to the group mid-point and boundary, differences of individual speed and
direction to the speed and direction of the group, and whether an object is in the
group core or is an outlier. For the latter characteristic, von Landesberger et al.
use the definition of outlierness given by Wilkinson et al. (2005). The computed
attributes are then visualized on time graphs, which can be used for attribute-based
filtering. The trajectory segments satisfying the filter are shown on a map display.
To compare movements in two or more groups, the lines on the time graphs can be
coloured according to group membership.

5.2.3 Relations of Movers to the Environment

Besides relations among movers, such as encounters and arrangement in a group,
relations of movers to other elements of the spatio-temporal environment (context)
are of interest. A few approaches can be found in the literature. Bouvier and Oates
(2008) suggest an original interaction technique called “staining”, which may be
used for exploring emerging relationships between moving objects and elements
of the spatial context. The technique is used with an animated map showing object
movements. The user can mark some context item such as area or object on the
map by painting (staining) it in a particular colour. Several context items may be
stained with the same or different colours. As moving objects move through a stain,
they also become stained, that is, painted in the colour of the stain. This allows
the user to observe easily which objects encountered the marked item, when it
happened, and how these objects behaved after that. Furthermore, it is possible to
run the animation backwards in order to see how these objects moved before the
encounter.
Crnovrsanin et al. (2009) compute spatial distances of multiple moving objects
to a selected item of spatial context, such as place (point or area) of interest, static
object, or moving object, and visualize the resulting dynamic attributes of the
moving objects on a time graph. Patterns formed by the lines on the graph not only
show the movements in relation to the selected context item and allow the user
to observe common behaviours and detect outliers but also indicate various emer-
gent relationships (referred to as “movement patterns”) among the moving objects:
spatial concentration (congestion), convergence, divergence, meeting, coincidence,
concurrence, etc. Interactive tools allow the user to select the objects participat-
ing in these patterns and observe their traces on a map. Two or more time graphs
enable comparison between movements from different places.
Our approach to detection of various relations between movers and the spatio-temporal
context is also based on computing spatial and/or temporal distances
between the positions of the movers and elements of the spatio-temporal context. We
shall demonstrate the approach using the example dataset with tracks of 72 roe
deer and 3 lynxes, which was introduced in Sect. 2.10.8. We shall focus on rela-
tions of the roe deer to locations in space, interactions between the roe deer and
the lynxes, and the impact of certain spatial events on the movement behaviours of
the roe deer. We shall again apply computation of positional attributes. However,
in this example, the attributes are derived by combining data from two datasets,
that is, from movement data and context data (see Sect. 2.9). The movement data
in this case are the trajectories of the roe deer. We shall use three different types of
context data: static spatial objects, trajectories of another kind of movers (lynxes),
and spatial events.
An overview map of the trajectories of the animals (Fig. 5.42) tells us about
their relations to locations in the geographical space. First, we observe that the roe
deer (their trajectories are in violet) seem to have preferred locations, that is, they
stay within quite small areas. We also observe that the roe deer sometimes appear
in open spaces (i.e. not covered by forest); these areas are represented on the map
by orange-filled polygons. The lynxes move over wide areas. Their appearances in
the open spaces, if any, are not detectable by merely visual inspection.
To analyse how often and when the roe deer appear in the open spaces, we
compute a positional attribute “distance to the nearest open space”. The attribute is
derived using two datasets: the trajectories of the roe deer and the dataset with the
boundaries of the open spaces, which are static spatial objects. The computed distances
range from −1 to 15.65 km; the value −1 means that the position lies inside
an open space. By means of the segment filter, we extract the events of the distance to the
nearest open space being below 0 m, that is, the occurrences of the spatial relation
“inside” between the roe deer and the open area. 10.5 % of all trajectory points
satisfy the filter. From these points, 9,329 spatial events are constructed.
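The derivation of this attribute and the subsequent filtering can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes planar coordinates in metres and open spaces given as simple polygons (lists of vertices), and all function names are hypothetical. The −1 flag for positions inside a polygon follows the convention described above. Grouping consecutive "inside" points of one track into a single event is also an assumption, since the text does not specify how events are constructed from the selected points.

```python
import math

def point_in_polygon(p, poly):
    """Ray-casting test: True if point p lies inside the polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through p.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_nearest_area(p, polygons):
    """Return -1 if p is inside any polygon, else the distance to the nearest boundary."""
    best = math.inf
    for poly in polygons:
        if point_in_polygon(p, poly):
            return -1.0
        for i in range(len(poly)):
            best = min(best, dist_to_segment(p, poly[i], poly[(i + 1) % len(poly)]))
    return best

def extract_inside_events(track, polygons):
    """track: list of (t, x, y) sorted by t.
    Returns (t_start, t_end) for each maximal run of consecutive inside points."""
    events, run = [], []
    for t, x, y in track:
        if distance_to_nearest_area((x, y), polygons) < 0:  # the segment filter
            run.append(t)
        elif run:
            events.append((run[0], run[-1]))
            run = []
    if run:
        events.append((run[0], run[-1]))
    return events
```

The same `distance_to_nearest_area` values can also serve a filter of the form "within a given distance from an object" by replacing the condition `< 0` with a positive threshold.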
The extracted events are represented on the map in Fig. 5.43 by pink circles
drawn with 3 % opacity. A two-dimensional histogram in the centre shows the
temporal distribution of the events by the months of a year (horizontal dimension)
and hours of a day (vertical dimension). It is evident that the events of the roe
deer appearing in open spaces are more frequent in the spring, summer, and early
autumn months than in the winter months. Evidently, the roe deer come more frequently
to the open areas for grazing when grass is available. The frequencies are
higher in the night hours than in the day hours, which may mean that the roe deer
prefer to graze in the dark. There are three particular hours (0, 4, and 20) when
the frequencies are notably higher than in the neighbouring hours. This may be
an artefact resulting from more frequent position measurements at certain hours
of a day. To check this, we build another two-dimensional histogram showing the
temporal distribution of all position records in the dataset. We see that, generally,
the positions were more frequently measured in hours 0, 4, 8, 12, 16, and
20. However, the frequencies of the open space events (Fig. 5.43 centre) do not
increase in the hours 8, 12, and 16, which means that, indeed, the roe deer prefer
to appear in open areas in dark hours. Interestingly, the frequencies of the position
measurements are notably higher in February (month 2) than in the other months.
There is a corresponding slight increase in frequencies of the open space events,
but it is quite low compared to the increase in the summer months.

Fig. 5.42  A fragment of an overview map with the trajectories of the roe deer (violet) and
lynxes (red), both shown with 20 % opacity. The orange-filled areas represent open spaces

Fig. 5.43  Left The extracted events of the roe deer coming to open spaces are represented on
a map by pink circles drawn with 3 % opacity. Centre A two-dimensional histogram shows the
temporal distribution of the events over the months of a year (horizontal dimension) and hours of
a day (vertical dimension). Right A two-dimensional histogram shows the temporal distribution
of all position records in the dataset

To investigate the spatio-temporal distribution of the open space events, we
aggregate them by the areas representing the open spaces and positions in the
annual and diurnal temporal cycles. The diagrams on the map in Fig. 5.44 show
the distribution of the events over the areas and the hours of a day. For each hour, a
diagram contains a segment with the length proportional to the count of the events
in the respective place and hourly interval. The segments are arranged clockwise
starting from the northward direction (i.e. the position of 0 o’clock on a clock
face) and coloured in three alternating colours (dark blue, light blue, and cyan) to
make the positions easier to distinguish. We see that the upper parts of the diagrams,
which correspond to evening and night hours, are generally bigger than the
lower parts corresponding to the day hours. We also see that some open areas are
visited more frequently than others.
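The aggregation by area and hour of day, together with the earlier check against uneven measurement frequency, can be sketched in a few lines. This is only an illustration under assumptions: events and position records are taken to carry Python `datetime` timestamps, and the function names are hypothetical. Dividing the hourly event counts by the hourly counts of all position records is one simple way to see whether a peak reflects behaviour or merely more frequent measurements; the text performs this comparison visually with two histograms.

```python
from collections import Counter
from datetime import datetime

def counts_by_area_and_hour(events):
    """events: iterable of (area_id, datetime).
    Returns a Counter keyed by (area_id, hour of day)."""
    return Counter((area, t.hour) for area, t in events)

def hourly_event_share(event_times, record_times):
    """For each hour of day that has position records, return the fraction
    of those records that are open-space events."""
    ev = Counter(t.hour for t in event_times)
    rec = Counter(t.hour for t in record_times)
    return {h: ev[h] / rec[h] for h in sorted(rec)}
```

If the share for hours 0, 4, and 20 stays high while the share for hours 8, 12, and 16 does not, the night-time preference is not explained by the sampling schedule alone.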
Hence, by computing distances and filtering position records of trajectories
by the distances, it is possible to detect occurrences of certain spatial relations
of movers to static spatial objects or locations with particular properties (such as
open spaces), specifically being inside the boundary of an object and being within
a given distance from an object. These occurrences can be extracted and further
analysed as spatial events.

Fig. 5.44  The extracted events of appearance in an open area have been aggregated by the areas.
The diagrams show the event counts by the hours of a day

In a similar way, interactions between two types of movers can be detected and
analysed. In particular, animal ecologists are interested in interactions between
predator and prey animals, such as lynxes and roe deer in our case. The algorithm
for detecting encounters presented in Sect. 5.2.1 is not applicable here for two rea-
sons. First, the algorithm does not account for different types of movers. It uses
a single chronologically sorted list of position records from all trajectories disre-
garding the identities of the corresponding objects. Second, the algorithm applies
interpolation, which would not be valid for the animal data. These data belong to
the category of episodic movement data, as defined in Sect. 2.9.2, due to the large
time gaps between the known positions. This means that it is impossible to detect
all interactions that might have occurred between the roe deer and the lynxes in
reality. However, we can hope that using only the known positions, we can find
indications of at least some of these interactions.
For this purpose, we compute the spatial distances between the points from
the trajectories of the roe deer and the points from the trajectories of the lynxes.
The computation takes into account a temporal tolerance threshold, that is, the
maximum distance in time between points from two trajectories for which it is meaningful
to compute the spatial distance. The temporal tolerance is necessary for
dealing with data where known positions of different movers refer to diverse time
moments and valid interpolation is impossible. We choose the temporal tolerance
of one hour, taking into account that the data are sparse in time.
After computing the distances, we create segment filters to select only the
points from the trajectories of each animal species where the distance to the near-
est position of an animal of the other species is below 500 m. We consider the
occurrences of such distances as indications of possible encounters between the
roe deer and lynxes. From the trajectory points satisfying the filter, we create spa-
tial events. As a result, we obtain 16 events of proximity to a lynx that occurred
in the trajectories of seven different roe deer and nine events of proximity to a roe
deer that occurred in the trajectory of only one lynx named Nora. Evidently, in
some of the cases, the lynx was close to two or more roe deer.
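The distance computation with a temporal tolerance and the subsequent filter can be sketched as follows. The sketch assumes that positions are given as `(t, x, y)` tuples with time in seconds and planar coordinates in metres; the function names and the use of binary search over sorted timestamps are illustrative choices, not a description of the authors' implementation. No interpolation is performed: only recorded positions are compared.

```python
import math
from bisect import bisect_left, bisect_right

def nearest_within_tolerance(points_a, points_b, tol_s):
    """For each (t, x, y) in points_a, return the distance to the nearest point
    of points_b recorded at most tol_s seconds earlier or later, or None."""
    b_sorted = sorted(points_b)
    b_times = [p[0] for p in b_sorted]
    result = []
    for t, x, y in points_a:
        lo = bisect_left(b_times, t - tol_s)
        hi = bisect_right(b_times, t + tol_s)
        dists = [math.hypot(x - bx, y - by) for _, bx, by in b_sorted[lo:hi]]
        result.append(min(dists) if dists else None)
    return result

def proximity_points(points_a, points_b, tol_s=3600, max_dist=500.0):
    """Select the points of points_a that have a point of points_b
    closer than max_dist within the temporal tolerance."""
    dists = nearest_within_tolerance(points_a, points_b, tol_s)
    return [p for p, d in zip(points_a, dists) if d is not None and d < max_dist]
```

Calling `proximity_points(deer, lynx)` and `proximity_points(lynx, deer)` gives the two sets of points from which the events of proximity to a lynx and of proximity to a roe deer would be built.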
To find out how many roe deer were close to the lynx at the time of each event,
we use an event characterization tool, which derives new thematic attributes of
events based on points from movers’ trajectories located within a given spatial dis-
tance threshold from an event and fitting within a given time window, which is
specified relative to the event’s time (e.g. 2 h before the event, ±1 h around the
event, etc.). Characterization of events based on movement data (trajectories) is
an instance of joint analysis of movement data and context data. In our example,
the movement data are the roe deer’s trajectories and the context data are the spa-
tial events of lynx’s proximity to roe deer. Although these particular spatial events
have been earlier extracted from movement data, they are now independent spatio-
temporal objects belonging to the spatio-temporal context of the positions of the
roe deer and lynxes.
The event characterization tool takes into account all current data filters, includ-
ing filters of trajectory segments: only the points satisfying all filters are checked
for fitting in the time window and used in computing the new attributes. The tool
counts the number of trajectory points and the number of different trajectories
in the time window, creates a list with the identifiers of these trajectories, finds
the start time of the first point and the end time of the last point, and computes
the statistics of a selected positional attribute of the trajectory points: minimum,
maximum, median, quartiles, mean, and standard deviation. All this information
is attached to the events as values of new thematic attributes, which express and
characterize relations between events and trajectories.
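What such a characterization computes can be illustrated with a short sketch. It is written under assumptions: events and trajectory points are plain dictionaries with hypothetical keys (`t`, `x`, `y`, `traj_id`), each point has a single timestamp (so the "start" and "end" times reduce to the earliest and latest timestamps), and filtering of the points is assumed to have been applied beforehand. The function name and signature are invented for the illustration.

```python
import math
import statistics

def characterize_event(event, points, max_dist, before_s, after_s, attr):
    """Summarize the trajectory points lying within max_dist of the event and
    within the time window [event time - before_s, event time + after_s]."""
    t_lo, t_hi = event["t"] - before_s, event["t"] + after_s
    near = [
        p for p in points
        if t_lo <= p["t"] <= t_hi
        and math.hypot(p["x"] - event["x"], p["y"] - event["y"]) <= max_dist
    ]
    ids = sorted({p["traj_id"] for p in near})
    times = [p["t"] for p in near]
    values = [p[attr] for p in near]  # the selected positional attribute
    summary = {
        "n_points": len(near),
        "n_trajectories": len(ids),
        "traj_ids": ids,
        "first_time": min(times, default=None),
        "last_time": max(times, default=None),
    }
    if values:
        summary["min"] = min(values)
        summary["max"] = max(values)
        summary["median"] = statistics.median(values)
        summary["mean"] = statistics.fmean(values)
        summary["stdev"] = statistics.pstdev(values)
    if len(values) >= 2:  # quantiles need at least two values
        summary["quartiles"] = statistics.quantiles(values, n=4)
    return summary
```

The returned dictionary plays the role of the new thematic attributes attached to the event.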
We now apply the event characterization tool to the events of the lynx’s prox-
imity to roe deer and the roe deer’s trajectories. We specify a spatial distance
threshold of 500 m and a time window from 1 h before the event till 1 h after the
event. Some of the attributes derived by the tool are shown in the table lens display
in Fig. 5.45. These are, from left to right, the number of different roe deer (i.e. tra-
jectories) in the event neighbourhood, the number of trajectory points in the neigh-
bourhood, the start time of the first of these points, the end time of the last point,
the minimal and maximal spatial distances to a roe deer from the event location,
the identifiers of the trajectories of the roe deer occurring in the event neighbour-
hood, and the identifiers of those trajectories that end within 24 h after the event.
We see that in three out of nine events, the lynx was close to two roe deer. The
identifiers of the roe deer’s trajectories are listed in the second column from the
right. In the rows where the count of points exceeds the count of different
trajectories, at least one trajectory had two or more points within the
given spatial and temporal distances from the event.
It would be interesting to see whether any of the extracted events of proxim-
ity between roe deer and lynxes reflects a successful hunt of a lynx, that is, when
the lynx caught and killed a roe deer. Unfortunately, the low temporal resolution
of the data does not allow us to see the traces of a lynx pursuing a roe deer and
then catching and dragging it. An indirect indication of a successful hunt is when
an event of proximity to a lynx occurred at the end of a roe deer’s trajectory. The
end time of a trajectory is often also the end of the animal’s life (but not always: a
trajectory also ends when the battery of the tracking device gets exhausted or the
device itself is damaged). In our example, the extracted events are quite few and
could be examined in detail one by one. However, we shall demonstrate a more
general approach, which is applicable to a larger number of events.

Fig. 5.45  Characteristics of the events of the lynx’s proximity to roe deer, computed from
the points of the roe deer’s trajectories lying within 500 m and 1 h of the events

We create a segment filter selecting the points of the roe deer’s trajectories such
that the temporal distance to the end times of the trajectories is less than 24 h. The
event characterization tool is applied to the points satisfying the filter; the spatial
distance threshold and the temporal window are the same as in the previous run.
The tool attaches the following information to each proximity event (as values of
new thematic attributes): the number of trajectories having points in their last 24 h
that fit within the given spatio-temporal window around the event, the number of
such points, the identifiers of these trajectories, the start time of the first point, the
end time of the last point, and the statistics of the temporal distances to the ends
of the trajectories. Of the nine events, only one gets non-zero and non-empty
values of the new attributes. In Fig. 5.45, the table row describing this event is
highlighted, that is, underlined by a thick black line. From the derived attributes,
we learn that there is one trajectory with one point, 15 h before its end, that fits
within the spatial threshold of 500 m and the temporal window ±1 h around the
event. The identifier of the trajectory is shown in the last column of the table. It is
the trajectory of the roe deer named Harald.
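The positional attribute "temporal distance to the end of the trajectory" and the filter built on it can be sketched as follows. This is a simplified illustration with hypothetical names, assuming that each record is a `(traj_id, t)` pair with time in seconds and that the end of a trajectory is the time of its last record.

```python
def time_to_trajectory_end(points):
    """points: list of (traj_id, t). Returns, for each point, the time remaining
    until the last record of the same trajectory."""
    end = {}
    for tid, t in points:
        end[tid] = max(t, end.get(tid, t))
    return [end[tid] - t for tid, t in points]

def last_hours_filter(points, hours=24):
    """Keep only the points recorded less than the given number of hours
    before the end of their trajectory."""
    limit = hours * 3600
    remaining = time_to_trajectory_end(points)
    return [p for p, rem in zip(points, remaining) if rem < limit]
```

Only the points passing this filter would then be handed to the event characterization step, so that an event receives non-empty attribute values only if some trajectory was near it during its final 24 h.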
Now, we can examine this event in detail using visual and interactive tech-
niques. We use a combination of interactive filters to select the event of interest
(filter by direct selection), the trajectories of the animals involved in the event (fil-
ter of related object sets), and a time window around the time of the event (tem-
poral filter). In Fig. 5.46, the selected data are shown on a map (left) and in a
space–time cube (right). The red line represents the trajectory of Nora, the line
highlighted in black represents the trajectory of Harald, and the violet line represents
the trajectory of Helene, another roe deer involved in the same proximity
event. Only the parts of the trajectories fitting within the selected time interval
[t_e − 36 h, t_e + 36 h], where t_e is the time of the event, are visible. The circles
mark the proximity points in the three trajectories. It is evident that Helene moved
on after meeting Nora, whereas Harald stayed in the encounter location for a while,
and then his trajectory ended. It can also be seen that the next recorded position of
Nora after the encounter event was close to the encounter place.

Fig. 5.46  The map (left) and space–time cube (right) show evidence of a successful hunt by
Nora, whose trajectory is in red. The trajectory of the roe deer Harald, who was killed by Nora, is
highlighted in black. The trajectory of another roe deer, Helene, who was nearby, is in violet

We were able to check the fate of Harald using a lookup table with general data
about the tracked animals, which was available to us. The table says that Harald
was killed, which means that remains of the animal and its collar with the tracking
device were found. Now, we can say with a high degree of certainty that Harald
was killed by Nora and also specify the place and approximate time of this event.
By moving the time window in the temporal filter, we investigate the movement
behaviour of Nora after the event. During the following four days, Nora moved
several times back and forth between the place of the event and another place
located about 2 km north. As we learned from a domain expert, this behaviour is
typical for lynxes. A lynx normally goes to its kill in the evening. After feeding,
the lynx leaves the kill for a secure place called the daytime resting area.
The lookup table about the tracked animals has several more records about roe
deer that were killed. However, we could not find events of proximity to any of the
three tracked lynxes at the ends of the trajectories of these roe deer. Hence, either
the roe deer were killed by other predators or the events are not detectable due to
the very low temporal resolution of the available data.
The animal ecologists who track the roe deer and lynxes would like to know
how encounters with predators, such as lynxes, influence the movement behaviour of
roe deer. One hypothesis is that roe deer will go to more open areas, where lynxes,
which tend to avoid open spaces, cannot attack them. The event characterization
tool helps us check this hypothesis. We use it to extract statistics of distances of
the roe deer to open areas before and after the encounter events. For this purpose,
we run the tool twice: first, with the time window 2 h before an event and, sec-
ond, with the time window 2 h after an event. The results are shown in a table in
Fig. 5.47. For three events out of nine, there are no statistics of the distances after
the events since there were no points in the relevant trajectories in the 2-h time
intervals after the events. For the remaining six events, we see that the distances to
the open areas decreased after the events. For five events out of these six, the mini-
mal distance to open areas after the events was −1, which means that the animals
were inside the open areas. Only for one event was the minimal distance also −1
before the event. We can say that these results support the hypothesis that roe deer
encountering a lynx tend to move to open spaces.
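The before/after comparison can be sketched as below. The sketch assumes that trajectory points are dictionaries with hypothetical keys `traj_id` and `t` (seconds) plus the positional attribute to compare, and that the record at the event time itself belongs to neither window; both are assumptions made for the illustration rather than documented properties of the tool.

```python
def compare_before_after(points, traj_id, event_time, attr, window_s=2 * 3600):
    """Compare the minimum of a positional attribute of one trajectory in the
    time windows before and after an event."""
    own = [p for p in points if p["traj_id"] == traj_id]
    before = [p[attr] for p in own
              if event_time - window_s <= p["t"] < event_time]
    after = [p[attr] for p in own
             if event_time < p["t"] <= event_time + window_s]
    result = {
        "min_before": min(before, default=None),
        "min_after": min(after, default=None),
    }
    # The comparison is undefined when either window contains no points.
    if before and after:
        result["moved_closer"] = result["min_after"] < result["min_before"]
    else:
        result["moved_closer"] = None
    return result
```

With the distance to the nearest open space as the attribute, a `min_after` of −1 corresponds to the animal being inside an open area after the encounter.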
In this section, we have demonstrated how relations between movers and ele-
ments of the spatio-temporal context of the movement can be discovered and
investigated using the following techniques:
• derivation of positional attributes representing spatial and temporal distances of
trajectory positions to context elements: static spatial objects, other movers, and
(spatial) events;
• filtering of trajectory segments according to these distances to find occurrences
of spatio-temporal proximity of the movers to the elements of the context;
• extraction of spatio-temporal events of proximity of the movers to the elements
of the context;
• characterization of (spatial) events, in particular, earlier extracted proximity
events, in terms of trajectory positions in their spatio-temporal neighbourhood
and various positional attributes;
• interactive visualization allowing viewing, exploration, and interpretation of the
derived information.

Fig. 5.47  The table shows that roe deer after encountering a lynx tend to move to open spaces

These quite generic tools support detection and investigation of binary relations
that can be expressed in terms of spatial and temporal distances. For more com-
plex relations, special methods may be required. For example, Laube et al. (2004)
suggest methods for detecting four types of relations among movers (the authors
use the term “relative motion patterns”): flock, leadership, convergence, and
encounter. Laube et al. (2005) discover relations of synchronous movement and
“trend setting” (when movements of some mover are repeated by other movers
after a time lag). Interested readers should examine Laube’s overview of existing
typologies and formalizations of diverse movement patterns and methods for pat-
tern discovery in the fields of geographical information science, data mining and
knowledge discovery, and computational geometry (Laube 2009). Gudmundsson
et al. (2012) survey the state of the art in computational movement analysis,
including extraction of various movement patterns, that is, relations between mov-
ing objects.

5.3 Recap

Trajectories, on the one hand, characterize moving objects and, on the other hand,
can themselves be considered as spatio-temporal objects. Trajectories as spatio-
temporal objects have properties characterizing them as units: positions of the
whole trajectories in space and time, shapes, path lengths, etc. Trajectories as com-
plex spatio-temporal objects also have complex properties composed of the prop-
erties of their components, including the spatial positions in different time units
and the values of positional thematic attributes, such as speed, direction, and oth-
ers. Analysis of trajectories includes investigation of both the overall characteris-
tics and the internal characteristics, that is, variation in the positions and positional
attributes over space and time.
Whenever possible, data exploration and analysis should begin with getting an
overview of the data. Flow maps can provide a good spatial overview of multi-
ple trajectories. Flow maps are based on discrete spatial aggregation of movement
data using a finite set of places. To generate a flow map that adequately conveys
the geography and topology of the movement, the places need to be appropri-
ately defined. We create a set of suitable places by means of territory tessellation
according to the spatial distribution of characteristic points of trajectories. Our
method extracts characteristic points from trajectories, groups the points by spatial
proximity, finds the centres of the groups, uses them as generating points (seeds)
for Voronoi tessellation, and then uses the resulting Voronoi cells as places for spa-
tial or spatio-temporal aggregation of the trajectories. The abstraction level of the
overview map can be regulated through the parameters of the territory division
algorithm. The method for trajectory summarization can be applied to clusters of
similar trajectories, to provide an overview of all clusters and enable comparison
between different clusters. Multiple clusters usually cannot be represented in a sin-
gle map in a meaningful way because trajectories usually intersect and overlap in
space and so do the clusters. We represent clusters of trajectories on multiple small
maps, each showing a single cluster in a highly generalized manner.
Clustering, that is, discovery and interpretation of groups of objects hav-
ing similar properties and/or behaviours, is a generic technique used in explora-
tion and analysis of various kinds of data. Generic clustering algorithms typically
assume that the objects subject to clustering are represented by vectors (points)
in a multidimensional space of features, that is, attributes. The Euclidean or
Minkowski distance between two vectors is taken as the measure of the dissimi-
larity between the objects. Trajectories can be characterized by various thematic
attributes, which can be computationally derived from the sequences of position
records. These attributes can be used as features in clustering. However, clustering
according to thematic attributes of trajectories as units may give results that do not
correspond well enough to the intuitive notion of similar trajectories. The dissimi-
larity between trajectories may need to be assessed by specific algorithms; such
algorithms are called distance functions.
We also argue that partition-based clustering methods are not well suited for
trajectories. A set of trajectories may include many trajectories that are dissimi-
lar to all others. Partition-based methods, like k-means, put each object in some
cluster. Being limited in the number of clusters, the algorithm may put an object
together with other objects that are not very similar to it. Hence, the variability
among objects within a cluster may be high. Density-based clustering methods
label dissimilar objects as “noise” and do not include them in any cluster. They
can be implemented in such a way that the process of finding clusters is separated
from the process of assessing the dissimilarity between objects, which is done by
an external distance function tailored to the specifics of a given data type and to
the analysis goals.
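The separation between cluster finding and dissimilarity assessment can be illustrated with a compact version of DBSCAN (Ester et al. 1996) that receives the distance function as a parameter. This is a generic textbook-style sketch chosen for brevity, not the implementation used by the authors, and the example distance function `end_point_distance` is a hypothetical stand-in for a trajectory-specific function.

```python
import math

def dbscan(objects, distance, eps, min_pts):
    """Density-based clustering with an external distance function.
    Returns one label per object: a cluster index >= 0, or -1 for noise."""
    n = len(objects)
    labels = [None] * n

    def neighbours(i):
        return [j for j in range(n) if distance(objects[i], objects[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may later become a border object
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # core object: keep expanding the cluster
                queue.extend(nb_j)
    return labels

def end_point_distance(a, b):
    """Hypothetical distance function: distance between the last positions
    of two trajectories given as lists of (x, y)."""
    return math.hypot(a[-1][0] - b[-1][0], a[-1][1] - b[-1][1])
```

Replacing `end_point_distance` with another function changes what "similar" means without touching the clustering procedure, and trajectories that are dissimilar to all others receive the label −1 instead of being forced into a cluster.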
Trajectories are complex spatio-temporal objects with heterogeneous proper-
ties. Creating a single distance function that would account for all properties is
very difficult and, moreover, not reasonable. On the one hand, not all properties
may be simultaneously relevant in practical analysis tasks. On the other hand,
clusters obtained by means of a universal function covering all properties would
be very difficult to interpret. A more reasonable approach is to use a set of rela-
tively simple and easily understandable distance functions dealing with different
properties of trajectories. The analysis can be done in a sequence of steps. In each
step, clustering with a single distance function is applied either to the whole set of
trajectories or to one or several clusters obtained in the preceding steps. Step by
step, the analyst progressively refines his/her understanding of the data. The whole
process is called progressive clustering. An advantage of progressive clustering
is that a simple distance function with a clear working principle is applied in each
step, which leads to easily interpretable outcomes. At the same time, successive application
of several different functions enables sophisticated analyses through gradual
refinement of earlier obtained results.
For the density-based clustering with specific distance functions, trajectories
need to be loaded in RAM, which severely limits the size of the dataset that can
be analysed. An approach that overcomes this limitation combines clustering with
classification. First, the analyst takes a manageable subset of objects (trajectories)
from a database and performs cluster analysis. Then, the analyst builds a classifier,
which can be used for attaching new objects to the clusters that have been dis-
covered. The analyst may also modify the clusters to enhance understanding and/
or conformance to the goals. The produced classifier is applied to the whole data-
set. Each object is either attached to one of the clusters or remains unclassified,
if it does not fit in any cluster. When necessary, the analyst may apply the procedure
(take a subset, cluster, build a classifier, classify) again to the unclassified
objects. Our experiments show that the number of clusters that can be discovered
and their sizes substantially decrease with each iteration step, which is a conse-
quence of the decreasing density of the remaining data.
Many of the specific distance functions designed for trajectories account for
internal characteristics of trajectories, including positional attributes; however, this
is hidden from the user. A movement analyst may also need to explore positional
attributes of trajectories in an explicit way using visual displays. Representing
positional attributes on a map may be ineffective due to overlapping of trajecto-
ries in space. We eliminate occlusions by applying a stacking layout in a three-
dimensional view called the “trajectory wall”, where two dimensions represent the
underlying territory and the third dimension is used for creating a stack of seg-
mented bands representing trajectories. Values of one positional attribute can be
represented by colouring of the segments. This space-based stacking approach
is effective for groups of trajectories having similar shapes and spatial positions.
Such groups can be selected by means of spatial filtering or through clustering of
trajectories by route similarity.
For a more sophisticated analysis of movement behaviour, it may be necessary
to analyse combinations of multiple positional attributes. Direct visualization of
values of multiple attributes in the spatial or temporal context is hardly possible.
The way to deal with multiple attributes is clustering of trajectory points or seg-
ments according to values of these attributes. The results of the clustering can then
be represented in a temporal bar chart, trajectory wall, and/or space–time cube,
to support the exploration of the spatio-temporal distribution of the clusters. The
clusters themselves can be interpreted with the help of multi-attribute displays,
such as the PCP and scatterplot matrix.
Relations among movers and between movers and other elements of the spatio-
temporal context are an essential aspect of the individual and collective movement
behaviours of objects. It is typically assumed that relations of movers to the con-
text (including other movers) are reflected in their trajectories, so that analysing
relations of the trajectories to other trajectories and to context data can reconstruct
the relations of the movers. Particularly interesting are interactions, that is, such
relations between two or more spatial objects when the objects are sufficiently
close in space, so that they can (potentially) exert mutual or reciprocal action or
influence upon one another.
We presented an efficient and scalable algorithm for detecting encoun-
ters between moving objects, that is, situations when two movers come close to
one another in space and time. Detected encounters can be classified as parallel
encounters, cross encounters, head-on encounters, parking encounters, etc., based
on the movers’ velocities, the angle between the movement vectors, and the dura-
tion of the encounter.
In analysing collective synchronous movement of a group of objects, it is nec-
essary to know the position and course of the group core or centre in each time
unit. This allows assessing the compactness or dispersion of the group and the
coherence of the movement and the deviations from the common route, deter-
mining the spatial arrangement of the group members, and identifying the group
leaders. We have presented an algorithm for constructing the central trajectory of
the group and computing the positional attributes characterizing the movement of
the group and expressing spatial and spatio-temporal relations among the group
members: distances from the group centre and movement vector, spatial arrange-
ment along and across the route line, and spatio-temporal order. Some attributes
are attached to the positions of the central trajectory and others to the positions of
the trajectories of the group members. Both sets of attributes can then be analysed
using interactive visual displays.
Relations of movers to other elements of the spatio-temporal context can also
be analysed by computing positional attributes of movers’ trajectories. In this case,
the attributes are derived by combining movement data (trajectories) and context
data. Relations between movers and elements of the spatio-temporal context of
the movement can be discovered and investigated by deriving positional attrib-
utes representing spatial and temporal distances of trajectory positions to context
elements, including static spatial objects, other movers, and (spatial) events. The
derived attributes are explored by means of filtering of trajectory segments and
extraction of movement events. Relations between events and trajectories can be
analysed by extracting and summarizing values of positional attributes of the tra-
jectories in the spatio-temporal neighbourhood of the events. The impact of events
on movement is investigated by comparing the values of positional attributes in
time windows before and after the events. In the following chapter, we shall focus
in more detail on analysis of spatial events related to movement.

References

Andrienko, N., & Andrienko, G. (2011). Spatial generalization and aggregation of massive move-
ment data. IEEE Transactions on Visualization and Computer Graphics, 17(2), 205–219.
Andrienko, G., Andrienko, N., & Wrobel, S. (2007). Visual analytics tools for analysis of move-
ment data. ACM SIGKDD Explorations, 9(2), 38–46.
Andrienko, G., Andrienko, N., Rinzivillo, S., Nanni, M., Pedreschi, D., Giannotti, F. (2009).
Interactive visual clustering of large collections of trajectories. In Proceedings of the IEEE
Symposium on Visual Analytics Science and Technology (VAST 2009) (pp. 3–10). IEEE
Computer Society Press.
Ankerst, M., Breunig, M., Kriegel, H.-P., Sander, J. (1999). OPTICS: Ordering points to identify
the clustering structure. In Proceedings of the ACM SIGMOD International Conference on
Management of Data (SIGMOD’99) (pp. 49–60), Philadelphia, USA.
Berndt, D., Clifford, J. (1994). Using dynamic time warping to find patterns in time series. In
Proceedings of the Knowledge Discovery and Delivery Workshop (pp. 359–370).
Bouvier, D. J., Oates, B. (2008). Evacuation traces mini challenge award: Innovative trace visual-
ization staining for information discovery. In Proceedings of the IEEE Symposium on Visual
Analytics Science and Technology (VAST) 2008 (pp. 219–220). IEEE Computer Society
Press.
Crnovrsanin, T., Muelder, C., Correa, C., Ma, K.-L. (2009). Proximity-based visualization of
movement trace data. In Proceedings of the IEEE Symposium on Visual Analytics Science
and Technology (VAST) 2009 (pp. 11–18). IEEE Computer Society Press.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X. (1996). A density-based algorithm for discovering
clusters in large spatial databases with noise. In Proceedings of the Second International
Conference on Knowledge Discovery and Data Mining (pp. 226–231), Portland, Oregon.
González, M. C., Hidalgo, C. A., & Barabási, A.-L. (2008). Understanding individual human
mobility patterns. Nature, 453, 779–782.
Gudmundsson, J., van Kreveld, M., & Speckmann, B. (2007). Efficient detection of patterns in
2D trajectories of moving points. Geoinformatica, 11(2), 195–215.
Gudmundsson, J., Laube, P., & Wolle, T. (2012). Computational movement analysis. In W.
Kresse & D. Danko (Eds.), Springer handbook of geographic information (pp. 725–741).
Berlin–Heidelberg: Springer.
Han, J., Lee, J.-G., & Kamber, M. (2009). An overview of clustering methods in geographic data
analysis. In H. Miller & J. Han (Eds.), Geographic data mining and knowledge discovery
(2nd ed., pp. 149–188). Boca Raton: CRC Press.
Kaufman, L., & Rousseeuw, P. (1990). Finding groups in data: An introduction to cluster analy-
sis. New Jersey: Wiley.
Kohonen, T. (2001). Self-organizing maps (3rd ed.). Berlin: Springer.
Laube, P. (2009). Progress in movement pattern analysis. In H. Aghajan & B. Gottfried (Eds.),
Behaviour monitoring and interpretation—Ambient assisted living (pp. 43–71). Amsterdam:
IOS Press.
Laube, P., van Kreveld, M., & Imfeld, S. (2004). Finding REMO—detecting relative motion pat-
terns in geospatial lifelines. In P. F. Fisher (Ed.), Developments in Spatial Data Handling,
Proceedings of the 11th International Symposium on spatial data handling (pp. 201–214).
Heidelberg: Springer.
Laube, P., Imfeld, S., Weibel, R. (2005, July). Discovering relative motion patterns in groups of
moving point objects. International Journal of Geographical Information Science, 19(6),
639–668.
Nanni, M., & Pedreschi, D. (2006). Time-focused density-based clustering of trajectories of
moving objects. Journal of Intelligent Information Systems, 27(3), 267–289.
Ng, R., Han, J. (1994). Efficient and effective clustering methods of spatial data mining. In
Proceedings of the 20th International Conference on Very Large Data Bases. Santiago, Chile.
Okabe, A., Boots, B., Sugihara, K., & Chiu, S. N. (2000). Spatial tessellations—Concepts and
applications of Voronoi Diagrams (2nd ed.). New York: Wiley.
Orellana, D., Renso, C. (2010). Developing an interactions ontology for characterizing pedestrian
movement behaviour. In M. Wachowicz (Ed.), Movement-aware applications for sustainable
mobility: Technologies and approaches (pp. 62–86). Hershey, PA, USA: Information Science
Reference.
Orellana, D., Wachowicz, M., Andrienko, N., Andrienko, G. (2009). Uncovering interaction patterns
in mobile outdoor gaming. In Proceedings of the International Conference on Advanced
Geographic Information Systems and Web Services GEOWS 2009, Feb 1–7 (pp. 177–182).
Cancun, Mexico. Piscataway, NJ: IEEE Computer Society.
Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., et al. (2013).
Semantic trajectories modelling and analysis. ACM Computing Surveys, 45(4).
Pelekis, N., Andrienko, G., Andrienko, N., Kopanakis, I., Marketos, G., & Theodoridis,
Y. (2012). Visually exploring movement data via similarity-based analysis. Journal of
Intelligent Information Systems, 38(2), 343–391.
Rinzivillo, S., Pedreschi, D., Nanni, M., Giannotti, F., Andrienko, N., & Andrienko, G.
(2008). Visually-driven analysis of movement data by progressive clustering. Information
Visualization, 7(3/4), 225–239.
Samet, H. (2006). Foundations of multidimensional and metric data structures. Elsevier,
Amsterdam.
Schreck, T., Bernard, J., von Landesberger, T., & Kohlhammer, J. (2009). Visual cluster analysis
of trajectory data with interactive Kohonen maps. Information Visualization, 8(1), 14–29.
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visuali-
zations. In M. Burnett, W. Citrin (Ed.), Proceedings of the 1996 IEEE Symposium on Visual
Languages (pp. 336–343). Piscataway: IEEE Computer Society Press.
Tominski, C., Schumann, H., Andrienko, G., Andrienko, N. (2012). Stacking-based visualization
of trajectory attribute data. IEEE Transactions on Visualization and Computer Graphics
(Proceedings IEEE Information Visualization 2012), 18(12), 2565–2574.
Vlachos, M., Kollios, G., Gunopulos, D. (2002). Discovering similar multidimensional trajecto-
ries. In Proceedings of the 18th International Conference on Data Engineering (ICDE’02)
(pp. 673–684). IEEE.
von Landesberger, T., Bremm, S., Schreck, T., Fellner, D. (2013). Feature-based automatic iden-
tification of interesting data segments in group movement data. Information Visualization.
doi:10.1177/1473871613487084.
Wilkinson, L., Anand, A., Grossman, R. (2005). Graph-theoretic scagnostics. In Proceedings of
IEEE Symposium on Information Visualization (pp. 157–164).
Chapter 6
Visual Analytics Focusing on Spatial Events

Fig. 6.1  This chapter addresses analysis tasks focusing on characteristics of spatial events and
their relations to the context. Characteristics of spatial events are represented by movement data
in the form of spatial event data (cf. Fig. 3.13)

Abstract In this chapter, we present visualization and analysis methods that
can support movement analysis tasks focusing on spatial events (Fig. 6.1). The
appropriate form of movement data is spatial event data. From spatial event data
describing elementary events, composite spatial events can be generated, in partic-
ular, spatio-temporal clusters of spatial events. We present an approach to finding
spatio-temporal clusters in very large sets of spatial events that do not fit in RAM.
We suggest two methods for visualization of spatially co-located spatial events
that do not involve computational aggregation. Growth ring maps represent clus-
ters of events by placing pixels representing individual events in a radial layout
around cluster centres. The pixels can be coloured according to the absolute tem-
poral positions of the events or their relative positions within temporal cycles.
Flower diagrams represent clusters of events by compositions of circle sectors
radiating from a common centre. The angular position of a sector represents the
position of the respective event in a temporal cycle and the length (radius) the
event duration. Overlapping of several sectors shows event density.
Spatial events may have textual characteristics. Composite events may be
characterized by text aggregates, that is, by frequent words and combinations

G. Andrienko et al., Visual Analytics of Movement, 209


DOI: 10.1007/978-3-642-37583-5_6, © Springer-Verlag Berlin Heidelberg 2013
occurring in the texts of the smaller events the composite events comprise. To
facilitate interactive exploration of text aggregates by means of spatial and tempo-
ral filtering, we represent each frequent word or combination by a text event hav-
ing the same spatial and temporal positions as the composite event.
We also discuss how spatial, temporal, and spatio-temporal relations among
spatial events and between spatial events and other objects (particularly, trajecto-
ries of movers) can be investigated using spatial, temporal, and spatio-temporal
displays, computation of spatial and temporal distances, and interactive filtering.

As defined in Sect. 2.4, spatial event data materialize the function E → S × T,
which maps objects (spatial events) to locations in space and intervals in time.
Spatial event data with thematic attributes are represented by the formula
E  →  S  ×  T  ×  A, where A stands for one or more attributes. Spatial events are
intrinsic in movement. As we argued in Chap. 2, movement can be viewed as a
composition of spatial events. Occurrences of various relations between movers
and elements of the spatio-temporal context are also spatial events. Furthermore,
the position records in trajectories (m, t, s, a) can also be viewed as defining spatial
events; in Sect. 2.5, we called them movement events. Hence, data about
positions of moving objects can be analysed both as trajectories and as spatial
events. In particular, episodic movement data are often viewed and analysed as
spatial events rather than trajectories.
Not all spatial events defined by movement data may be of interest for an ana-
lyst. At the same time, not all events that are of interest may be originally present
in movement data. An analyst should be able to extract or, when necessary, derive
relevant events from movement data alone or from a combination of movement
data and context data. There are two classes of movement events that are extracted
or derived in different ways:

• Individual movement events occurring within a single trajectory. These events
may be defined in terms of values of movement attributes (stop event, low-speed
event, turn event, etc.) or as instances of relations of a mover to the context
(visit of a place, coming close to an object, passing between two objects, etc.).
An individual movement event consists of either one point or several consecu-
tive points of an individual’s trajectory.
• Collective movement events involving two or more movers. These events are
defined in terms of relations between the movers: encounter event (Sect. 5.2.1),
spatial concentration (cluster) event, parallel movement event, opposite move-
ment event, etc. A collective movement event includes at least one point from
the trajectory of each mover.

A generic method to extract individual movement events from trajectories, as
described in Sect. 3.5, is to apply queries that set constraints on values of positional
attributes representing characteristics of the movement (speed, direction, etc.) or rela-
tions of the movers to elements of the context (spatial and/or temporal proximity, rel-
ative movement direction, etc.). In other words, the points or segments of trajectories
are filtered according to values of one or more positional attributes. Positional attrib-
utes representing relevant movement characteristics and/or relations can be previ-
ously derived from movement data or from a combination of movement data with
context data (Sect. 3.4). The previous chapter contains several examples of extracting
movement events using an interactive filtering tool introduced in Sect. 4.2.3.
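
The filtering idea described above can be sketched in a few lines. This is only an illustration under assumed names: `TrajectoryPoint`, `extract_events`, and the precomputed `speed` attribute are hypothetical and do not come from the tools discussed in this book. Each maximal run of consecutive points that satisfies the query becomes one movement event, so a run of one point yields an elementary event and a longer run a composite one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrajectoryPoint:
    mover: str    # identifier of the mover (hypothetical field name)
    t: float      # time stamp
    x: float      # spatial position
    y: float
    speed: float  # positional attribute, assumed to be derived beforehand


def extract_events(points: List[TrajectoryPoint],
                   predicate: Callable[[TrajectoryPoint], bool]
                   ) -> List[List[TrajectoryPoint]]:
    """Return maximal runs of consecutive points that satisfy the query."""
    events, run = [], []
    for p in points:
        if predicate(p):
            run.append(p)        # the point passes the filter: extend the run
        elif run:
            events.append(run)   # the run is interrupted: close the event
            run = []
    if run:
        events.append(run)       # close a run that reaches the trajectory end
    return events


# One trajectory with speeds in km/h; the query keeps points below 10 km/h.
track = [TrajectoryPoint("car1", float(t), float(t), 0.0, s)
         for t, s in enumerate([40, 8, 5, 30, 9, 45])]
low_speed = extract_events(track, lambda p: p.speed < 10)
```

Here `low_speed` holds two events: a composite one made of the points at t = 1 and t = 2, and an elementary one at t = 4.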
Collective movement events may need to be extracted from movement data by
specific methods, such as encounter detection (Sect. 5.2.1). To discover spatial
concentrations of moving objects, one can apply clustering methods, as described
in the next section.
In Sect. 2.5, we said that spatial events may be elementary or composite,
where a composite event consists of two or more elementary events. A collec-
tive movement event is always a composite event. An individual movement event
is elementary when it consists of one trajectory point and composite otherwise.
A spatio-temporal cluster of spatial events is a composite spatial event. Spatio-
temporal clustering methods can be viewed as tools for extracting clustering
events from elementary spatial events. The clustering methods also support the
investigation of spatial, temporal, and other characteristics of spatial events, since
they can group events by closeness of their positions in space and time and, pos-
sibly, closeness of values of their thematic attributes.

6.1 Extraction of Composite Spatial Events by Clustering

Spatio-temporal clusters, or, in other words, dense concentrations of events in
space and time, can be discovered by means of density-based clustering. We have
already discussed density-based clustering in Sect. 5.1.2, where it was used for
finding groups of similar trajectories. The degree of similarity of two given trajec-
tories was assessed by specific distance functions. The clustering algorithm finds
groups of trajectories based on the similarity measures provided by the distance
functions. The same approach can be used for finding clusters of spatial events. In
this case, a distance function specific for spatial events needs to be used.
Density-based clustering, which was used in the previous chapter, is not the
only method suitable for detecting clusters of events. Bak et al. (2012) use a graph
clustering algorithm for this purpose. In data mining, there is a class of algorithms
for finding highly connected subgraphs in a graph (Aggarwal and Wang 2010). To
apply one of these algorithms, Bak et al. create a graph representing neighbour-
hood relations between spatial events. The nodes of the graph correspond to the
events. Two nodes are connected by a link if the events are neighbours, that is,
the distance between them does not exceed a given threshold. Bak et al. define
neighbourhood in terms of the spatial distance between events, since the goal is to
detect spatio-temporal clusters. However, a neighbourhood can also be defined in
other ways using appropriate distance functions.
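
As a rough illustration of the neighbourhood graph, the sketch below links every pair of events whose distance does not exceed a threshold and then groups the nodes. Two simplifications are assumptions of this sketch and not the method of Bak et al.: the pairs are compared exhaustively, and plain connected components stand in for an algorithm that finds highly connected subgraphs. The function names are hypothetical.

```python
from itertools import combinations
from math import hypot


def neighbourhood_graph(events, dist, threshold):
    """Adjacency sets: i and j are linked if dist(events[i], events[j]) <= threshold."""
    graph = {i: set() for i in range(len(events))}
    for i, j in combinations(range(len(events)), 2):
        if dist(events[i], events[j]) <= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_components(graph):
    """Group nodes reachable from each other (a stand-in for graph clustering)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(component))
    return components


events = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (50, 50)]


def spatial(a, b):
    """Euclidean distance between two (x, y) positions."""
    return hypot(a[0] - b[0], a[1] - b[1])


graph = neighbourhood_graph(events, spatial, threshold=1.5)
groups = connected_components(graph)
```

With these six points, `groups` contains one component of three events, one of two, and an isolated event.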
Hence, irrespective of the type of clustering algorithm used (density-based
or graph-based), it is necessary to define an appropriate distance function that
will tell whether two events are neighbours or not. First of all, the function must
take into account the spatial positions of the events: two events are neighbours if
they are close in space. For finding spatio-temporal clusters, it is also necessary
to take into account the temporal positions of the events: two events are neigh-
bours if they are close both in space and in time. It may be necessary to account
also for other attributes, in particular, movement direction, as Andrienko et al.
(2011b) do for finding traffic congestions in Milan: opposite movement directions
on the same street need to be distinguished as there may be congestion in one
direction and free movement in the other direction. Generally, a distance function
for clustering of spatial events must be able to take into account the spatial posi-
tions, temporal positions, and thematic attributes of events. This may be done as
described below.

6.1.1 A Distance Function for Spatial Events

The user selects the attributes of the events to be used for the clustering besides
their spatial positions. The latter are always used since the goal is to find spatial
clusters. Let s, a0, a1,…, aN be the attributes to be used in the clustering, where s
is the spatial position and a0, a1,…, aN are the user-selected attributes, which may
include the temporal position. Distances between events in terms of these attrib-
utes, denoted ds, d0, d1,…, dN, are computed as follows. The spatial distance ds is
the great-circle distance on the Earth for geographical coordinates and Euclidean
distance otherwise. The temporal distance is computed according to formula (6.1),
where t1 and t2 are the intervals of the existence of two events and $t_k^{\mathrm{start}}$ and $t_k^{\mathrm{end}}$
denote the start and end of an interval tk. The distance between values v1 and v2
of a thematic attribute with a nominal value scale is 0 if v1 = v2 and ∞ otherwise.
The distance between values v1 and v2 of a numeric thematic attribute is |v1 − v2|.

$$
d_t(t_1, t_2) =
\begin{cases}
t_2^{\mathrm{start}} - t_1^{\mathrm{end}} & \text{if } t_1^{\mathrm{end}} < t_2^{\mathrm{start}} \\
t_1^{\mathrm{start}} - t_2^{\mathrm{end}} & \text{if } t_1^{\mathrm{start}} > t_2^{\mathrm{end}} \\
0 & \text{otherwise}
\end{cases}
\tag{6.1}
$$
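
Formula (6.1) translates directly into code. In the sketch below, an interval is assumed to be a `(start, end)` pair of numbers in the same time unit; the function name is hypothetical.

```python
def temporal_distance(t1, t2):
    """Distance between two time intervals given as (start, end) pairs, as in (6.1)."""
    start1, end1 = t1
    start2, end2 = t2
    if end1 < start2:        # t1 ends before t2 starts
        return start2 - end1
    if start1 > end2:        # t1 starts after t2 ends
        return start1 - end2
    return 0                 # the intervals overlap or touch


gap = temporal_distance((0, 5), (8, 10))
```

The gap between the intervals (0, 5) and (8, 10) is 3 in either argument order, and overlapping intervals are at distance 0.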

For cyclic attributes, such as time of day, day of the week, month of the year,
or movement direction, distances cannot be determined by simply subtracting
one value from another. For example, the values of movement direction range from 0 to
360 degrees, but the value 360 means the same direction as 0. Hence, the distance
between 0 and 359 is 1 rather than 359. For dealing with any cyclic attribute, it is
necessary to know its cycle length, denoted V: $V_{\mathrm{direction}} = 360^\circ$, $V_{\mathrm{time\ of\ day}} = 24$ h,
$V_{\mathrm{day\ of\ week}} = 7$ days, and so on. The distance between two values is assessed as

$$
d(v_1, v_2, V) =
\begin{cases}
|v_1 - v_2| & \text{if } |v_1 - v_2| < V/2 \\
V - |v_1 - v_2| & \text{otherwise}
\end{cases}
\tag{6.2}
$$
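
A minimal sketch of formula (6.2) follows. It assumes that both values already lie within one cycle, that is, between 0 and V; the function name is hypothetical.

```python
def cyclic_distance(v1, v2, cycle_length):
    """Distance between two values of a cyclic attribute, as in (6.2)."""
    diff = abs(v1 - v2)
    # Take the shorter way around the cycle.
    return diff if diff < cycle_length / 2 else cycle_length - diff


direction_gap = cyclic_distance(0, 359, 360)   # directions in degrees
hour_gap = cyclic_distance(23, 1, 24)          # times of day in hours
```

The directions 0° and 359° are 1° apart, and 23:00 and 01:00 are 2 h apart.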
The distance function requires that the user specifies a vector of distance
thresholds <Ds, D0, D1,…, DN>, which defines the neighbourhood of an event in
the multi-dimensional space formed by the attributes s, a0, a1,…, aN. For exam-
ple, for spatial position, temporal position, and movement direction, the user may
choose the thresholds <100 m, 10 min, 20°>. This means that two events can be
treated as neighbours only if they lie no more than 100 m apart in space, no more
than 10 min apart in time, and their directions differ by no more than 20°. For non-
numeric thematic attributes, the distance threshold does not need to be explicitly
specified. Since the distance between two nominal values is either 0 or ∞, any
positive threshold will separate the two cases; hence, the exact threshold value is
not important. For the sake of definiteness and to avoid special treatment of non-
numeric attributes in the definition of the distance function, we assume the dis-
tance threshold for non-numeric attributes to be 1.
Besides the specific distance threshold for each spatial or numeric dimension,
the user specifies the way to transform the vector of distances <ds, d0, d1,…, dN>
into a single distance, which is required by the clustering algorithm. The distances
can be aggregated either by taking the maximum (a) or according to the formula
of Euclidean distance (b). Prior to the aggregation, the distances are divided by the
respective thresholds D0, D1,…, DN and multiplied by Ds, to become compatible
with ds.
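Hence, the distance function computes the distance between two events according
to the following formula:

$$
d(e_1, e_2) =
\begin{cases}
\infty & \text{if } d_s > D_s \text{ or } \exists i \in \{0, \ldots, N\}: d_i > D_i \\
D_s \cdot \max\left(\dfrac{d_s}{D_s}, \dfrac{d_0}{D_0}, \ldots, \dfrac{d_N}{D_N}\right) & \text{otherwise, with option (a)} \\
D_s \cdot \sqrt{\left(\dfrac{d_s}{D_s}\right)^2 + \sum_{i=0}^{N} \left(\dfrac{d_i}{D_i}\right)^2} & \text{otherwise, with option (b)}
\end{cases}
\tag{6.3}
$$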

Option (a) defines the neighbourhood of an event as a cube in the multi-dimensional
space s, a0, a1,…, aN and option (b) as a sphere. In the latter case, two
events will not be treated as neighbours when the distances d0, d1,…, dN do not
reach the respective thresholds but are very close to them. Since this may be counter-intuitive,
option (a) is preferable.
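
The difference between the two options can be checked numerically. The sketch below takes the per-attribute distances and thresholds as lists ordered as <ds, d0, …, dN> and <Ds, D0, …, DN>; the function name and the `mode` parameter are hypothetical. It also assumes that the clustering algorithm uses Ds as its neighbourhood radius, which the scaling by Ds suggests but the text does not state explicitly.

```python
from math import inf, sqrt


def event_distance(dists, thresholds, mode="max"):
    """Combine per-attribute distances into one value, following (6.3).

    dists      = [ds, d0, ..., dN]
    thresholds = [Ds, D0, ..., DN]
    mode       = "max" for option (a), "euclidean" for option (b)
    """
    if any(d > D for d, D in zip(dists, thresholds)):
        return inf                       # some threshold is exceeded
    ratios = [d / D for d, D in zip(dists, thresholds)]
    spatial_threshold = thresholds[0]    # Ds rescales the result to spatial units
    if mode == "max":
        return spatial_threshold * max(ratios)
    if mode == "euclidean":
        return spatial_threshold * sqrt(sum(r * r for r in ratios))
    raise ValueError("mode must be 'max' or 'euclidean'")


# Thresholds <100 m, 10 min, 20 degrees>; each distance is at 90 % of its threshold.
thresholds = [100, 10, 20]
dists = [90, 9, 18]
cube = event_distance(dists, thresholds, "max")
sphere = event_distance(dists, thresholds, "euclidean")
```

With option (a) the combined distance is about 90 m, so the events count as neighbours within a 100 m radius. With option (b) it is about 156 m, so the same pair is rejected although no individual threshold is exceeded.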
Distance function (6.3) can be directly used for clustering by means of a density-based
algorithm such as DBSCAN (Ester et al. 1996) or OPTICS (Ankerst
et al. 1999). It can also be used for generation of a neighbourhood graph as an
input to a graph-based clustering algorithm (Aggarwal and Wang 2010).

6.1.2 Selection of Thresholds

Thresholds are selected based on the analyst’s background knowledge of the
physics of the movement, properties of the space where movement takes place,
characteristics of the data, and the goals of the analysis. For example, the spatial and temporal
distance thresholds should be much lower for cars slowly moving in a traffic jam
than for landing aircraft. Suitable threshold values can be estimated using interactive
visualizations of the extracted events. If the whole set of events does not
fit in RAM or cannot be efficiently handled by the visual and interactive tools, a
sample or subset of the events is used. Note that spatial concentrations of events
can be detected visually on a map where symbols representing events are drawn
in a semi-transparent mode (e.g. as in Fig. 4.14a). The user can interactively vary
the degree of transparency until the concentrations become visible. Then the user
can zoom into several selected concentrations and measure the spatial distances
from a few selected events to their third or fourth nearest neighbours. The maximum
of these distances will give a suitable approximate value for the spatial distance
threshold. In a similar way, the user can select the temporal threshold using a dis-
play of the events where one dimension represents time, for example, dot plot,
scatter plot, or space–time cube. The same approach can also be used to select
thresholds for other attributes; however, the semantics of the attributes often sug-
gests suitable values. For instance, when cars are moving close to each other on
the same side of a city street, the directions can hardly differ by more than 20°
since sharp curves are not usual for city streets.
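
The measurement step for the spatial threshold can also be approximated computationally. The sketch below is an assumption-laden illustration, not a replacement for the visual inspection described above: it takes a few events picked by the user from visible concentrations, computes the distance from each to its k-th nearest neighbour, and returns the maximum. It assumes planar coordinates in metres (Euclidean distance); the function names are hypothetical.

```python
from math import hypot


def kth_neighbour_distance(event, events, k):
    """Distance from `event` to its k-th nearest neighbour among `events`."""
    dists = sorted(hypot(event[0] - other[0], event[1] - other[1])
                   for other in events if other is not event)
    return dists[k - 1]


def suggest_spatial_threshold(selected, events, k=3):
    """Maximum k-th-neighbour distance over the user-selected events."""
    return max(kth_neighbour_distance(e, events, k) for e in selected)


events = [(0, 0), (10, 0), (20, 0), (30, 0), (0, 10)]
selected = [events[0], events[1]]   # events picked inside a concentration
suggested = suggest_spatial_threshold(selected, events, k=3)
```

For the two selected events, the third nearest neighbours lie 20 m and about 14.1 m away, so the suggested threshold is 20 m.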
Still, the initial selection of the thresholds may not be good enough. When the
thresholds are too high, the resulting clusters may be very large in space and/or
time, for example, a cluster of low-speed events of cars may stretch over several
streets and/or many hours. When the thresholds are too low, the clustering algo-
rithm will find only a few small clusters. Therefore, it is recommended to run clus-
tering several times. Depending on the results obtained, one of the thresholds is
increased (if the clusters are small) or decreased (if the clusters are large) by a
small amount, such as 10–25 % of the previous value. To enable a user’s evalua-
tion of the clusters, they are visualized on a map and in an STC. The user checks
if the cluster shapes and extents in space and time are consistent with task-spe-
cific expectations, for example, elongated narrow clusters with the duration from
20 min to several hours in searching for traffic jams.

6.1.3 Scalable Clustering of Events

As mentioned earlier, spatio-temporal clusters of events can be discovered using
density-based clustering algorithms (Andrienko et al. 2011b, 2013) or graph clustering
algorithms (Bak et al. 2012). In both cases, the neighbours of each event
in space and time need to be determined. In the case of a large number of events,
finding the neighbours may be a very time-consuming process. Our solution is to
perform a pre-clustering scan of the set of events in which lists of the neighbours
of all events are created and stored in the database, to be later retrieved on demand
in the course of the clustering. During the pre-clustering, only a small subset of
events needs to be present in RAM at each moment, which makes the approach
applicable to very large sets of events that do not fully fit in RAM. The approach
exploits the capabilities of database management systems and the properties of the
distance function.
According to formula (6.3), the distance between two events is infinite if
at least one of the distances ds, d0, d1, …, dN exceeds the respective threshold
from the multi-dimensional threshold vector <Ds, D0, D1,…, DN>. We decom-
pose the spatial dimension into two dimensions x and y or three dimensions x, y,
and z in the case of three-dimensional space. The distance ds can be within the
threshold Ds only if each of the distances dx, dy, and dz does not exceed Ds. An
event e can be represented by a tuple $\langle c_1^e, c_2^e, \ldots, c_M^e \rangle$ where each component
$c_i \in \{x, y, z, a_0, a_1, \ldots, a_N\}$ is scalar. Let <D1, D2,…, DM> be the vector of distance
thresholds for the dimensions c1, c2,…, cM, respectively. We define the relevant
zone (RZ) of event e in dimension ci as the interval $[c_i^e - D_i, c_i^e + D_i]$, where $c_i^e$ is
the value of dimension ci for event e. Let us consider the projection of all events
onto the dimension ci. By definition of the distance function (6.3), two events can-
not be neighbours if one of them is not contained within the RZ of the other. This
holds for any dimension c1, c2,…, cM. Hence, any event has a set of RZs, one in
each dimension, and each RZ contains all neighbours of the event. Consequently,
in order to find the neighbours of an event, it is sufficient to search in one of its
RZs. It is advantageous to use the RZ containing the smallest number of events.
Our approach exploits the RZs of the events for extracting their neighbours in a
scalable yet efficient way. The key idea is that for a given event, only the events
from one of its RZs need to be present in RAM; moreover, as will be shown later,
one half of the RZ is sufficient. We shall use the terms lower relevant zone (LRZ)
and upper relevant zone (URZ) to refer, respectively, to the lower and upper halves
of the interval $[c_i^e - D_i, c_i^e + D_i]$: LRZ ≡ $[c_i^e - D_i, c_i^e]$ and URZ ≡ $[c_i^e, c_i^e + D_i]$.
Note that if event ei belongs to the LRZ of event ek then necessarily ek belongs to
the URZ of ei, and vice versa.
For pre-clustering, one of the dimensions c1, c2,…, cM is chosen for defining
RZs. The choice of the most suitable dimension will be discussed later on. The
events are sorted in the database in the ascending order of the values of the chosen
dimension cj and then processed in this order. The pre-clustering is done according
to the following algorithm:
The lists of neighbours may be ordered by the distances. Such an ordering is
required, in particular, for density-based clustering algorithms. The lists of neighbours
of some events may be empty. In the subsequent clustering, the events having
no neighbours are ignored, which increases the efficiency of the clustering.
At each iteration step of Algorithm 6.1, RAM contains events only from the
LRZ of the currently processed event ek but not from its URZ. However, the events
from the URZ will be loaded later and the list of neighbours of ek will be updated
when necessary while ek is still in RAM. Unloading of ek from RAM means that it
is outside of the LRZ of the next event. Due to the ordering, none of the following
events can be a neighbour of ek.
Fig. 6.2  Illustration of event pre-clustering. The events, represented as two-dimensional points,
are ordered according to the X-dimension. The neighbourhood areas are represented by circles.
The lower RZ in each step is marked with a grey rectangle

The work of Algorithm 6.1 is illustrated in Fig. 6.2. Events are represented by
points in two dimensions. The neighbourhood areas of the events are represented
by circles around the points. The horizontal dimension (X) is used for ordering
and defining the RZs. The LRZ of the currently processed event is marked in each
step by a grey-shaded rectangle. For events 1 and 2, the LRZs contain no other
events. For event 3, the LRZ contains event 2; however, the distance between 2
and 3 exceeds the threshold, that is, these events are not neighbours. Event 4 has
events 2 and 3 in its LRZ and its distances to these events are below the threshold.
Hence, the lists of neighbours of all three events 2, 3, and 4 are updated. The lists
of events 2 and 3 include 4 and the list of event 4 includes 2 and 3.
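
The sweep over the ordered events can be sketched as follows. This is an in-memory illustration reconstructed from the description in this section, not the listing of Algorithm 6.1: a `deque` stands in for the events held in RAM, a dictionary stands in for the database of neighbour lists, and all names are hypothetical. The four points are chosen to mimic the situation of events 1 to 4 described above; their coordinates are invented.

```python
from collections import deque
from math import hypot


def precluster(events, key, threshold, is_neighbour):
    """Build neighbour lists with one pass over events sorted by one dimension.

    events       : list of (event_id, record) pairs
    key          : function returning the record's value in the ordering dimension
    threshold    : distance threshold of the ordering dimension (width of the LRZ)
    is_neighbour : full neighbourhood test, e.g. based on distance function (6.3)
    """
    stored = {}        # stands in for the database of neighbour lists
    window = deque()   # events currently "in RAM": (id, record, neighbour list)
    for eid, rec in sorted(events, key=lambda item: key(item[1])):
        # Unload events that lie below the LRZ of the current event.
        while window and key(window[0][1]) < key(rec) - threshold:
            old_id, _, old_neighbours = window.popleft()
            stored[old_id] = old_neighbours
        # Compare the current event with the events in its LRZ.
        neighbours = []
        for other_id, other, other_neighbours in window:
            if is_neighbour(rec, other):
                neighbours.append(other_id)
                other_neighbours.append(eid)   # update the earlier event's list too
        window.append((eid, rec, neighbours))
    for old_id, _, old_neighbours in window:   # flush what is still loaded
        stored[old_id] = old_neighbours
    return stored


points = [(1, (0.0, 0.0)), (2, (3.0, 0.0)), (3, (4.5, 1.9)), (4, (4.6, 0.8))]
neighbour_lists = precluster(
    points,
    key=lambda p: p[0],                        # order by the X-dimension
    threshold=2.0,
    is_neighbour=lambda a, b: hypot(a[0] - b[0], a[1] - b[1]) <= 2.0,
)
```

For these points, event 1 has no neighbours, events 2 and 3 each list event 4, and event 4 lists events 2 and 3. A real implementation would read the sorted events from the database in batches and could additionally order each neighbour list by distance.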
There is a special case when a cyclic attribute, such as direction, is chosen for
defining RZs of events. When events are ordered based on the values of a cyclic
attribute, the LRZs of the first events in the sequence may contain events that are
positioned at the end of the list. When the last events are processed, the first events
are no longer in RAM. Hence, the first events cannot be attached to the neighbour
list of the last events and their own neighbour lists cannot be updated. To deal with
this problem, Algorithm 6.1 is modified as follows.
Let Acyc be the cyclic attribute used for event ordering, V its cycle length, D
the distance threshold, and v0 the value of Acyc for the first event in the list e0.
Before starting Algorithm 6.1, all events from the end of the list that belong to the
LRZ of e0, that is, {ek |vk ≥ v0 + V − D}, are pre-loaded to RAM and marked as
“transient”. The special mark distinguishes them from the other events that will be
loaded during step 1 of Algorithm 6.1. Let et1 denote the first of the
transient events ordered according to Acyc.
Before unloading an event ei at step 2.1, it is checked if ei is marked as “tran-
sient”. If so, it is unloaded with its neighbour list to a special buffer B, otherwise
to the database D′, as usual.
When at step 1 the event et1 and its successive events, that is, the last events
from the list, are loaded in RAM, their neighbour lists are initialized with the lists
stored in the buffer B. Since B preserves the order of the incoming events, each
list is retrieved in constant time.
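
The set of transient events follows directly from the condition vk ≥ v0 + V − D. The sketch below only identifies these events in a list of values already sorted by the cyclic attribute; the buffering of their neighbour lists is not shown, and the function name is hypothetical.

```python
def transient_events(sorted_values, cycle_length, threshold):
    """Indices of the events at the end of the sorted list that fall into the
    LRZ of the first event because the attribute wraps around."""
    v0 = sorted_values[0]
    limit = v0 + cycle_length - threshold
    return [k for k, vk in enumerate(sorted_values) if k > 0 and vk >= limit]


# Movement directions in degrees, sorted; V = 360 and D = 20.
directions = [5, 40, 200, 350, 358]
transient = transient_events(directions, cycle_length=360, threshold=20)
```

Here the limit is 5 + 360 − 20 = 345, so the last two events (350° and 358°) are transient: they lie within 20° of the first event across the 360°/0° boundary.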
Let n be the number of events in the input dataset E and N the maximum
number of events in the LRZ of one event. The complexity of Algorithm 6.1 is
O(n · log n + n · N), where n · log n time is used for the ordering and n · N for
checking all events against the events from their LRZs. The complexity may be
quadratic in the worst case when N = O(n), that is, one event contains almost
all other events in its LRZ. This means that the distance threshold for the chosen
ordering dimension is close to the whole range of this dimension. In most cases,
however, the ordering dimension can be chosen so that N ≪ n, which reduces
the complexity to Θ(n · log n). Using the capabilities of DBMS, it is possible to
retrieve the sequence of events already sorted according to the chosen dimension.
In this case, the sorting cost is externalized to the database and the resulting com-
plexity is reduced to Θ(n).
The choice of the most suitable ordering dimension is based on two criteria.
First, the average number of events in an LRZ of one event should be as small as
possible. Second, the maximal number of events in an LRZ should not exceed
RAM capacity. In order to estimate the average and maximal number of events in
an LRZ in each dimension, a frequency histogram for each dimension is built using
the database facilities. The appropriate width of the histogram bin is one half of
the distance threshold for this dimension.
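
One way to turn such a histogram into the two estimates is sketched below; the exact estimation procedure is an assumption of this sketch, as are the function names. With bins of width D/2, the LRZ of an event falling into bin b is covered by bins b − 2, b − 1, and b, so the sum of these three counts is an upper bound on the number of events in that LRZ (the event itself included). In practice the counts would come from a database query rather than from an in-memory list.

```python
from collections import Counter
from math import floor


def lrz_load_estimate(values, threshold):
    """Estimate (average, maximum) number of events per LRZ from a histogram
    whose bin width is one half of the distance threshold."""
    width = threshold / 2
    counts = Counter(floor(v / width) for v in values)
    # Upper bound per bin: the bin itself plus the two bins below it.
    per_bin = {b: counts[b] + counts[b - 1] + counts[b - 2] for b in counts}
    average = sum(counts[b] * per_bin[b] for b in counts) / len(values)
    return average, max(per_bin.values())


def choose_dimension(columns, thresholds, ram_capacity):
    """Pick the dimension with the smallest average load whose maximum fits in RAM."""
    best = None
    for name, values in columns.items():
        average, worst = lrz_load_estimate(values, thresholds[name])
        if worst <= ram_capacity and (best is None or average < best[1]):
            best = (name, average)
    return None if best is None else best[0]


xs = list(range(10))          # events spread evenly along x
ys = [0] * 5 + [1] * 5        # events concentrated in two y values
chosen = choose_dimension({"x": xs, "y": ys}, {"x": 2, "y": 2}, ram_capacity=5)
```

The x-dimension gives an estimated load of 2.7 events on average and 3 at most, whereas the y-dimension gives 7.5 and 10, so x is chosen.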
Still, even an optimal choice of the ordering dimension cannot guarantee that
the events that need to be processed at each step of Algorithm 6.1 will always fit in
RAM. If at some step the events from the LRZ of the currently processed event do
not fit in RAM, they are offloaded to an external buffer and the process continues
using this buffer instead of RAM. The overhead of accessing the data on the disc
will decrease the performance, but the buffer removes the constraint on the mem-
ory size. As soon as the size of the buffer content becomes suitable for RAM, it is
moved to main memory for more efficient access.
After the pre-clustering phase is concluded, clusters of events are determined
by means of the clustering algorithm. The algorithm requires the extraction of the
neighbours of each event, which has already been done. Hence, the list of neigh-
bours just needs to be loaded from the database. The memory complexity of the
clustering is bounded by the maximum number of neighbours of one event.

6.1.4 An Example of Scalable Clustering of Spatial Events

We shall demonstrate event clustering using the example of the Milan cars data. The
analysis problem is to determine the places and times where traffic jams occurred
in the city during the whole week for which we have the data. An indication of
possible traffic congestion in a place is that cars in this place move at low
speed. Hence, for detecting congestions, we need to extract low-speed movement
events from the car trajectories. Such movement events may be defined as the
points of the trajectories for which the speed is below a certain threshold. We use
the threshold of 10 km/h based on our background knowledge that speeds below
10 km/h are very low for cars and that much higher speeds are typically allowed
even on small streets in a city. With this threshold, we extract 251,588 movement
events. A subset of these events has been shown in Fig. 4.14a and b. It is impor-
tant to note that the extracted events inherit all attributes of the trajectory points
from which they have been generated, in particular, the direction of the movement
(measured in degrees from 0 to 360), which will be used in our further analysis.
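The extraction step amounts to a simple filter over trajectory points. The sketch below is a minimal illustration; the record layout TrajectoryPoint and the function name extract_low_speed_events are hypothetical, and the speed is assumed to be already available for each point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryPoint:
    # Hypothetical record layout; field names are assumptions of this sketch.
    car_id: str
    x: float              # longitude or projected X
    y: float              # latitude or projected Y
    t: float              # time stamp, e.g. seconds since start
    speed_kmh: float
    direction_deg: float  # heading, 0 to 360


def extract_low_speed_events(points, max_speed_kmh=10.0):
    """Keep the points whose speed is strictly below the threshold.

    The returned events are the original point records, so they keep
    all attributes, including the movement direction.
    """
    return [p for p in points if p.speed_kmh < max_speed_kmh]
```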
Not all movement events that we have extracted may correspond to traffic jams.
Low speeds may also have other causes, such as waiting for a green traffic
light or parking. Therefore, we are interested only in the cases when multiple
low-speed events with similar movement directions co-occur in space and time.
The coherence in direction is important for two reasons: first, for distinguishing
between traffic jams occurring on opposite lanes of the same street or on two or
more crossing streets; second, for disregarding “false movements”, when a car is
standing in the same place but its GPS sensor records a slightly different position
in each time step due to measurement errors. In such a case, the movement direc-
tions within a sequence of trajectory points vary greatly.
We shall discover co-occurrences of multiple similar events in space and time
by means of density-based clustering. In our example, we consider two low-speed
movement events as “neighbours” in terms of their spatio-temporal positions and
movement directions when the distance in space between them is up to 100 m, the
distance in time is up to 10 min, and the difference in the directions is up to 20°.
The clustering method also requires specifying the minimum number of neigh-
bours a point must have for being treated as a core point of a cluster. We set this
parameter to five neighbours.
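The neighbourhood definition can be written as a small predicate. The following sketch makes several assumptions that the text does not state: positions are given in a planar coordinate system in metres, times in seconds, the directional difference is taken on the circle (so that 350° and 5° differ by 15°), and an event is not counted as its own neighbour. The function names are hypothetical.

```python
import math


def direction_difference(a_deg, b_deg):
    """Smallest angle between two headings, taking the 0/360 wrap into account."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def are_neighbours(e1, e2, max_dist_m=100.0, max_dt_s=600.0, max_ddir_deg=20.0):
    """e1, e2: tuples (x_m, y_m, t_s, direction_deg)."""
    x1, y1, t1, d1 = e1
    x2, y2, t2, d2 = e2
    return (math.hypot(x1 - x2, y1 - y2) <= max_dist_m
            and abs(t1 - t2) <= max_dt_s
            and direction_difference(d1, d2) <= max_ddir_deg)


def is_core(i, events, min_neighbours=5):
    """True if event number i has at least min_neighbours neighbours."""
    count = sum(1 for j, other in enumerate(events)
                if j != i and are_neighbours(events[i], other))
    return count >= min_neighbours
```

The brute-force count in is_core is quadratic over all events and serves only to make the definition concrete.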
The variations of the sizes of the RZs for the dimensions longitude (X), lati-
tude (Y), time (denoted T), and direction (denoted D) are shown in the graphs in
Fig. 6.3. The horizontal dimension represents the ordered sequence of events, and
the vertical dimension represents the sizes of the RZ of the events. In the upper
graph, the vertical axis is scaled linearly, and in the lower graph, it is scaled loga-
rithmically. For the scalable pre-clustering, we take the X-dimension (longitude)
to define the RZs of the events since it gives the smallest average and maximal
sizes of the RZs (1,137 and 2,517, respectively). The worst choice would be the
direction, for which the average size of the RZ is 15,539 and the maximal size is
32,579. The use of this dimension would increase the computation time by a
factor of 20 as compared to the use of the longitude. As the ratio between the
directional distance threshold (20°) and the attribute value range (360°) is quite large, the
LRZ of each event contains too many events that need to be checked in step 2 of
Algorithm 6.1.

Fig. 6.3  The graphs depict the variation of the RZ sizes for the low-speed events in Milan
ordered according to longitude (red), latitude (blue), time (green), and direction (brown). Top: the
vertical axis is linear. Bottom: the vertical axis is logarithmic

The clustering by the spatio-temporal positions and directions produces 1,908
clusters containing 34,361 events (13.7 % of the extracted events); the remaining
events are treated as noise. The clusters represent probable traffic jams,
when multiple cars move with low speeds, and the noise includes occasional
low-speed events, which are not relevant for us. We exclude the noise from the
further consideration. The space–time cube (STC) in Fig. 6.4 shows the clus-
ters represented by spatio-temporal envelopes enclosing the cluster members.
Geometrically, an envelope is a prism with two parallel faces positioned in the
time dimension according to the beginning and end times of one cluster and hav-
ing the shape of a polygonal convex hull built around the spatial positions of the
cluster members. The side faces are rectangles connecting corresponding edges of
the parallel faces. For illustrative purposes, the envelopes are coloured according
to the median movement directions (i.e. headings) of the cars transformed to nom-
inal (textual) values corresponding to eight compass directions. This facilitates
visual separation of clusters differing in the movement direction. Distinct colours
are assigned to the compass directions. The STC shows that clusters (i.e. traffic
congestions) often occur repeatedly in the same places. These clusters appear one
above another in the cube.
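The geometry of such an envelope can be derived from the cluster members in a few lines. The sketch below uses Andrew's monotone chain algorithm for the convex hull, which is our choice for illustration; the text does not say which hull algorithm was used. The function names are hypothetical.

```python
def convex_hull(points):
    """Convex hull of 2D points (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def envelope(events):
    """events: iterable of (x, y, t).

    Returns (hull, t_start, t_end): the polygon shared by the two parallel
    faces of the prism and the times at which these faces are positioned.
    """
    events = list(events)
    hull = convex_hull([(x, y) for x, y, _ in events])
    times = [t for _, _, t in events]
    return hull, min(times), max(times)
```

The side faces of the prism follow directly: each hull edge, taken at t_start and at t_end, gives one rectangle.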
Our next task is to delineate all places in the city where traffic jams occurred.
For this purpose, we need to unite the spatio-temporal clusters that are disjoint in
time but co-located in space. This can be done using the same density-based
clustering method, but at this stage, the events are clustered according to their spatial
positions and movement directions irrespective of the temporal positions. We
use the same parameter settings as before, but omit the threshold for the temporal
distance. Note that the second clustering is applied only to the subset of events
remaining after excluding the noise.

Fig. 6.4  The space–time cube shows the spatio-temporal envelopes enclosing 1,908 spatio-temporal
clusters of low-speed movement events with similar directions coloured according to the
mean direction of the movement

Fig. 6.5  The places of traffic jams have been defined as spatial buffers around spatio-directional
clusters of low-speed movement events

The second clustering produces 269 spatial clusters. We generate spatial buff-
ers around the clusters and thereby delineate the places of traffic congestions. The
places located in the northern part of Milan are shown in Fig. 6.5. The interiors are
coloured according to the median movement directions transformed to text labels,
as in Fig. 6.4.
This example demonstrates the use of clustering techniques for extracting
collective movement events (traffic congestions) from a set of individual move-
ment events (low-speed events). It also demonstrates that clustering may be used
for finding places where spatial events of a certain type (e.g. traffic congestions)
have repeatedly occurred. Another example of such use of clustering was given in
Sect. 1.2, where we discovered personal places of interest: home, work, and places
of shopping.

6.2 Characteristics

Elementary spatial events are often defined or can be treated as points in space.
Such spatial events (point events) are traditionally visualized on a map by point
symbols, such as circles, placed in the display according to the spatial or spatio-
temporal positions of the events. Visual properties of the symbols, that is, sizes,
colours, etc., can represent thematic or temporal characteristics of the events.
Instant point events, that is, point events having no duration in time (the dura-
tion may be negligibly small or irrelevant for the analysis), can also be repre-
sented by point symbols in temporal and spatio-temporal displays, in particular,
in a space–time cube. Examples can be seen in Figs. 3.2, 4.14, and 5.43. Point
events extended in time can be represented in these displays by lines or bars.
When spatial events are extended in space and their shapes and/or spatial extents
are relevant for analysis, the events are shown as area objects on a map and as
volume objects in a space–time cube, like the traffic congestion events repre-
sented by spatio-temporal convex hulls in Fig. 6.4 as well as Figs. 4.14 and 4.15.
Colouring of the objects can encode values of thematic attributes; thus, the col-
ours of the convex hulls in Fig. 6.4 represent the movement directions. Values of
multiple thematic attributes can be represented on a map by diagrams, but this can
be effective only for a small number of spatial events that are scattered over the
territory rather than spatially clustered. A few other methods for visual represen-
tation of events within or out of the spatial context can be found in the survey by
Aigner et al. (2011).
When multiple spatial events have the same or very close positions in space, it
may be problematic to visualize them on a map as individual events due to occlu-
sions of the symbols representing them. A common approach is to aggregate the
events in space and represent groups of events by symbols or diagrams show-
ing the sizes of the groups and, possibly, other aggregate characteristics. This
approach was applied in Fig. 5.44.
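A minimal sketch of such spatial aggregation is given below. It assumes a regular grid as the grouping scheme and circle symbols whose area is proportional to the group size; both are illustrative choices, and the function names are hypothetical.

```python
from collections import Counter


def aggregate_by_cell(events, cell_size):
    """events: iterable of (x, y). Returns {(column, row): number of events}."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in events)


def symbol_radius(count, max_count, max_radius):
    """Radius of a circle whose area is proportional to the group size."""
    return max_radius * (count / max_count) ** 0.5
```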
For collective movement events and other composite spatial events, it may be
necessary to apply specific visualization techniques. For example, in Fig. 5.29,
encounter events are represented by lines connecting corresponding points from
two trajectories. The same technique can be used in a space–time cube (Fig. 6.6a).
The types of encounters can be conveyed through colouring of the connecting
lines (Fig. 5.29). This approach, however, represents each occurrence of proximity
between two points as unrelated to all others and is therefore insufficient for rep-
resenting encounter events that are extended in space and/or time and involve two
or more points from each trajectory. The visualization in Fig. 6.6b overcomes this
deficiency using two techniques in addition to connecting corresponding points.
First, the parts of trajectories involved in an encounter event are marked using
colours distinct from the colour of the whole trajectory lines. Thus, in Fig. 6.6b,
the trajectories are shown in magenta while the parts involved in encounter events
are marked in red and blue. Second, bounding rectangles drawn around the spatial
footprints of the events also visually unite pairs of connected points belonging
to the same event. The rectangles facilitate finding the events on a small-scale
map showing a large territory. By zooming in and applying various filters (spatial,
temporal, and filter of related sets, to select only trajectories involved in currently
explored events), the analyst can visually investigate the internal structure of
composite events.

Fig. 6.6  Encounters of two ships are shown in a space–time cube (a) and on a map (b)

We extend the variety of existing methods for visualizing characteristics of spa-
tial events by two non-traditional methods, which are designed for representing
groups of co-located spatial events, in particular, spatial and spatio-temporal clus-
ters of spatial events. The methods do not involve computational aggregation of
events. The first method represents each individual event by a pixel and applies
a special method for pixel placement to avoid occlusions. Temporal or thematic
characteristics of the events are represented by colours of the pixels. The second
method exploits transparency to produce visual summaries of groups of events in
the form of complex symbols (diagrams) where elements representing individ-
ual events may be overlaid. Temporal and thematic characteristics of the events
are represented by relative positions of the diagram elements and their visual
properties.

6.2.1 Growth Ring Maps

To describe the growth ring map visualization technique (Bak et al. 2009), which
represents spatial events by coloured pixels, we shall use the example dataset rep-
resenting the movement of laboratory mice introduced in Sect. 2.10.9. The prin-
cipal research question is whether there is a significant difference between the
behaviour patterns of healthy mice and Alzheimer-transgenic mice, which carry
Alzheimer's disease.
The experiment in which the data were collected was carried out in a multi-
level cage equipped with 27 strategically placed RFID receptors that log mice
presence when they come closer than 3 cm to the receptors. This kind of logging
results in episodic movement data, which can be used to approximate lower bound
properties of the actual movement since more detailed data are unavailable. At the
same time, each log record can be viewed as representing an event of a mouse
presence at a sensor. We limit the time span under analysis to the first 90 days of
each mouse’s life in the cage. This is reasonable since the average life expectancy
was 86.6 days (StDev = 48.1) for a healthy mouse and 79.3 days (StDev = 29.6)
for a transgenic mouse.
To visualize the spatial properties of mice’s behaviour, the three-dimensional
locations of the sensors were mapped onto two-dimensional screen coordinates
so as to preserve the relative spatial positions of the sensors. There are some
important semantic differences between the sensor locations. Water places need
to be distinguished from the other locations, and the latter ones can be further
divided into ground floor and higher floor locations. These three semantic classes
were mapped to colours using a ColorBrewer’s three-level qualitative colour
scheme (Harrower and Brewer 2003). Water places were mapped to blue, ground
floor locations to green and higher floor locations to orange, as shown in Fig. 6.7a.
The temporal properties of the mice’s behaviour were assessed based on the days
in which particular sensors were visited. This property was mapped onto colour
gradients. Light, unsaturated colours indicate early dates, and intense, saturated
colours indicate later days of visits, as shown in Fig. 6.7a.
Each event of a mouse coming close to a sensor location is represented by one
pixel, which is placed on the background map representing the cage layout. The
colour of the pixel is determined by the location semantics and the time of the
visit, as shown in Fig. 6.7a. For each location, the pixels representing the respec-
tive events are arranged in a circular layout called the growth ring. The sensor
location is taken as the centre of the growth ring. The pixels are placed around
the centre in an orbital manner, sorted by the event times: the earlier the time, the
closer the pixel is to the central point. The size of the layout indicates the overall
number of visits at the sensor. Figure 6.7b schematically shows few (625), medium
(1,250), and many (2,500) visits.

Fig. 6.7  Colour hue represents place semantics and saturation the times of the place visit events
(a). The events are represented by pixels placed around the sensor locations (b). Different scaling
of the colour saturation gradients can enhance different time intervals of the visits (c). a Colours
represent semantic properties of the sensor locations, and colour gradients correspond to the
times of the visits. b Visits of a sensor location are represented by pixels placed in a circular
layout around the location. The sizes (areas) of the circles are thus proportional to the numbers of
the visits. c Scaling of the colour saturation gradients can enhance different time intervals of the
visits. Square-root scaling provides equal perceptual weights to all times

The implementation of the circular arrangement is simple and efficient. The
positions of the points on a circle with a given radius r around a central point p
are calculated using a modified version of the Bresenham Midpoint algorithm
(Bresenham 1977). It is modified so as to calculate the positions on a circle with
the line width of two pixels. This is done because with the standard Bresenham
algorithm, where the line width is one pixel, the circle with the radius r + 1 does
not touch every pixel on the circle with the previous radius r. This means that
undesirable gaps may appear in the growth rings.
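The gap problem is easy to reproduce. The sketch below implements the standard one-pixel midpoint circle and lists the pixels that concentric circles of radii 1, 2, 3, ... never touch. The details of the two-pixel-wide modification are not given here, so the second function, rounded_ring, is a different, swapped-in technique that merely serves as a gap-free stand-in: it assigns each pixel to the ring whose radius equals the pixel's rounded distance from the centre. All function names are hypothetical.

```python
import math


def midpoint_circle(r):
    """Pixels of a one-pixel-wide midpoint circle of radius r around (0, 0)."""
    pts = set()
    x, y, d = r, 0, 1 - r
    while x >= y:
        # Mirror the first-octant point into all eight octants.
        for px, py in ((x, y), (y, x)):
            pts.update({(px, py), (-px, py), (px, -py), (-px, -py)})
        y += 1
        if d < 0:
            d += 2 * y + 1
        else:
            x -= 1
            d += 2 * (y - x) + 1
    return pts


def rounded_ring(r):
    """Stand-in (not the book's method): pixels whose rounded distance is r."""
    span = range(-r - 1, r + 2)
    return {(x, y) for x in span for y in span
            if int(math.hypot(x, y) + 0.5) == r}


def uncovered(ring_fn, max_r):
    """Pixels within distance max_r of the centre that no ring 1..max_r covers."""
    covered = {(0, 0)}
    for r in range(1, max_r + 1):
        covered |= ring_fn(r)
    span = range(-max_r, max_r + 1)
    return sorted((x, y) for x in span for y in span
                  if math.hypot(x, y) <= max_r and (x, y) not in covered)
```

With the one-pixel circles, the pixel (1, 1) is already left out, since it lies on none of the circles; the rounded rings partition the plane and therefore leave no gaps.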
When two or more neighbouring growth rings are about to overlap, the lay-
out algorithm displaces the pixels in such a way that none of them are covered by
another pixel. Hence, when there are many visits to neighbouring sensor locations,
the corresponding growth rings will not have perfectly circular shapes but will be
distorted.
The pixel placement is done by the algorithm rearrangeDataObject (Fig. 6.8).
In order to quickly find a proper position for each pixel, we store the radius r that
was used for the last placement; the initial value is 1. The calcCirclePoints method
returns the points of the circle ordered by their distance from the last pixel posi-
tion. These are candidate positions for placing the next pixel. Each of them needs
to be checked for already containing some pixel (from another ring). If a position
is already occupied, it is skipped, and the next position is checked. This is done
until either a free position is found or there are no positions left on the circle with
the current radius. In the latter case, the circle radius is increased, and a new set
of candidate positions is calculated, which are checked as described above. After
a suitable position for the current pixel is found, the circle radius that was used is
stored for accelerating the following placement operations.
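The placement procedure described above can be sketched as follows. The sketch follows the prose rather than the pseudo-code of Fig. 6.8; the names GrowthRing, circle_points, and place_pixel are hypothetical, and the candidate positions are generated by a simple rounded-distance ring instead of the modified Bresenham algorithm (an assumption made to keep the example short).

```python
import math
from dataclasses import dataclass, field


def circle_points(centre, r):
    """Candidate pixels at rounded distance r from the centre (stand-in generator)."""
    cx, cy = centre
    span = range(-r - 1, r + 2)
    return [(cx + dx, cy + dy) for dx in span for dy in span
            if int(math.hypot(dx, dy) + 0.5) == r]


@dataclass
class GrowthRing:
    centre: tuple
    radius: int = 1                  # radius used for the last placement
    last: tuple = None               # position of the last placed pixel
    pixels: list = field(default_factory=list)

    def __post_init__(self):
        if self.last is None:
            self.last = self.centre


def place_pixel(ring, occupied):
    """Place one pixel of `ring`; `occupied` is shared by all rings on the map."""
    while True:
        # Candidates ordered by distance from the last placed pixel.
        candidates = sorted(circle_points(ring.centre, ring.radius),
                            key=lambda p: math.dist(p, ring.last))
        for p in candidates:
            if p not in occupied:
                occupied.add(p)
                ring.pixels.append(p)
                ring.last = p
                return p
        ring.radius += 1             # circle is full: move to the next one
```

Pixels must be placed in the order of the event times. Because the set of occupied positions is shared, neighbouring rings push each other's pixels outwards, which produces the distorted shapes mentioned above.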

Fig. 6.8  Pseudo-code of the algorithm for circular placement of pixels around a centre point

Due to the concentric mapping, the same number of events that occurred at the
beginning and at the end of the time period is represented by different numbers
of circles. The earlier events are represented by a larger number of circles with
smaller radii and the later events by a smaller number of circles with larger radii.
The total width of the many smaller circles is larger than the total width of the few
larger circles. As a result, the earlier events represented by light, unsaturated col-
ours receive more visual prominence than the later events. We counterbalance this
effect by colour scaling. To demonstrate the effect of the scaling in Fig. 6.7c, we
generated three growth rings for a time period of 100 days with a uniform distribu-
tion of the events over the days (namely, 25 visits per day, resulting in 2,500 pixels
in total) using different colour-encoding functions. Linear encoding of the colour
gradients gives more prominence to the light colours in the inner growth rings.
Logarithmic scaling enhances the darker colours in the outer growth rings. Square-
root scaling results in a homogeneous smooth gradient of colour intensity creating
a perception of a uniform distribution and should therefore be applied to overcome
the bias inherent to the pixel placement method.
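The reason why square-root scaling works can be made explicit. If the events are uniformly distributed in time, the number of pixels placed up to day t grows linearly with t, while the area of the ring grows with the square of its radius; hence the radius reached at day t is proportional to the square root of t. A saturation that is also proportional to the square root of t therefore changes linearly along the radius. The sketch below illustrates this; the function names and the exact form of the logarithmic variant are assumptions.

```python
import math


def saturation(day, total_days, scaling="sqrt"):
    """Map a day in [0, total_days] to a saturation in [0, 1]."""
    f = day / total_days
    if scaling == "linear":
        return f
    if scaling == "sqrt":
        return math.sqrt(f)
    if scaling == "log":
        # One possible logarithmic encoding (assumed form).
        return math.log1p(day) / math.log1p(total_days)
    raise ValueError(f"unknown scaling: {scaling}")


def radial_fraction(day, total_days):
    """Relative radius reached at `day` when events are uniform in time."""
    return math.sqrt(day / total_days)
```

For a 100-day period, day 25 lies at half of the final radius. Linear encoding assigns it only a quarter of the saturation range (light colours dominate), the logarithmic variant assigns it well over half (dark colours dominate), and square-root encoding assigns it exactly half.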
Although this visualization method represents every individual event by a pixel
and does not involve data aggregation, the resulting layout of the pixels induces
perceptual aggregation: the user does not distinguish individual pixels but per-
ceives them all together as one figure. The size of the whole figure shows the total
number of events while the variation of its colour intensity gives an overall idea
about the temporal distribution of the events. Thus, in the example with the mice, a
prevalence of light shades indicates that a place was visited more often at the beginning
of the time period but later was rarely or never visited. A prevalence of saturated
shades indicates a place that was more frequently visited at the end of the time
period than at the beginning.
An example of a growth ring map is shown in Fig. 6.9. The map represents the
behaviour of a healthy male mouse. In this and the following examples, the sensors
whose visit counts are less than 20 % of the maximum among all sensors are
disregarded to reduce display clutter and increase legibility. Several
temporal patterns of place visits represented by different visual properties of
growth rings can be found in this example:
(a) A large number of visits at a higher floor sensor (orange) predominantly at the
end of the time period, indicated by the prevalence of saturated colours.
(b) A continuously high number of visits at a lower floor sensor (green), resulting
in a large growth ring with different levels of colour saturation present.
(c) A high number of visits at a watering place (blue) at the beginning of the
time period (light colours of the inner growth rings) and at the end of the time
period (saturated colours of the outer growth rings).

Fig. 6.9  Examples of growth rings representing different temporal patterns of place visits

To determine behavioural differences of the mice as a function of gender and
health condition, we represent the data for each mouse in a separate growth ring
map. In Fig. 6.10, the growth ring maps of four animals are compared. The colours
of the left and right borders of the maps indicate the gender (blue for male and
pink for female) and the colours of the top and bottom borders the health condition
(green for healthy and red for transgenic). The patterns that can be observed in
these example maps are common for the respective groups of mice.

Fig. 6.10  Growth ring maps representing movement patterns of four selected animals; upper
row: male, lower row: female, left column: healthy, and right column: transgenic

Territoriality, that is, a tendency to stay within a limited area, is a typical behaviour
of male mice. This type of behaviour can be clearly seen in the upper left
image corresponding to a healthy male mouse. There are a few very frequently visited
places, while most other places were almost never visited; the latter appear as
tiny dots on the map. The mouse spent most of its time at the sensor close to the
bottom left corner. The presence of all shades of orange indicates that the place
preference did not change over time.
While territoriality is a predominant pattern for healthy male mice, it is not
observed in the behaviour of transgenic male mice, as can be seen from the exam-
ple growth ring map on the top right. A behaviour opposite to territoriality can be
seen in the growth ring maps of female mice (bottom row), who appeared in all
compartments of the cage. However, in the bottom left image, some growth rings
are larger than others as an indication of preferred habitation of the mouse.
Visits of the watering places are represented by growth rings coloured in blue.
It is noticeable that the healthy male mouse (top left) tends to focus on one par-
ticular watering place, whereas the healthy female mouse frequently visits sev-
eral watering places. Note that the territorial behaviour and the preferred watering
place appear to be closely linked. The patterns of the transgenic mice are differ-
ent. The male mouse (top right) visited three different water places and the female
mouse (bottom right) had a single preferred water place, which was visited much
less often as compared to the healthy mouse.
We can also see temporal changes in the movement behaviours of the healthy
and transgenic mice. It is noticeable that there is an overall tendency for the
growth rings of the healthy mice to have a larger variety of colour shades than
the ones of the transgenic mice. The distribution of the shades within the growth
rings of the healthy mice is close to uniform. This means that the healthy mice had
approximately the same level of movement activity over the whole observation
period. The transgenic female mouse tended to move more by the end of the three-
month period, which is indicated by high proportions of dark shades in the growth
rings. For the transgenic male mouse, an opposite pattern is observed.
Another example of using colour hues and lightness/saturation gradients in
a growth ring map is demonstrated in the paper by Andrienko et al. (2011a). An
example map represents visits of different places in Switzerland by Flickr users.
Four different hues are used to represent the seasons of a year. Different years are
mapped to five different lightness/saturation levels of the colours assigned to the
seasons. Pale colours represent earlier years and bright colours more recent years.
The resulting growth ring map demonstrates different temporal patterns of place
visits with respect to both linear time (years) and cyclic time (seasons). It is pos-
sible to recognize places where people appeared in all seasons of all years, places
mostly visited in one season of different years (e.g. in winter), places that were
intensively visited only in one season of one year, and places that were visited dur-
ing certain time intervals within the observation period.
Generally, the growth ring map is a suitable tool for investigating spatio-temporal
distribution of multiple events occurring at a limited number of places. The places
need not be known in advance but can be determined by means of spatial
clustering of the events, as will be discussed later. In principle, the colouring of
the pixels can represent not only the times of event occurrences but also other
characteristics of the events.

6.2.2 Flower Diagrams

Flower diagrams are another approach to visualizing groups of co-located spatial
events. They are intended to support the investigation of cyclic temporal patterns
of event occurrences in different places. This visualization has been inspired by
Florence Nightingale’s rose diagrams (Brasseur 2005), also known as polar area
diagrams, which she created to represent graphically the mortality causes during
the Crimean War (October 1853–February 1856). A rose diagram is composed of
sectors radiating from a common centre, like petals of a flower. All sectors have
the same angular size but may differ in their lengths and, consequently, areas. In
Nightingale’s rose diagrams, there are 12 sectors corresponding to the months
of a year. The area of a sector is proportional to the number of deaths in a month.
The sectors are subdivided into segments by causes of deaths. The diagrams
helped Florence Nightingale persuade the British government to institute reforms
for improving sanitary conditions in the army.
More generally, rose diagrams are used to represent various cyclic phenomena.
The sectors correspond to different positions in a cycle. This may be a temporal
cycle (daily, weekly, or yearly) or a cycle of another nature, for example, the cycle
formed by all possible wind directions (rose diagrams based on this cycle are used
in meteorology). Numeric values associated with the positions in the cycle are rep-
resented by either the lengths or the areas of the sectors. Rose diagrams are quite
suitable for being drawn on a map. Thus, we have used a map with rose diagrams
in Fig. 5.44 to represent counts of spatial events in different places. This map dem-
onstrates a typical use of rose diagrams, when the sizes of the sectors represent
aggregates, such as counts.
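When values are encoded by sector areas rather than lengths, the sector radius must grow with the square root of the value, because a circular sector with angle θ and radius r has the area θ · r² / 2. A minimal sketch (the function name and the parameter area_per_unit are hypothetical):

```python
import math


def sector_radius(value, n_sectors=12, area_per_unit=1.0):
    """Radius of a rose-diagram sector whose area encodes `value`.

    All sectors share the angle 2*pi / n_sectors, so only the radius varies.
    """
    theta = 2 * math.pi / n_sectors
    return math.sqrt(2 * value * area_per_unit / theta)
```

Quadrupling a value thus doubles the radius of its sector; encoding by length instead would make the radius itself proportional to the value.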
In this section, we describe a visualization technique which uses the rose dia-
gram layout to plot individual events rather than computationally derived aggre-
gates. However, the layout induces perceptual aggregation of the information,
analogously to a growth ring map. We shall call our method the “flower diagram”
to stress both similarity to and difference from the standard rose diagram. The idea
of the method is illustrated in Fig. 6.11.
As in a standard rose diagram, there are sectors corresponding to positions in a
temporal cycle. In Fig. 6.11b, these are hourly intervals of a day. Each individual
event is represented by a sector positioned in the diagram according to the posi-
tion of the event in the temporal cycle. When several events occurred during the
same interval in the cycle, the sectors representing them will overlap. The sectors
are drawn in a semi-transparent way. Overlaying of several sectors increases the
opacity of the resulting figure. Hence, the level of opacity of segment filling is
proportional to the number of events that occurred in the corresponding interval of
a cycle. Additionally, the length of a sector can represent values of some numeric
attribute of the events, such as duration. As a result, the level of opacity of the filling
within a segment may vary: the closer to the centre, the higher the opacity.
In the overall flower diagram, the presence or absence of the petals (i.e. sectors)
and the variation of their lengths and opacity levels convey the temporal distribution
of the event occurrences and their durations over the represented temporal
cycle.

Fig. 6.11  A schematic description of visual mapping reveals the timing, temporal distribution,
and duration of events: a visual attributes and b exemplary result for one group of events

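The visual mapping can be sketched as follows. The sketch assumes a daily cycle divided into 24 hourly sectors, sector length equal to event duration, and standard alpha compositing of sectors drawn with a fixed opacity alpha; under this compositing model the accumulated opacity grows monotonically with the number of overlaid sectors, though not strictly linearly. The function names and the value of alpha are hypothetical.

```python
from collections import defaultdict


def flower_petals(events, n_sectors=24, cycle_length=24.0):
    """events: iterable of (time_in_cycle, duration).

    Returns {sector index: durations of the events drawn in that sector},
    longest first. Each duration is the length of one overlaid sector.
    """
    petals = defaultdict(list)
    for t, duration in events:
        sector = int((t % cycle_length) * n_sectors / cycle_length)
        petals[sector].append(duration)
    return {s: sorted(d, reverse=True) for s, d in petals.items()}


def opacity_at(durations, radius, alpha=0.2):
    """Accumulated opacity at a given distance from the diagram centre.

    Only the sectors that are at least `radius` long contribute there,
    which is why the opacity decreases towards the tip of a petal.
    """
    k = sum(1 for d in durations if d >= radius)
    return 1.0 - (1.0 - alpha) ** k
```

For two stops of 5 and 10 min in the same hour, the inner part of the petal is covered by both sectors and the outer part by one only, so the petal fades towards its tip.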
Flower diagrams are drawn on a map at the spatial positions of groups of
co-located spatial events. The following example demonstrates the use of flower
diagrams for the exploration of the spatial and temporal distribution of the stops
of public transport in Helsinki (Finland). The stop events have been previously
extracted from the trajectories of public transport vehicles. The original data-
set is described in Sect. 2.10.4; the possible approaches to detecting stop events
in trajectories are discussed in Sect. 3.5. Spatial clustering has been applied
to the extracted stop events. The geographical centres of the event clusters have
been taken as stop places. Flower diagrams are drawn on a map in these places
(Fig. 6.12); each diagram represents one spatial cluster of stop events.
Several flower diagrams with long segments radiating in different directions are
readily seen in the map. Most of them correspond to stop places located at the final
stations of the public transport routes. The stops in these places are usually long.
The temporal distribution is uneven throughout the day and differs from place to
place. It is also noticeable that singular stops occur almost at every street junction.
They are signified by diagrams with one or two sectors, some of them being quite
long, indicating long stop durations. The directions of the segments show in which
hours the stops occurred.

Fig. 6.12  The spatio-temporal distribution of the public transport stops in Helsinki is represented
by flower diagrams placed on a map of Helsinki in the locations of the stops. The longest
sector corresponds to the duration of 15 min

To demonstrate the use of flower diagrams in more detail, we shall discuss two
diagrams located near the central railway station of Helsinki. An enlarged frag-
ment of the map containing these diagrams is shown in Fig. 6.13. We see that in
both places there were many long stops and that the stop durations greatly varied
not only over the day but also within the hourly intervals. In place A, the longest
stops and largest variations of the stop durations occurred in the morning between
7 and 10 o’clock and in the evening after 20 o’clock. In place B, the longest stops
and largest duration variations are observed in early afternoon, between 12 and 16
o’clock. This suggests that these two places, being 1 km apart, may share the load
over the day. Bak et al. (2012) also discuss other observations about the functioning
of public transport in Helsinki made with the use of flower diagrams.

Fig. 6.13  Flower diagrams in two stop places at the central railway station

6.2.3 Textual Characteristics of Composite Events

When composite spatial events, such as spatio-temporal clusters, are constructed
from elementary spatial events, the composite events are characterized by attrib-
utes that are derived from attributes of the elementary events. Thus, the spatial
position of a composite event may be defined as a convex hull, spatial buffer,
or bounding rectangle containing the spatial positions of all elementary events
included in the composite event. The temporal position of a composite event is the
minimal time interval containing the temporal positions of all elementary events.
Composite spatial events are also characterized by thematic attributes that are
derived by statistical summarization (aggregation) of values of thematic attributes
characterizing the elementary spatial events. Values of numeric attributes can be
summarized by computing the mean, minimum, maximum, quartiles, mode, and
other statistics. Thus, in Sect. 6.1.4, the movement directions in the traffic jam
events were determined by computing the mean of the movement directions of the
elementary low-speed events.
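
The derivation of such attributes can be illustrated by a minimal Python sketch. It is not the implementation used for the examples in this chapter; the record layout (keys `x`, `y`, `t_start`, `t_end`, `value`) and the function name are assumptions made only for illustration, and the bounding rectangle is chosen as the simplest of the spatial options mentioned above.

```python
from statistics import mean, median

def summarize_composite(events):
    """Derive attributes of a composite event from its elementary events.

    Each elementary event is a dict with hypothetical keys:
    x, y (position), t_start, t_end (time), value (a numeric thematic attribute).
    """
    xs = [e["x"] for e in events]
    ys = [e["y"] for e in events]
    values = [e["value"] for e in events]
    return {
        # Spatial position: bounding rectangle of all elementary positions.
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        # Temporal position: minimal interval containing all elementary times.
        "time": (min(e["t_start"] for e in events),
                 max(e["t_end"] for e in events)),
        # Statistical summaries of the numeric thematic attribute.
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

# Made-up sample events, for illustration only.
events = [
    {"x": 1.0, "y": 2.0, "t_start": 10, "t_end": 12, "value": 4.0},
    {"x": 3.0, "y": 1.0, "t_start": 11, "t_end": 15, "value": 6.0},
    {"x": 2.0, "y": 5.0, "t_start": 9, "t_end": 10, "value": 8.0},
]
summary = summarize_composite(events)
```

For angular attributes such as movement directions, an arithmetic mean can be misleading near the 0°/360° boundary, so a circular mean would usually be preferable in practice.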
While it is quite obvious how to derive, visualize, and analyse numeric sum-
mary attributes of composite spatial events, characterizing composite events on
the basis of qualitative thematic attributes of their elementary events may be more
difficult, especially when the values of those attributes are arbitrary, unstructured
texts. In this section, we shall demonstrate some techniques for dealing with such
attributes. A suitable example dataset is the data about the geographically refer-
enced Flickr photos introduced in Sect. 2.10.6. The data include user-given titles
of the photos. For many photos, the titles are missing or coincide with the image
file names; still, there are many records with informative photo titles mention-
ing the objects captured in the photos. The data can be viewed as spatial event
data, where each record describes an elementary spatial event of photo taking. The
photo title is a qualitative thematic attribute of an event. Its values may be arbitrary
texts (moreover, the texts may be in different languages, but here we limit our
analysis to texts in English).
In our example analysis, we take a subset of the Flickr photo data referring to
Switzerland and the surrounding territories. The subset contains records of
1,300,838 photos made by 34,141 Flickr users. Our analysis goal is to discover popular
spatial objects and events, that is, those that attract many Flickr users and are
captured in many photos. To achieve this goal, we shall fulfil two subtasks. First,
we shall discover spatio-temporal concentrations (clusters) of Flickr users. Such a
cluster indicates that something has attracted public attention. Second, we shall try
to interpret the discovered clusters, that is, composite spatial events of several people
being close in space, by analysing the titles of the photos fitting in the clusters.
Spatio-temporal clusters of photo-taking events can be discovered by means
of density-based clustering similarly to the detection of traffic jams in Milan
(Sect. 6.1.4). This time we shall group the events only by their proximity in space
and time (for detecting the traffic jams, the movement direction was also taken into
account in the distance function). However, we do not want to apply clustering to the
whole set of photo records because we know that sometimes a single Flickr user pub-
lishes a lot of photos with the same geographic reference and close temporal refer-
ences. Such a package of photos is by itself a spatio-temporal cluster; however, it does
not necessarily correspond to an object or event of public interest. To exclude packages
of co-located photos from single users, we pre-process the Flickr data as follows.
First, we build the trajectories of the Flickr users by connecting the photo-
taking events of each user in chronological order. Then, we apply density-based
simplification to the trajectories (Sect. 3.7), which replaces groups of consecutive
trajectory points fitting within a circle of a specified radius and a time interval of a
specified maximal duration by a single point. The time reference assigned to this
point is the minimal time interval including all the points within the circle. Hence,
when multiple photo-taking events of the same user are co-located in space and
time, the simplification replaces them by a single event indicating the presence of
this user at a certain point (within a given distance) during a time interval. This
single event is used in the clustering instead of the original multiple events.
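
The idea of this simplification can be sketched as follows. The sketch is only an approximation under stated assumptions: coordinates are planar (e.g. metres), times are numbers (e.g. minutes), and a group is grown while the next point stays within the radius and the maximal duration measured from the first point of the group. The function name and the output layout are hypothetical.

```python
from math import hypot

def simplify_trajectory(points, radius, max_duration):
    """Collapse runs of consecutive points that stay near the run's first point.

    points: list of (x, y, t) tuples in chronological order (planar coordinates).
    Returns a list of (x, y, t_start, t_end, n_points) tuples.
    """
    simplified = []
    i = 0
    while i < len(points):
        x0, y0, t0 = points[i]
        j = i + 1
        # Extend the group while the next point is close in space and time
        # to the first point of the group (a simplifying assumption).
        while (j < len(points)
               and hypot(points[j][0] - x0, points[j][1] - y0) <= radius
               and points[j][2] - t0 <= max_duration):
            j += 1
        group = points[i:j]
        # The representative position is the mean of the grouped positions.
        cx = sum(p[0] for p in group) / len(group)
        cy = sum(p[1] for p in group) / len(group)
        # The time reference is the interval spanned by the grouped points.
        simplified.append((cx, cy, group[0][2], group[-1][2], len(group)))
        i = j
    return simplified

# Made-up track of one user: three co-located photos, then two more elsewhere.
track = [(0, 0, 0), (100, 0, 5), (0, 100, 20), (2000, 0, 40), (2050, 0, 45)]
result = simplify_trajectory(track, radius=250, max_duration=30)
```

In this made-up example, the five photo-taking events are reduced to two events that represent three and two photos, respectively.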
In our example study, we do the simplification with the circle radius 250 m and
maximal duration 30 min. The maximal number of co-located photos of one user
that have been replaced by a single event is 317. There are 16,639 events (3.6 %)
representing 10 or more photos and 153 events (0.03 %) representing 100 or more
photos. Besides removing potentially uninteresting bunches of photo records from
the same users, we gain a significant data reduction: the total number of elementary
spatial events decreases from 1,300,838 to 466,309 (36 % of the original number).
This decreases the time required for the clustering.
We apply density-based clustering with the following parameters: spatial distance
threshold 250 m, temporal distance threshold 30 min, and minimal number
of neighbours of a cluster core object 3. As a result, we obtain 3,206 spatio-temporal
clusters including in total 19,125 events (4.1 %); 447,184 events (95.9 %)
are classified as “noise”, that is, isolated events not belonging to any cluster. It
should be noted that the simplification procedure does not preclude finding spatio-temporal
clusters consisting only of events of a single user. When a user takes
several photos while moving, so that the photos do not all fit into a circle of the radius
taken for the simplification, the sequence of photos will be preserved by the sim-
plification procedure and participate in the subsequent clustering. Some photos in
this sequence may have enough neighbours to be selected as core points of a clus-
ter. Their neighbours will also be attached to the cluster. This cluster may remain a
single-user cluster if there are no photos of other users in the neighbourhood.
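
A compact sketch of density-based clustering with separate spatial and temporal distance thresholds is given below. It follows the general DBSCAN scheme and is not the specific algorithm or tool used in this chapter. It assumes planar coordinates, numeric times, and that an event is a core object when it has at least `min_neighbours` other events in its neighbourhood. The neighbour search is a naive full scan, which is adequate only for small illustrations.

```python
from math import hypot

def st_dbscan(events, eps_space, eps_time, min_neighbours):
    """Label events (x, y, t) with cluster ids; -1 marks noise."""
    def neighbours(i):
        xi, yi, ti = events[i]
        return [j for j, (x, y, t) in enumerate(events)
                if j != i and hypot(x - xi, y - yi) <= eps_space
                and abs(t - ti) <= eps_time]

    labels = [None] * len(events)   # None = not yet visited
    cluster_id = -1
    for i in range(len(events)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_neighbours:
            labels[i] = -1          # provisionally noise; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id      # noise reached from a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nb = neighbours(j)
            if len(nb) >= min_neighbours:   # j is itself a core object: expand
                queue.extend(nb)
    return labels

# Made-up events: four close in space and time, one far away, one much later.
events = [(0, 0, 0), (100, 0, 5), (0, 100, 10), (100, 100, 15),
          (5000, 5000, 0), (0, 0, 600)]
labels = st_dbscan(events, eps_space=250, eps_time=30, min_neighbours=3)
```

In this example, the first four events form one cluster, whereas the spatially distant event and the temporally distant event are classified as noise.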
In our example, 2,481 clusters out of 3,206 (77.4 %) are single-user clusters
and only 725 clusters (22.6 %) include photo-taking events of two or more users.
The maximal number of users in a cluster is 19.
The spatio-temporal clusters we have obtained are transformed into composite
spatial events; we will call them “concentration events”. The spatial boundaries of
the events are defined by generating spatial buffers around the clusters. The tem-
poral position (life time) of each concentration event is the time interval between
the earliest start time and the latest end time among the events in the cluster. The
maximal duration of a concentration event is 576 min (9.6 h); the median and
mean durations are 65.4 and 83.2 min, respectively.
The map in Fig. 6.14 shows the spatial positions of the concentration events.
The events are represented by their spatial boundaries (in blue) drawn with 40 %
opacity. Brighter blue colours emerge where many boundaries overlap, like in the
area of Zurich, which is shown in an enlarged map fragment in Fig. 6.14b.

Fig. 6.14  A total of 3,206 concentration events created from spatio-temporal clusters of
photo-taking events are shown on a map (a whole territory; b a fragment enlarged). The boundary
lines of the concentration events (in blue) are shown with 40 % opacity

To characterize the concentration events based on the available photo titles, we
run a procedure that performs the following operations for each event:

1. Retrieve from the database the photo records fitting within the spatial boundary
of this event and the time interval of its existence.
2. From the titles of the retrieved photo records, extract the words and count their
frequencies. Ignore stop words, such as articles, prepositions, conjunctions,
pronouns, etc.
3. Take the words having frequencies not less than a user-given threshold (we take
the threshold 3), find word sequences where these words occur together with
1, 2, 3, and 4 other words (disregarding stop words), and count the frequencies
of the word sequences.
4. Attach the list of words and sequences having the frequencies not less than
the threshold to the concentration event as the value of the attribute “textual
annotation”.
5. To facilitate further analysis of the frequent words and combinations, create for
each of them a text event, that is, an elementary spatial event with the same
spatial and temporal positions as the concentration event and the following the-
matic attributes: cluster identifier, text (i.e. the word or word combination), fre-
quency, and word count.
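
Steps 2 to 5 of this procedure can be sketched as follows. This is a simplified illustration, not the actual implementation: the stop-word list is a tiny placeholder, the tokenization by a regular expression is an assumption, and "word sequences" are interpreted here as contiguous sequences of 2 to 5 non-stop words that contain at least one frequent word. All function names and dictionary keys are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "and", "or", "to", "it", "is"}

def tokenize(title):
    """Lower-case words of a title with stop words removed."""
    return [w for w in re.findall(r"\w+", title.lower()) if w not in STOP_WORDS]

def summarize_titles(titles, threshold=3, max_extra_words=4):
    """Return {word or word sequence: frequency} for frequencies >= threshold."""
    token_lists = [tokenize(t) for t in titles]
    word_counts = Counter(w for tokens in token_lists for w in tokens)
    frequent = {w for w, c in word_counts.items() if c >= threshold}

    seq_counts = Counter()
    for tokens in token_lists:
        for n in range(2, max_extra_words + 2):      # sequence lengths 2..5
            for i in range(len(tokens) - n + 1):
                seq = tokens[i:i + n]
                if any(w in frequent for w in seq):
                    seq_counts[" ".join(seq)] += 1

    annotation = {w: word_counts[w] for w in frequent}
    annotation.update({s: c for s, c in seq_counts.items() if c >= threshold})
    return annotation

def make_text_events(cluster_id, centre, interval, annotation):
    """Create one text event per frequent word or word sequence."""
    return [{"cluster": cluster_id, "position": centre, "time": interval,
             "text": text, "frequency": freq, "word_count": len(text.split())}
            for text, freq in annotation.items()]

# Made-up photo titles, for illustration only.
titles = ["Street Parade in Zurich", "street parade 2008",
          "The Street Parade", "Zurich lake"]
annotation = summarize_titles(titles, threshold=3)
text_events = make_text_events(1, (8.54, 47.37), (0, 60), annotation)
```

With the threshold 3, the annotation of this small example contains the words "street" and "parade" and the combination "street parade", each with the frequency 3.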
For practical reasons, it is more convenient to represent the text events by
points in space rather than by areas. Therefore, the spatial positions of the text
events may be defined not as the areas covered by the corresponding concentra-
tion events but as the centres of these areas. The set of text events can be shown
on a map as a new layer. It can also be visualized in a space–time cube. The attrib-
utes of the text events can be used for filtering and in visual and/or computational
analysis.
In our example, the procedure of text retrieval and summarization was able to
find 7,285 frequent words and combinations for 1,426 out of 3,206 concentration
events, that is, for 44.5 % of the events. The text events representing these words
and combinations are shown as points on a map in Fig. 6.15. The point symbols
are coloured according to the spatial positions (we used k-means to divide the set
of points into 20 groups based on their longitudes and latitudes and assigned dif-
ferent colours to the clusters). The colouring is used as a means of linking between
different displays. Thus, the text cloud display in Fig. 6.16 shows the texts of the
text events using font sizes proportional to the text frequencies. The colours of the
texts are the same as the colours of the corresponding point symbols on the map.
This allows rapid determination of the approximate region each text comes from.
The exact spatial position can be determined by pointing at a text with the mouse
cursor: in response, the symbol representing the corresponding event on the map is
highlighted (the arrow in Fig. 6.15 points at the highlighted text event corresponding
to the mouse position in the text cloud display). Simultaneously, the temporal
reference and the frequency of the text are shown in a popup window.

Fig. 6.15  The text events are represented by point symbols drawn with 30 % opacity. The symbols
are coloured according to the spatial positions of the events. The arrow points at a currently
highlighted text event corresponding to the mouse position in the text cloud display (Fig. 6.16).
The text of the event is “Interlaken Red Bull air race”

Fig. 6.16  The texts of the text events are shown in a text cloud display. The texts have the
same colours as the point symbols representing the events in the map (Fig. 6.15). The texts are
arranged in the order of decreasing frequency

Fig. 6.17  The text cloud display shows only the texts referring to the concentration events with
the participation of at least two different Flickr users

The display in Fig. 6.16 shows the most frequent words and combinations for
the whole set of concentration events, which allows us to learn what objects and
public events were the most interesting to the Flickr users over all of Switzerland
and the whole time span of the data. Among them are a street parade in Zurich,
Axalp air force firing (Axalp Fliegerschiessen), Wabba Suisse (a bodybuilding
championship), Sechseläuten (a traditional spring holiday in Zürich), International
Auto Salon in Geneva, Papiliorama (a tropical garden in Kerzers), Gotthard rail-
way (Gotthardbahn), Interlaken mystery park, the zoo in Zurich, Interlaken Red
Bull air race, and many others.
In Fig. 6.17, the text cloud display shows only the texts referring to the concen-
tration events with the participation of a minimum of two different Flickr users.
It can be noted that some of the texts that were visible in Fig. 6.16 have disap-
peared in Fig. 6.17, in particular, the texts about Papiliorama and Gotthard railway.
However, the texts referring to major public events have been preserved.
By applying spatial and/or temporal filtering, we can learn what was interesting
in different parts of the territory and/or in different time intervals or different sea-
sons of the year. For example, the screenshot of the text cloud display in Fig. 6.18
corresponds to a spatial filter on the area of Basel (in this and following examples,
we show only texts referring to concentration events with the participation of at
least two different Flickr users). Interesting events are Basler Fasnacht (carnival
festivities) and Niggi Naggi (highlighted), when Santa Claus makes a motorcy-
cle ride through the city centre giving gifts to children. Besides, there were many
sport events, such as games of the European football championship in June 2008.
In particular, there was a game between the Netherlands (Holland) and Russia,
which was preceded by a so-called Oranje-Party organized by football fans from
the Netherlands. Both the game and the party are reflected in the texts.
It is also possible to apply filtering by text frequencies and/or by occurrences of
particular strings. Figure 6.19 demonstrates the result of filtering by occurrence of
the string “parad”. In the text cloud display, we see the texts of the selected events.
The geographical positions of the parade events are shown on the map. The tem-
poral distribution of the respective concentration events by the years, months, and
days of the week is shown in two two-dimensional histograms. Most events are
related to the street parade in Zurich (http://en.wikipedia.org/wiki/Street_Parade).
It takes place on the second Saturday of August each year; however, the parades
of 2006 and 2012 are not reflected in our database. In Zurich, there was also a
Europride parade in June 2009. There are also events related to the Lake Parade
in Geneva (http://en.wikipedia.org/wiki/Lake_Parade). It takes place in July each
year; however, our database reflects only the parades of four years (2007, 2008,
2010, and 2011). Parada Par Tucc occurred in Italy in May 2011. All parade events
took place on Saturdays (i.e. on the sixth day of the week).

Fig. 6.18  Interesting objects and events in Basel

Fig. 6.19  Text events related to parades have been selected using a text-based filter

Besides using various filters, we can also access the textual annotation of each
individual concentration event. Thus, we find that the two longest concentra-
tion events (with the durations 9.6 and 8.38 h) correspond to the street parades
in Zurich in 2008 and 2011. These are also the events with the highest numbers
of different users (19 and 14, respectively). The third longest event with the dura-
tion 7.78 h (three different users) corresponds to the Patrouille Suisse air show that
took place in 2007 near Monthey (the Patrouille Suisse is an aerobatic team of the
Swiss Air Force). A concentration event with nearly the same duration (7.71 h) and
seven different users refers to the Auto Salon in Geneva in March 2011, where the
users were especially interested in cars from Lamborghini, Porsche, and Bugatti.
Hence, when composite spatial events are constructed from elementary spatial
events that have associated texts, the texts can be summarized for characterization
of the composite events. The text summarization is done by extracting frequent
words and combinations and counting their frequencies. The resulting lists of the
words and combinations with the frequencies can be attached to the composite
events as values of an attribute. However, such complex attribute values are not
easy to view and analyse. Therefore, we suggest generating a related set of ele-
mentary text events, where each text event corresponds to one word or combina-
tion associated with a composite event. For each composite event, as many text
events are generated as there are words and combinations obtained for this com-
posite event. The spatial and temporal positions of the text events coincide with
those of the respective composite events. For simplification, the spatial positions
of the text events may be defined as centres of the composite events. Each text
event has a single text (word or combination) and its frequency as attribute values.
The texts of the text events can be visualized in the form of text clouds, which can
be shown in a separate display, like in our examples, or put on a map, as is done
by Thom et al. (2012). Nguyen et al. (2011) suggest a number of other interesting
techniques for the visualization of texts with spatio-temporal references.
Text events can be easily filtered by means of spatial, temporal, and attribute
filters. An attribute filter can select text events based on their frequencies, word
counts, or occurrences of particular substrings. Using a filter of related object
sets (described in Sect. 4.2.4), it is also possible to select the text events related to
composite events of interest. These opportunities may help in understanding the
meanings or reasons for composite spatial events.
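
An attribute filter of this kind can be sketched in a few lines. The dictionary keys and the function name are hypothetical, and the sample frequencies are made up; the sketch only illustrates the combination of conditions on frequency, word count, and substring occurrence.

```python
def filter_text_events(text_events, min_frequency=1, min_word_count=1, substring=None):
    """Select text events by frequency, word count, and an optional substring."""
    selected = []
    for ev in text_events:
        if ev["frequency"] < min_frequency:
            continue
        if ev["word_count"] < min_word_count:
            continue
        # Case-insensitive substring test, e.g. "parad" matches "Parade" and "parada".
        if substring is not None and substring.lower() not in ev["text"].lower():
            continue
        selected.append(ev)
    return selected

# Made-up text events, for illustration only.
text_events = [
    {"cluster": 1, "text": "street parade", "frequency": 12, "word_count": 2},
    {"cluster": 2, "text": "Lake Parade Geneva", "frequency": 5, "word_count": 3},
    {"cluster": 3, "text": "zoo", "frequency": 7, "word_count": 1},
    {"cluster": 4, "text": "parada par tucc", "frequency": 3, "word_count": 3},
]
parades = filter_text_events(text_events, substring="parad")
frequent_parades = filter_text_events(text_events, min_frequency=5, substring="parad")
```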

6.3 Relations

6.3.1 Spatio-Temporal Relations Between Events

Growth ring maps and flower diagram maps show not only characteristics of spa-
tial events but also their relations to locations in space and positions in time. This
kind of information can also be conveyed by a space–time cube (e.g. Fig. 4.14b)
and a traditional diagram map, where events are represented in an aggregated form
(e.g. by rose diagrams in Fig. 5.44). Growth ring maps, flower diagram maps, and
traditional diagram maps assume that the events are spatially grouped in a rela-
tively small number of pre-defined places, the number of different places being
much smaller than the number of events. When the places are not defined in
advance, clustering techniques can be used to discover spatial clusters of events,
and then places of interest can be outlined as spatial buffers or convex hulls around
the clusters (an example is shown in Fig. 6.5).
A space–time cube can represent arbitrarily distributed events. By manipulating
the transparency level (e.g. as in Fig. 4.14b), it is possible to see the spatio-tem-
poral relations between the events: whether the events tend to cluster or they are
distributed uniformly or randomly. Clustering algorithms, described in Sect. 6.1,
also support the investigation of spatio-temporal relations between spatial events,
and are especially useful when the events are numerous and distributed over a large
extent in space and/or time.
A tendency of spatial events to cluster in space and time suggests that the
events may be semantically related and/or have common causes or reasons. For
example, a cluster of low-speed events in trajectories of cars on a street may be
caused by traffic congestion. A cluster of photo-taking events of multiple Flickr
users may appear due to the presence of a spatial object or occurrence of an event
that attracted the attention of the people.
Binary spatial and temporal relations between events, such as distance, tem-
poral ordering, topological relations, or compositions of these basic relations, can
be investigated with the help of a query tool that can find, for a given reference
event or a set of reference events, all events having a specified relation to the refer-
ence event(s). For example, it can find all spatial events that occurred after a given
event within a certain spatial distance from it. The subset of events extracted by a
query can be investigated using generic visualization and computational analysis
techniques.
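
As an illustration, a query for the events that occurred after a given reference event and within a certain spatial distance from it might look as follows. This is a sketch under the assumption of point events with planar coordinates and numeric times; the function name and the optional `max_delay` parameter are hypothetical.

```python
from math import hypot

def events_after_within(events, reference, max_distance, max_delay=None):
    """Return events (x, y, t) later than the reference and spatially close to it."""
    rx, ry, rt = reference
    selected = []
    for x, y, t in events:
        # Temporal relation: strictly after the reference, optionally within a delay.
        later = t > rt and (max_delay is None or t - rt <= max_delay)
        # Spatial relation: within the given distance from the reference.
        if later and hypot(x - rx, y - ry) <= max_distance:
            selected.append((x, y, t))
    return selected

# Made-up events and reference event, for illustration only.
events = [(0, 0, 5), (100, 0, 15), (900, 0, 20), (50, 50, 8)]
found = events_after_within(events, reference=(0, 0, 10), max_distance=300)
```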
Besides generic spatial, temporal, and spatio-temporal relations that can occur
between any kinds of spatial events, there are kinds of relations that are specifi-
cally relevant to individual or collective movement events. In particular, the ana-
lyst may need to know how such events are related to the previous and/or future
movements of the objects.

6.3.2 Relations Between Events, Trajectories, and Context

In Sect. 5.2.3, we presented an example of investigating how movement events,
such as a roe deer encountering a lynx, affected the following movement behav-
iours of the movers (roe deer). The approach was to extract the parts of the mov-
ers’ trajectories that occurred before and after the events (within certain temporal
distances), compute suitable attributes representing properties of these parts of tra-
jectories, and compare the properties of the trajectory parts preceding the events
with the properties of the trajectory parts following the events.
Dividing trajectories by event occurrences and analysing different parts of the
trajectories (before, during, and after the events) may be useful not only for study-
ing event impacts on the movement but also for understanding the reasons for the
events and relating the events to the context. We shall demonstrate these possibili-
ties by using the VAST Challenge 2011 dataset of georeferenced microblog mes-
sages reflecting a disease epidemic (the dataset is introduced in Sect. 2.10.7).
To prepare the data for the analysis, we have used database processing facilities
to find occurrences of keywords signifying that a message refers to a disease.
Examples of such keywords are “flu”, “fever”, “pain”, “ache”. A flag (further
referred to as “illness flag”) was attached to each record indicating whether the
text contains any of these keywords. Out of the 1,023,057 messages contained in
the database, 51,060 messages have been marked as related to diseases, that is,
their illness flags have been set to value 1. By looking at a histogram showing the
distribution of the disease-related events over time (Fig. 6.20), we easily find that
the epidemic began on the 18th of May, 2011. Before that, the number of disease-
related messages per day was very low but on this day, the number of such mes-
sages dramatically increased. Hence, the time frame of the epidemic is the last
three days of the time interval for which we have the data.
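
The flagging and the temporal counting can be sketched as follows. The sketch assumes simple case-insensitive substring matching with the four example keywords; the source does not specify the exact matching rule, and substring matching may produce false positives (e.g. "influence" contains "flu"). The sample messages are made up.

```python
from collections import Counter
from datetime import datetime

KEYWORDS = ("flu", "fever", "pain", "ache")   # examples named in the text

def illness_flag(text):
    """Return 1 if the text contains any keyword (substring match), else 0."""
    lowered = text.lower()
    return int(any(k in lowered for k in KEYWORDS))

# Made-up messages (timestamp, text), for illustration only.
messages = [
    ("2011-05-17 09:00", "Great coffee this morning"),
    ("2011-05-18 08:30", "Woke up with a fever and chills"),
    ("2011-05-18 14:10", "This headache is killing me"),
    ("2011-05-19 03:00", "Stomach pain all night"),
]
flags = [illness_flag(text) for _, text in messages]

# Count the flagged messages per day, as a basis for a time histogram.
per_day = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M").date().isoformat()
    for (ts, _), flag in zip(messages, flags) if flag
)
```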
Our first task is to identify where the outbreak started. To achieve this, we need
to analyse the spatial distribution of the disease-related messages. However, we
should take into account that a person who got ill might write about his/her health
condition in several messages as the illness evolved. To find the origin of the dis-
ease outbreak in space, we need to take from each affected person only the first
message mentioning the symptoms of the illness.

Fig. 6.20  The time histogram shows the temporal distribution of the disease-related messages in
the VAST Challenge 2011 dataset. One bar corresponds to half a day (12 h)

In order to extract the primary illness-related message records, we first use a
database query to obtain a set of personal identifiers of the people who wrote at
least one message with the illness flag equal to 1 during the time of the epidemic.
The resulting set includes 22,127 identifiers. Then, we construct the trajectories of
these people from all their message records and apply the tools for segment filtering
and event extraction (Sect. 4.2.3) together with the general time filter. We set
the time filter to the last three days of the data period. By means of the segment
filter, we select only those trajectory points where the illness flag equals 1 and
apply event extraction, which produces 46,538 spatial events having occurred dur-
ing the epidemic time. The event extraction tool generates, among other thematic
attributes, an attribute “ordinal number”, the value of which is the ordinal number
of an event in the trajectory from which it was extracted. As could be expected,
there are 22,127 spatial events with the ordinal number 1 signifying that the event
is the first in the trajectory from which it originates. These primary illness events
constitute 47.5 % of the total number of the extracted illness events. 12,359 tra-
jectories (48.7 %) contain two or more illness events in the last three days of their
lifetimes.
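
The extraction of illness events with ordinal numbers, and the selection of the primary events among them, can be sketched as follows. The record layout `(person_id, time, illness_flag)` and the function name are assumptions made for illustration; the actual event extraction tool works on trajectories and generates further attributes.

```python
from collections import defaultdict

def extract_illness_events(records):
    """records: iterable of (person_id, time, illness_flag) tuples."""
    by_person = defaultdict(list)
    for person, t, flag in records:
        if flag == 1:
            by_person[person].append(t)
    events = []
    for person, times in by_person.items():
        # The ordinal number is the position of the event in the person's trajectory.
        for ordinal, t in enumerate(sorted(times), start=1):
            events.append({"person": person, "time": t, "ordinal": ordinal})
    return events

# Made-up message records, for illustration only.
records = [("p1", 3, 0), ("p1", 5, 1), ("p1", 9, 1), ("p2", 4, 1), ("p3", 7, 0)]
events = extract_illness_events(records)
primary = [e for e in events if e["ordinal"] == 1]
```

Here, three illness events are extracted, two of which are primary, that is, the first illness-related messages of their authors.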
The map in Fig. 6.21 shows the spatial distribution of the primary illness
events. It is easily seen that the events are spatially clustered. The dense clusters
show us the areas where the disease outbreak started. To outline these areas, we
perform density-based clustering of the events according to their spatial positions
(parameters: distance threshold 300 m and minimal number of neighbours 20).
The results of the clustering are shown in Fig. 6.22. The three largest clusters,
red, bright green, and blue, include together 17,193 events (12,342, 2,686, and
2,165 in the red, green, and blue clusters, respectively), that is, 77.7 % of all pri-
mary events. This gives enough evidence for concluding that the disease outbreak
started in the areas covered by these three clusters. The areas can be outlined by
building convex hulls enclosing the clusters. It is puzzling, however, why there
are three distinct areas and not a single area including most of the primary illness
events. We hypothesize that these areas may be qualitatively different in terms of
the illness symptoms. To test this, we summarize the texts of the primary illness-
related messages by the areas (convex hulls) enclosing the clusters. The text sum-
marization is done as described in the previous section. We visualize the results in
a text cloud display, which shows us that the disease symptoms in the red cluster
differ from those in the green and blue clusters. In Fig. 6.23, the most frequent
words and combinations from the red cluster and from the other two clusters are
shown separately. In the red cluster, the frequent symptoms are flu-like: chills,
fever, headache, etc. The green and blue clusters are characterized by stomach disorders:
diarrhoea, stomach ache, nausea, ab pain (evidently, abdominal pain), and
vomiting. The symptoms are the same in both clusters.

Fig. 6.21  The primary illness events extracted from trajectories of sick individuals are
represented on a map of Vastopolis by cyan-coloured circles drawn with 40 % opacity

Fig. 6.22  Dense spatial concentrations of primary illness events have been discovered by means
of density-based clustering according to the spatial position of the events. The different colours
correspond to different spatial clusters

Fig. 6.23  The texts of the primary illness-related messages have been summarized by the areas
enclosing the three largest clusters of the primary illness events. The most frequent words and
combinations from the red cluster and from the green and blue clusters are shown in the upper
and lower images, respectively

Hence, there are two different diseases: a flu-like disease, which started in the
city centre and on the east of it, and a stomach disease, which started on the south-
west of the city, on two sides of the river. After having discovered this, our next
question is whether the diseases started at the same time. The space–time cube
in Fig. 6.24 shows that this is not so. The spherical symbols in the space–time
cube represent the primary illness-related events. The colours encode the type of
the reported symptoms: red means flu and light blue means stomach disorders. It is
clearly seen that the massive occurrences of the stomach disorder symptoms began
a day later than the massive occurrences of the flu-like symptoms.
For a more precise estimation of the timing of the two diseases, we look at a
segmented time histogram (Fig. 6.25), where the red segments correspond to the
flu-like disease and light blue to the stomach disease. The light greyish segments
correspond to the second and further messages about illness, which have been fil-
tered out. Using the histogram, we find that the number of messages about flu-like
symptoms started to increase at about 1 a.m. on the 18th of May and then a dra-
matic increase occurred at 8 a.m. The number of messages about stomach disor-
ders increased in the time interval from 2 to 3 a.m. on the 19th of May.

Fig. 6.24  The space–time cube shows the spatio-temporal distribution of the primary messages
about the flu-like disease (red) and stomach disease (light blue)

Fig. 6.25  The time histogram shows the temporal distribution of the primary messages about the
flu-like disease (red) and stomach disease (light blue). One bar corresponds to 1 h

Since the areas of the massive spread of the stomach disease are located at the
river, it can be concluded that this disease is waterborne. The outline of the red
cluster on the map and in the space–time cube suggests that the flu-like disease
may be airborne. The supplementary weather data tell us that the wind on the 18th
of May blew from the west. Hence, the agent causing the disease could be trans-
ported by the wind from the centre of the city to the east.
The shapes and spatial positions of the clusters of the flu events and stomach
events suggest that the two diseases might originate from a common source that
was located at the place where the river is crossed by a road represented on the
map by a thick dark red line. Evidently, there is a bridge over which the road goes.
The flu events are concentrated on the east of the bridge, following the direction
of the wind, and the stomach events on the southwest of it, following the flow of
the river. This means that some event might have happened on or near the bridge
before the 18th of May causing toxic or infectious substances to be discharged in
the air and in the river. This hypothetical event might have left traces in the microblog
messages. To find information about this event, we load from the database a subset
of the microblog posts located within a spatial window around the bridge (about 4
by 4 km) and the time interval from midnight of the 16th of May until midnight of
the 18th of May. The time histogram in Fig. 6.26 shows the temporal distribution of these
messages. There is an abrupt increase of the number of messages starting from 11
o’clock on the 17th of May and ending by 22 o’clock. The corresponding bars of
the histogram are highlighted. The increase in the number of messages may indi-
cate an important happening. We summarize the texts of the messages from the
interval when the increase is observed. The text cloud in Fig. 6.27 shows the most
frequent combinations with at least two words. It becomes clear that a truck acci-
dent occurred in this place causing a fire and spilling of cargo. Evidently, the fire
produced some toxic gas that contaminated the air, and the spilled cargo contained
some toxic substance that contaminated the water.
To check whether the diseases could be transmitted not only by air or water
but also from person to person, we shall investigate separately the trajectories of
the flu-sick and stomach-sick people. Specifically, we need to check whether all or

Fig. 6.26  The time histogram shows the temporal distribution of the messages posted from the
vicinity of the bridge during two days before the outbreak. One bar corresponds to 30 min

Fig. 6.27  The text cloud shows the most frequent word combinations (two words or more) from
the messages posted in the vicinity of the bridge from 11:00 to 22:00 on May, 17

almost all persons who got the disease symptoms had visited the main area of the
disease outbreak before reporting the symptoms for the first time. If a substantial
proportion of the individuals did not visit the outbreak area before getting ill, this
means that they might have been infected through personal contacts with others.
We select first the trajectories of the flu-sick people, that is, trajectories contain-
ing at least one flu event. There are 17,844 such trajectories. We use a segment
filter to select the parts of the trajectories starting from 11 o’clock on May, 17 (i.e.
the approximate start time of the truck accident) and ending with the first flu event.
We want to check whether these parts of the trajectories include points located in
or close to the outbreak area of the flu-like disease, which is defined as the convex
hull of the red cluster in Fig. 6.22. For this purpose, we compute the spatial dis-
tances from the trajectory points to the outbreak area and apply an additional seg-
ment filter to select only the points and segments where the distance is from −1,
denoting that the point is inside the area, to 250 m. The map in Fig. 6.28 shows the
outbreak area (filled in red) and the selected segments of the trajectories, which
are drawn in yellow with 3 % opacity. The filter statistics report that the selected
segments belong to 15,236 different trajectories, that is, at least 85 % of the people
who posted messages about flu-like symptoms had visited the outbreak area before
that. Since the remaining 15 % could also have visited the outbreak area but with-
out leaving traces (i.e. without posting messages from this area), we can conclude
that the disease is unlikely to be transmitted from person to person.
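The check described above can be sketched as follows. This is only an illustration under stated assumptions: coordinates are planar and given in metres, the outbreak area is a convex polygon with vertices in counter-clockwise order, and only trajectory points (not segments) are tested. All function names are hypothetical; the value −1 for points inside the area follows the convention mentioned in the text.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_area(p, hull):
    """Return -1 if p is inside (or on the border of) the convex polygon
    `hull` (counter-clockwise vertices), else the distance to its border."""
    edges = list(zip(hull, hull[1:] + hull[:1]))
    inside = all(
        (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0
        for a, b in edges
    )
    if inside:
        return -1.0
    return min(point_segment_distance(p, a, b) for a, b in edges)


def visited_before_first_event(points, start_time, first_event_time,
                               hull, max_dist=250.0):
    """points: list of (t, x, y). True if some point between start_time and
    the first illness event lies in the area or within max_dist of it."""
    return any(
        start_time <= t <= first_event_time
        and distance_to_area((x, y), hull) <= max_dist
        for t, x, y in points
    )


# Hypothetical square outbreak area of 1 km by 1 km.
area = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
print(visited_before_first_event([(5, 1200, 500)], 0, 10, area))
```

Applying such a test to every trajectory and counting the positive results gives the share reported in the text: 15,236 of 17,844 trajectories, or about 85 %.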
In the same way, we analyse the trajectories of the stomach-sick people. There
are 4,615 trajectories containing stomach disorder events. We select the parts
of the trajectories starting from 11 o’clock on May, 17 and ending with the first
stomach disorder event of each trajectory and compute the point distances to the
nearest of the two areas of the stomach disease outbreak (i.e. the convex hulls
enclosing the green and blue clusters in Fig. 6.22). Among the 4,615 trajectories,
4,269 (92.5 %) have points or segments located within or close to one of the two
stomach disease outbreak areas. Hence, it is reasonable to conclude that the dis-
ease was contracted inside the outbreak areas near the river while the transmission
from person to person is rather unlikely.

Fig. 6.28  The map shows the segments of the trajectories of the flu-sick persons selected by
the following filter: time from 11:00 on May, 17 to the first flu event of the trajectory and spatial
position is in or near the area of the flu outbreak. The selected segments belong to 15,236 out of
17,844 trajectories (85 %)

Now, we want to compare the movement behaviours of the individuals who got
the flu-like disease and those who got the stomach disease. We extract the parts of
the trajectories starting from the first flu or stomach disorder event, respectively,
and ending with the last trajectory point, and analyse these parts as independent
trajectories. There are 3,395 trajectories of stomach-sick people containing at
least two points; the remaining 1,220 persons (26.4 % of 4,615) did not post any
message after the first message mentioning stomach disorder symptoms. We also
obtain 15,013 trajectories of flu-sick people; the remaining 2,831 persons (16 %
out of 17,844) did not post new messages after the first message mentioning flu
symptoms. It may seem that the flu-sick people were more active after getting the
disease symptoms than the stomach-sick people; however, it should be taken into
account that the outbreak of the stomach disease began a day later. Hence, it is
hardly meaningful to compare the activeness of the two groups of people in terms
of the counts of their posts or the travelled distances.
We are interested in where the trajectories of the stomach-sick and flu-sick peo-
ple ended. The ends of the trajectories are shown on the map in Fig. 6.29. The
trajectory ends of the stomach-sick people (in pink) mostly concentrate in the areas
of the stomach disease outbreak; relatively high density is also observed in the
centre. The ends of the trajectories of the flu-sick people (in yellow) make com-
pact spatial clusters, which are very prominent on the map. Most of these clus-
ters are at the positions of the hospitals (more precisely, 13 out of 15 clusters are
at the hospitals and the remaining two clusters are at the most important public
places, Vastopolis Dome and Convention Center). We find that 5,801 trajectories
out of 15,013 (38.6 %) end at the hospitals. This may mean that the flu-like disease
was quite severe. The stomach-sick people, evidently, were able to cope with the

Fig. 6.29  The spatial distribution of the ends of the trajectories of the stomach-sick people (in
pink, 25 % opacity) and the flu-sick people (in yellow, 10 % opacity)

disease on their own. By computing distances to the hospitals and applying a segment
filter, we find that 183 trajectories of the stomach-sick people ended at hospitals.
This is only 5.4 % of the 3,395 trajectories, that is, much fewer
than in the case of flu-sick people. Moreover, 56 persons out of these 183 also got
flu symptoms. Hence, only 127 stomach-sick people who had no flu symptoms
(3.74 %) came to the hospitals. This indicates that the stomach disease was less
severe than the flu-like disease.
In this section, we did not introduce any new analysis methods, but we demon-
strated an interesting investigation where different types of relations involving spa-
tial events played important roles. First, it was necessary to establish the temporal
ordering relations between the illness-related messages of each person for select-
ing the primary illness events, that is, the posts in which illness symptoms were
mentioned for the first time. Next, the spatial relations between the primary illness
events (specifically, their grouping into three major spatial clusters) suggested that
there might be qualitative differences between the events. This led us to the dis-
covery of two disease types. By relating the cluster outlines to the spatial context
(the river) and additional context information about the wind, we determined the
manner of disease transmission. The relative spatial positions of the disease event
clusters allowed us to hypothesize the existence of a spatial event that gave rise to
both diseases and to estimate its approximate place. Knowing the place and the
temporal relation (shortly before) to the disease outbreak start, we could extract
additional context information and find the expected event. Based on the tempo-
ral relations, we also could extract the positions of the sick people between the
event that initiated the diseases and the first disease-related message of each per-
son. By analysing the spatial relations of these positions to the main areas of the
disease spread, we found that the people most likely contracted the diseases while
visiting these areas rather than through contact with other people. Finally, we analysed the
spatial relations of the ends of the trajectories of the sick people to the positions of
the hospitals and found that the flu-like disease was quite severe and made many
people go to hospitals.
The visual analytics tools that allowed us to discover, investigate, or exploit the
various spatial and temporal relations are visual displays (map, space–time cube,
temporal histogram), interactive filters (spatial, temporal, and filter of trajectory
segments), computation of spatial and temporal distances, and database queries
with spatial and/or temporal constraints. This set of generic tools can support the
analysis of various kinds of spatial events, possibly, in combination with trajecto-
ries of moving objects.

6.4 Recap

Spatial events are inherent in movement. Individual movement events of interest
can be extracted from trajectories of movers by means of queries, for example,
using database query tools or an interactive tool for segment filtering. From indi-
vidual movement events, collective movement events may be composed, and more
generally, composite spatial events may consist of elementary spatial events. One
type of composite spatial event is a spatio-temporal cluster of smaller (but not
necessarily elementary) events. Such a cluster may, in particular, represent a collective
movement event, such as a traffic jam.
Spatio-temporal and spatial clusters of spatial events may be discovered by
means of density-based or graph-based clustering algorithms. Both approaches
take into account neighbourhood relations between the objects. Neighbourhood
relations between spatial events can be determined by a special distance function
that combines the distance in space with the distance in time. Furthermore, the
function can also account for distances between values of thematic attributes. In
particular, for movement events, it is often necessary to account for the movement
directions. Cyclic attributes, such as direction, require special treatment in the dis-
tance function.
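A distance function of this kind can be sketched as follows. The sketch assumes one possible way of combining the components (each difference is divided by a user-chosen threshold and the largest ratio is taken, so that two events are neighbours when the result does not exceed 1); it is not claimed to reproduce the exact function defined in Sect. 6.1.1. The names `cyclic_difference` and `event_distance` are hypothetical.

```python
import math


def cyclic_difference(a, b, period=360.0):
    """Smallest difference between two values of a cyclic attribute,
    e.g. movement directions in degrees (350 and 10 differ by 20)."""
    d = abs(a - b) % period
    return min(d, period - d)


def event_distance(e1, e2, max_space, max_time, max_angle=None):
    """Events are tuples (x, y, t, direction).

    Assumed combination rule: every component difference is scaled by its
    threshold, and the maximum ratio is returned. A result <= 1 means that
    no threshold is exceeded."""
    x1, y1, t1, dir1 = e1
    x2, y2, t2, dir2 = e2
    ratios = [
        math.hypot(x1 - x2, y1 - y2) / max_space,
        abs(t1 - t2) / max_time,
    ]
    if max_angle is not None:
        ratios.append(cyclic_difference(dir1, dir2) / max_angle)
    return max(ratios)


# Two hypothetical events: 50 m and 60 s apart, directions 350 and 10 degrees.
print(event_distance((0, 0, 0, 350), (30, 40, 60, 10), 100, 120, 45))
```

The special treatment of the cyclic attribute is what prevents directions such as 350° and 10° from being regarded as 340° apart.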
Clustering algorithms usually operate in RAM, which poses a serious limit to
the number of events that can be processed. To overcome this limit, we suggest
an efficient pre-clustering procedure that can be fulfilled without loading the full
dataset in RAM. The procedure creates lists of neighbours of all events. It exploits
ordering of the events in the database, which allows loading the events in RAM by
small portions. The lists of neighbours can be used as an input of a density-based
or graph-based clustering algorithm.
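The idea of building neighbour lists while holding only a small portion of the events in memory can be sketched as follows. The sketch assumes that the events arrive ordered by time and that a sliding temporal window suffices; the actual procedure may exploit the database ordering differently. For brevity, the resulting lists are collected in a dictionary, whereas a real implementation would write them back to the database. The name `neighbour_lists` is hypothetical.

```python
import math
from collections import deque


def neighbour_lists(events, max_space, max_time):
    """events: iterable of (id, x, y, t) sorted by t in ascending order.
    Returns a dict mapping each event id to the ids of its neighbours."""
    window = deque()   # only events within max_time of the current one
    neighbours = {}
    for event in events:
        eid, x, y, t = event
        # Drop events that are too old to be neighbours of any later event.
        while window and t - window[0][3] > max_time:
            window.popleft()
        neighbours.setdefault(eid, [])
        for oid, ox, oy, _ in window:
            if math.hypot(x - ox, y - oy) <= max_space:
                neighbours[eid].append(oid)
                neighbours[oid].append(eid)
        window.append(event)
    return neighbours


# Hypothetical events (id, x, y, t).
events = [(1, 0, 0, 0), (2, 10, 0, 5), (3, 500, 0, 6), (4, 5, 0, 100)]
print(neighbour_lists(events, max_space=50, max_time=30))
```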
Spatial and spatio-temporal clusters of spatial events can be visually repre-
sented in an aggregated way, for example, by spatial or spatio-temporal convex
hulls or buffers. Aggregated characteristics of the cluster members can be
represented by colour-coding, symbols, or diagrams. However, there may be a need for a
more detailed representation, where individual events and their characteristics are
not aggregated. Growth ring maps represent clusters of events by placing pixels
representing individual events in a radial layout around cluster centres. Individual
characteristics of the events can be encoded in the colours of the pixels. In particular,
the pixels can be coloured according to the absolute temporal positions of the
events or their relative positions within temporal cycles. Flower diagrams repre-
sent clusters of events by compositions of circle sectors radiating from a common
centre. The angular position of a sector represents the position of the respective
event in a temporal cycle and the length (radius) the event duration. Overlapping
of several sectors shows event density.
Spatial events may have textual characteristics, which can be aggregated to
obtain summary characteristics of composite events (such as clusters) compris-
ing these events. A text aggregate consists of frequent words and combinations
with their frequencies. Visualization and interactive exploration of text aggregates
related to space and time may be difficult. There is usually not enough space on a
map or in a space–time cube for showing text aggregates at their spatial or spatio-
temporal positions. It may be necessary to view the texts in a separate display and
apply spatial and temporal filtering to see texts from particular places and times.
To facilitate interactive exploration of text aggregates by means of spatial and
temporal filtering, we create a special kind of spatial events: text events. A text
event represents one frequent word or combination from a text summary of one
composite event. The text event has the same spatial and temporal positions as the
composite event. The word or combination and its frequency are thematic attribute
values of the text event. Text events can be easily filtered by space, time, and val-
ues of the thematic attributes. Joint filtering of the text events and the composite
events can be done using the filter of related sets.
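The notion of a text event can be illustrated by a small data structure. The sketch below assumes that a text summary is given as a mapping from words or combinations to their frequencies; the class `TextEvent` and the two helper functions are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextEvent:
    composite_id: str   # the composite event the text comes from
    x: float            # spatial position inherited from the composite event
    y: float
    t_start: float      # temporal position inherited from the composite event
    t_end: float
    text: str           # thematic attribute: word or combination
    frequency: int      # thematic attribute: its frequency


def make_text_events(composite_id, x, y, t_start, t_end, summary):
    """Create one text event per frequent word or combination."""
    return [TextEvent(composite_id, x, y, t_start, t_end, text, freq)
            for text, freq in summary.items()]


def filter_text_events(events, bbox=None, interval=None, min_frequency=1):
    """Filter by spatial window (x_min, y_min, x_max, y_max), time interval
    (t0, t1), and minimal frequency."""
    selected = []
    for e in events:
        if bbox is not None:
            x_min, y_min, x_max, y_max = bbox
            if not (x_min <= e.x <= x_max and y_min <= e.y <= y_max):
                continue
        if interval is not None:
            t0, t1 = interval
            if e.t_end < t0 or e.t_start > t1:
                continue
        if e.frequency >= min_frequency:
            selected.append(e)
    return selected


# Hypothetical summaries of two composite events.
text_events = (
    make_text_events("c1", 10, 20, 0, 100, {"truck accident": 12, "fire": 3})
    + make_text_events("c2", 500, 500, 200, 300, {"traffic jam": 7})
)
print(filter_text_events(text_events, bbox=(0, 0, 100, 100), min_frequency=5))
```

Because each text event carries the identifier of its composite event, a selection made on one set can be propagated to the other.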
Spatial, temporal, and spatio-temporal relations among spatial events and
between spatial events and other objects (particularly, trajectories of movers) can
be visually investigated using spatial, temporal, and spatio-temporal displays,
including the map, space–time cube, time histograms, and temporal bar chart.
Spatial and temporal distances to various objects and events can be computed and
used for interactive filtering and thereby finding instances of particular relations.
Filtering of trajectory segments based on temporal relations can be used to extract
parts of trajectories that took place before or after certain events. Spatial and tem-
poral relations between composite spatial events (such as spatio-temporal clusters)
may be indicative of underlying cause–effect relations.
Places where spatial events are clustered may be significant from the application
viewpoint and require further analysis. Chapter 7 will deal with analysis
focusing on places.

References

Aggarwal, C. C., & Wang, H. (2010). A survey of clustering algorithms for graph data. In C. C.
Aggarwal & H. Wang (Eds.), Managing and mining graph data, Vol. 40 of Advances in data-
base systems, (pp. 275–301). Berlin: Springer.
Aigner, W., Miksch, S., Schumann, H., & Tominski, C. (2011). Visualization of time-oriented
data. Berlin: Springer.
Andrienko, G., Andrienko, N., Bak, P., Keim, D., Kisilevich, S., & Wrobel, S. (2011a). A conceptual
framework and taxonomy of techniques for analyzing movement. Journal of Visual Languages
and Computing, 22(3), 213–232.
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., Wrobel, S. (2011b). From move-
ment tracks through events to places: Extracting and characterizing significant places from
mobility data. Proceedings of IEEE Visual Analytics Science and Technology (VAST 2011)
(pp. 161–170).
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., & Wrobel, S. (2013). Scalable
analysis of movement data for extracting and exploring significant places. IEEE Transactions
on Visualization and Computer Graphics, 19(7), 1078–1094.
Ankerst, M., Breunig, M., Kriegel, H.-P., Sander, J. (1999). OPTICS: Ordering points to identify
the clustering structure. In Proceedings of the ACM SIGMOD 1999 (pp. 49–60).
Bak, P., Mansmann, F., Janetzko, H., & Keim, D. A. (2009). Spatiotemporal analysis of sensor
logs using growth ring maps. IEEE Transactions on Visualization and Computer Graphics
(TVCG), 15(6), 913–920.
Bak, P., Packer, E., Ship, H., Dotan, D. (2012). Algorithmic and visual analysis of spatiotem-
poral stops in movement data. In Proceedings of the 20th ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems (ACM GIS 2012), November
6–9, 2012. Redondo Beach, CA, USA.
Brasseur, L. (2005). Florence Nightingale’s visual rhetoric in the rose diagrams. Technical
Communication Quarterly, 14(2), 161–182.
Bresenham, J. (1977). A linear algorithm for incremental digital display of circular arcs.
Communications of the ACM, 20(2), 100–106.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X. (1996). A density-based algorithm for discover-
ing clusters in large spatial databases with noise. In Proceedings of the ACM KDD 1996
(pp. 226–231).
Harrower, M., & Brewer, C. A. (2003). Colorbrewer.org: An online tool for selecting colour
schemes for maps. The Cartographic Journal, 40(1), 27–37.
Nguyen, D. Q., Tominski, C., Schumann, H., Ta, T. A. (2011). Visualizing tags with spatiotempo-
ral references. In Proceedings of the International Conference on Information Visualisation
(IV 2011), London, UK: IEEE Computer Society.
Thom, D., Bosch, H., Koch, S., Wörner, M., Ertl, T. (2012). Spatiotemporal anomaly detec-
tion through visual analysis of geolocated twitter messages. In IEEE Pacific Visualization
Symposium (PacificVis 2012) (pp. 41–48).
Chapter 7
Visual Analytics Focusing on Space

[Diagram labels: Movers, Trajectories, Locations, Movement data, Local time series,
Spatial events, Spatial event data, Spatial time series, Times, Spatial distributions]

Fig. 7.1  This chapter addresses analysis tasks focusing on movement-specific characteristics of
locations and their relations to the context. Characteristics of locations are represented by
movement data in the form of local time series (cf. Fig. 3.13)

Abstract This chapter considers analytical tasks focusing on space treated as a
discrete set of places of interest (Fig. 7.1). We present several methods for defining
a set of places of interest based on the available movement data and depending
on the analysis goals. For visual exploration of time series (TS) associated with
places or with links between places, we suggest a time graph display enhanced
with tools for data summarization and various computational transformations.
Other analysis methods include clustering of TS by similarity, TS modelling, and
computational extraction of peaks or other features followed by representing them
as spatial events. These methods are supported by interactive visual techniques.
Binary relations between places are analysed by combining flow maps with
time graphs. We also consider dependencies between attributes of flows emerging
when the movement is constrained by channels with limited capacities, such as
in a street network. The dependencies can be represented by regression models,
which can be built, evaluated, and refined with support of interactive visual tools.
We consider also the ways to reveal and explore ordering and temporal rela-
tions involving more than two places. When places of interest are few, these rela-
tions can be revealed and investigated by means of interactive visual displays.

G. Andrienko et al., Visual Analytics of Movement, 253


DOI: 10.1007/978-3-642-37583-5_7, © Springer-Verlag Berlin Heidelberg 2013

When places are more numerous, frequently occurring sequences of visited places
can be discovered by means of sequence mining algorithms after transforming
trajectories of movers into sequences of strings representing visited places. The
algorithms return frequently occurring subsequences, which can be interpreted and
explored in the spatial context after being transformed to trajectories. Sequence
mining may be particularly useful in the analysis of episodic movement data.

7.1 Obtaining Places of Interest from Movement Data

Geographical space, like other physical spaces, is continuous. However, for practi-
cal reasons, it is impossible to have data for each location of a continuous space
and to involve all locations in analysis. Spatial data representation and analysis
methods are usually based on treating space as a discrete set of locations. Even
representations that look continuous or are even called “continuous” (for example,
the continuous density map in Fig. 3.8) are based on representing space as a
discrete set of fine cells. The continuity of space is taken into account in analysis
by using interpolation for estimating attribute values in locations where original
data are not available.
In movement data, spatial positions of movers in a continuous space (particu-
larly, geographical) are usually specified by coordinates. The set of all positions is
a sample of locations from the continuous space. In a general case, it is not inter-
esting to deal with each individual location from this sample. First, there would
be too few data for each location, as it is visited only once or very few times. Data
for different locations would refer to different time units; therefore, the locations
would be incomparable. Second, the choice of the sample locations from the con-
tinuous space is often determined solely by the data collection method (for exam-
ple, measuring positions at regular time intervals) but not by important properties
of the locations or by their relevance to the analysis goals.
Hence, the original sample of locations specified in movement data should be
considered as raw material, from which locations of interest need to be extracted
for further analysis. An exception may be movement data collected by a
location-based method (Sect. 2.9.1). In this case, there is a pre-defined set of locations
typically chosen according to the analysis goals and/or based on set characteris-
tics. The positions of movers are recorded when they enter or come close to these
locations. Thus, the laboratory mice data (Sect. 2.10.9) were collected by devices
installed in important places over the cage. In such cases, it makes full sense to
treat the places of measurements as places of interest.
In this chapter, we shall use the term place of interest or, simply, place,
to refer to locations selected or generated from raw movement data for further
space-centred analysis. In a general case, there are three basic approaches to
obtaining a discrete set of places of interest for further analysis from a sample
of locations available in movement data: space tessellation, grouping of spatially
close locations, and event-based place extraction. All approaches involve spatial
generalization (Sect. 3.6): they generate larger spatial units from original loca-
tions (e.g. areas from points) such that all original locations fitting in the same
resulting unit are treated as being the same.

7.1.1 Space Tessellation

For space tessellation, a regular (rectangular or hexagonal) or irregular grid is used.
In Sect. 3.8, we have briefly introduced a method for dividing continuous two-dimensional
space into an irregular grid of Voronoi polygons (Okabe et al. 2000).
The method is described in detail in Sect. 5.1.1, as it is also used for spatial
summarization of trajectories. In regular grids, the cells are equal in size and shape.
Various aggregate characteristics computed for these cells are not affected by size
differences and are therefore well comparable. This is especially important for
absolute counts and amounts, such as count of visits—it would not be valid to com-
pare counts in space compartments substantially differing in sizes. A disadvantage
of regular grids is that they do not respect the spatial distribution of the original
data, in particular, natural spatial clusters. Our method for irregular space
tessellation strives to produce cells of approximately equal sizes while adapting them to
the spatial distribution of the data, as can be seen, for example, in Fig. 7.2 top.
In some kinds of analysis, it is convenient to deal with cells that may differ in
area but are similar in other characteristics, such as the number of
data items (e.g. trajectory points or events) contained in the cells. A well-known
method of regular division depending on characteristics is the quadtree (Laurini and
Thompson 1992). The basic principle is to subdivide a rectangular cell into four
quadrants when the cell includes more than a given number of data items or its
internal variation in some characteristics exceeds a given threshold. A similar
principle can also be applied in the irregular division. Recall that our space tessellation
method is based on finding groups of points that can be enclosed in a circle with
a given radius, which determines the approximate sizes of the cells obtained at the
end. It can be modified as follows. The user specifies two radii, maximal Rmax and
minimal Rmin, and a number of points Nmin above which a group can be subdivided.
The points are first organized into groups with the maximal radius Rmax.
Then, each group that includes more than Nmin points and has a radius not less
than twice Rmin is subdivided into two or more groups with radii equal to half
the current group radius. The subdivision of a group is done by means
of the same algorithm as the initial division of the whole set of points. This opera-
tion is iteratively applied until there are no groups to be subdivided.
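The iterative subdivision can be sketched as follows. Two assumptions should be noted. First, `group_points` is a simplified greedy stand-in for the point grouping step and not the algorithm of Sect. 5.1.1. Second, the "radius" of a group is taken to be the radius parameter with which the group was produced. All names are hypothetical.

```python
import math


def group_points(points, radius):
    """Simplified stand-in for the grouping step: each point joins the group
    with the nearest centroid within `radius`, or starts a new group."""
    groups = []  # each group: [sum_x, sum_y, members]
    for x, y in points:
        best, best_d = None, radius
        for g in groups:
            n = len(g[2])
            d = math.hypot(x - g[0] / n, y - g[1] / n)
            if d <= best_d:
                best, best_d = g, d
        if best is None:
            best = [0.0, 0.0, []]
            groups.append(best)
        best[0] += x
        best[1] += y
        best[2].append((x, y))
    return [g[2] for g in groups]


def variable_groups(points, r_max, r_min, n_min):
    """Group with r_max, then repeatedly subdivide every group that has more
    than n_min points and a radius of at least 2 * r_min, halving the radius."""
    final = []
    pending = [(g, r_max) for g in group_points(points, r_max)]
    while pending:
        members, r = pending.pop()
        if len(members) > n_min and r >= 2 * r_min:
            pending.extend((part, r / 2)
                           for part in group_points(members, r / 2))
        else:
            final.append(members)
    return final


# Hypothetical data: a dense row of 30 points and two remote points.
points = [(3 * i, 0) for i in range(30)] + [(1000, 0), (1010, 0)]
groups = variable_groups(points, r_max=50, r_min=15, n_min=10)
print(sorted(len(g) for g in groups))
```

With Rmax = 50 and Rmin = 15, a group can be subdivided only once (to radius 25), because 25 is already smaller than twice Rmin. The dense row is split into two groups, while the sparse group remains intact.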
Figure 7.2 illustrates the division into cells of variable sizes (bottom) in com-
parison with the base variant of division into cells of approximately equal sizes
(top). The tessellation is done based on the positions of characteristic points from
the trajectories of Flickr users in Switzerland and surrounding areas. In the upper
image, the base algorithm is applied using the group radius of 50 km. A disad-
vantage of the resulting division is that intensively visited places, such as major

Fig. 7.2  A territory is divided into irregular cells (Voronoi polygons) with approximately equal
sizes (top) and with variable sizes depending on the data density (bottom). The positions of the
available data are represented by circles in pink drawn with 3 % opacity. The circles with yellow
filling and black boundary are the generating seeds for the Voronoi polygons

touristic cities, are covered by very large cells including also surrounding terri-
tories, which are not so intensively visited. When this division is used for data
aggregation and following analysis, too much information about frequently vis-
ited places will be “dissolved” by summation and averaging over large areas. At
the same time, having small cells everywhere is not beneficial because many cells
may have too few data.
The tessellation with variable cell sizes allows having finer detail in data-dense
areas without increasing the number of uninteresting cells with few data. This var-
iant of tessellation is demonstrated in the lower image in Fig. 7.2. The tessellation
has been produced using the following parameters: maximal radius Rmax = 50 km,
minimal radius Rmin  = 15 km, and minimal number of points for subdividing a
group Nmin = 10,000. Note that, independently of the cell sizes, the method tends
to put the cell centroids (i.e. the generating seeds for the Voronoi polygons) in or
close to dense concentrations of points.
A good property of space tessellation as a means of obtaining a discrete set of
places is that the resulting places fully cover the underlying continuous space, that
is, none of the existing spatial locations is lost.

7.1.2 Grouping of Close Locations

Full coverage of the underlying territory may be unimportant or even
undesirable in cases when the original data occur not everywhere in space but only in
some parts. For example, Fig. 7.3 shows the spatial distribution of the positions of
the roe deer from the example dataset introduced in Sect. 2.10.8. There are several
big spatial clusters of points, and the remaining points are scattered around them.
A large part of the territory does not contain any points. In this case, obtaining

Fig. 7.3  a A dataset where points occur only in some parts of the territory. b Polygons obtained
by tessellation of the territory (dark green lines) are compared with polygons enclosing groups of
spatially close points

a discrete set of places by means of tessellation produces many empty cells and,
even worse, large cells corresponding to one or a few points from the original data
(Fig. 7.3b, green lines). The use of this set of cells for analysis may be misleading.
It is more reasonable to generate such places that cover only the parts of the space
where the data really occur but not the whole space. For example, the set of poly-
gons with thick blue boundaries in Fig. 7.3 could be an appropriate spatial gen-
eralization of the original locations of the roe deer dataset. These polygons have
been built as convex hulls around groups of spatially close points. Points that
are far away from the others have received individual hulls.
To generate places of interest in this way, we use the same point grouping
algorithm as for the tessellation. However, the steps of extracting group centres
and tessellating the territory with these centres as generating seeds are omitted.
Instead, after obtaining point groups, we build convex hulls around them.
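Building a convex hull around a group of points can be done with any standard algorithm. The sketch below uses Andrew's monotone chain method, which is our choice for illustration; the text does not prescribe a particular hull algorithm. A group consisting of a single point yields a degenerate hull containing only that point.

```python
def convex_hull(points):
    """Andrew's monotone chain. Returns the hull vertices in
    counter-clockwise order, starting from the leftmost-lowest point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# A hypothetical group: four corner points and one interior point.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```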
Point grouping according to their spatial coordinates can also be done by
means of partition-based clustering methods, such as k-means; however, com-
parison of our grouping method with k-means shows that our method produces
more spatially compact point clusters (Andrienko and Andrienko 2011). Density-
based clustering methods could be suitable when an analyst is interested only in
places where points are densely concentrated and is willing to disregard areas with
sparsely scattered points. Density-based clustering methods can produce clusters
with non-convex shapes. When such clusters are enclosed by convex hulls, some
of these hulls may intersect and overlap. Therefore, it may be desirable to subdi-
vide density-based clusters with non-convex shapes into convex subclusters. An
algorithm for doing this is described by Andrienko et al. (2009).

7.1.3 Event-Based Place Extraction

Event-based place extraction from movement or event data is used when the analyst
is specifically interested in places where particular spatial events repeatedly or ever
occurred. These may be individual movement events, such as stop, slow movement,
or sharp turn, or collective movement events, such as encounters of movers, spatio-
temporal concentrations of movers, or traffic congestions. Obviously, before extract-
ing places, it is necessary to extract and/or construct relevant spatial events. The
methods for doing this were discussed in the previous chapter.
After obtaining a set of relevant spatial events, their spatial positions are used
for generating relevant places. Depending on the analyst’s interests, this can be
done in two ways. If the analyst is interested in all places where at least one event
ever occurred, the method based on point grouping, which is described in the
previous section, can be applied. If the analyst is interested only in places where
events occurred repeatedly, density-based spatial clustering is appropriate. An
example is shown in Fig. 7.4. First, stop and turn events have been extracted from
the personal driving trajectories introduced in Chap. 1. Second, dense spatial clus-
ters of these events have been obtained using density-based clustering according to

Fig. 7.4  Relevant places are defined as polygons enclosing dense spatial clusters of turn and
stop events extracted from trajectories. Image b enlarges the north-eastern part of the territory
visible in image a

spatial distances between the events. Third, the noise has been excluded, to have
only places of repeated event occurrences. Fourth, convex hulls have been built
around the clusters, giving the set of places of interest for further analysis.
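The second and third steps can be sketched with a compact version of the DBSCAN algorithm, applied to the spatial positions of already extracted events. This is an illustration with a purely spatial distance and a brute-force neighbourhood search; the parameter names `eps` and `min_pts` and the function names are assumptions. Building a convex hull around each returned cluster would complete the fourth step.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: -1 for noise,
    0, 1, 2, ... for clusters."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:  # core point: expand the cluster
                queue.extend(nbrs)
    return labels


def places_from_events(points, eps, min_pts):
    """Group event positions into clusters and exclude the noise."""
    places = {}
    for p, label in zip(points, dbscan(points, eps, min_pts)):
        if label != -1:
            places.setdefault(label, []).append(p)
    return places


# Hypothetical event positions: two dense groups and one isolated event.
stops = [(0, 0), (1, 0), (0, 1), (1, 1),
         (10, 10), (11, 10), (10, 11), (50, 50)]
print(places_from_events(stops, eps=1.5, min_pts=3))
```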
Density-based clustering with the special distance function 6.3 (Sect. 6.1.1)
should also be used when it is necessary to separate places based on thematic
characteristics of the events occurring in them, such as movement directions. This
way of obtaining relevant places is described by Andrienko et al. (2011, 2013).
An example is given in Sect. 6.1.4 and Fig. 6.5: places of traffic congestions have
been extracted in such a way that each place corresponds to a particular move-
ment direction. It should be noted that places extracted in this way may overlap
in space. In aggregating data by such places, it is necessary to take for each place
only relevant data, that is, with thematic characteristics consistent with the the-
matic characteristics of the events used for place extraction.

7.1.4 Extraction of Personal Places

In the analysis of movement behaviours of people, it may be necessary to extract
significant personal places of these people, such as places of home, work, shopping,
and sports. In the introductory chapter, we demonstrated how such places can be
extracted from trajectories of a single person. We used event-based place extraction:
we first extracted the person’s stop events and then applied density-based clustering
to find places of repeated stops.
Basically, the same approach can be applied to trajectories of multiple persons.
First, it is necessary to prepare an appropriate set of spatial events. In the case of
quasi-continuous movement data, such as GPS tracks, stop events are extracted
from the trajectories as described in Sect. 3.5. In the case of episodic movement
data, it may be reasonable to use all recorded positions. This refers, for example,
to trajectories of mobile phone users, Flickr users, and Twitter users composed of
call events, photograph-taking events, and message-posting events, respectively.
Extraction of stops from such data is very problematic, if not impossible.
The spatial events to be used for personal place extraction need to have refer-
ences to the individuals from whose trajectories the events have been taken, that is,
the events must have a thematic attribute the values of which are the identifiers of
these individuals. We shall refer to this attribute as “mover’s identifier”.
Second, the events need to be clustered for finding places of repeated event occur-
rences. This is done by means of density-based clustering with the distance function
6.3 (Sect. 6.1.1). It is essential to separate the events of different individuals. For this
purpose, the attribute “mover’s identifier” is used in the distance function along with
the spatial positions of the events. This attribute is qualitative; hence, two events will
be treated as neighbours only if they have exactly the same value of this attribute.
Therefore, each of the resulting clusters will include only events of one person.
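
The effect of the qualitative attribute can be illustrated with a small sketch. It is not the distance function 6.3 itself, which is defined in Sect. 6.1.1; it only assumes that events are (x, y, mover_id) tuples, that the spatial component is Euclidean, and that a mismatch of identifiers is expressed as an infinite distance. The function names are hypothetical.

```python
from math import hypot, inf

def event_distance(e1, e2):
    """Distance between two events given as (x, y, mover_id) tuples.

    Events of different movers are infinitely far apart, so they can never
    fall within the neighbourhood radius of each other.
    """
    if e1[2] != e2[2]:
        return inf
    return hypot(e1[0] - e2[0], e1[1] - e2[1])

def neighbours(events, i, eps):
    """Indices of the events lying within distance eps of event i."""
    return [j for j, e in enumerate(events)
            if event_distance(events[i], e) <= eps]

# The third event is spatially close to the first two but belongs to another person.
events = [(0.0, 0.0, "p1"), (0.1, 0.0, "p1"), (0.05, 0.0, "p2"), (5.0, 5.0, "p1")]
near_first = neighbours(events, 0, eps=0.5)
```

Plugging such a distance into any density-based clustering routine keeps the events of different persons in separate clusters even when they coincide in space.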
After obtaining the clusters, the personal places are defined as convex hulls or
spatial buffers enclosing the clusters. The places must “inherit” the thematic attrib-
ute “mover’s identifier” from the events included in the clusters, that is, the places
must have references to the individuals they belong to.
A very important task in the analysis of personal places is to give semantic
interpretation to the places, that is, to classify them as home, work, shopping, etc.
In the example with one person, interactive visualizations helped us to understand
the meaning of the different places. In particular, we looked at the temporal dis-
tributions of the stops in each place by the days of the week and hours of the day.
However, this cannot be done for each of the personal places extracted from trajec-
tories of multiple individuals.
To scale the analysis up, spatio-temporal aggregation is used. The move-
ment data are aggregated by the extracted personal places and time intervals.
Trajectories of each individual are aggregated only by the places of this individual.
The temporal aggregation is done in several ways:
• by days over the whole time period, which gives the counts of place visits for
each day;
• by positions in the daily cycle, for example, by hourly intervals of the day,
which is done separately for three subsets of the data:

– parts of the trajectories that occurred on the work days, that is, from Monday
to Friday;
– parts of the trajectories that occurred on Saturdays;
– parts of the trajectories that occurred on Sundays.

As a result, each place is characterized by four time series (TS): TS of visit
counts by days and TS of visit counts by times of the day for the work days,
Saturdays, and Sundays. Later in this section, we shall explain how these TS can
be used for interpretation of personal places.
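
As an illustration, the four TS of one personal place could be derived as follows. This is only a sketch: it assumes that the visits of the place owner are already available as a list of Python datetime objects (one per visit), and the function name and dictionary keys are hypothetical.

```python
from collections import Counter
from datetime import datetime

def place_time_series(visit_times):
    """Return (daily counts, hourly counts for work days, Saturdays, Sundays)."""
    by_day = Counter(t.date() for t in visit_times)   # missing days count as 0
    hourly = {"workdays": [0] * 24, "saturdays": [0] * 24, "sundays": [0] * 24}
    for t in visit_times:
        weekday = t.weekday()                         # Monday = 0 ... Sunday = 6
        if weekday < 5:
            key = "workdays"
        elif weekday == 5:
            key = "saturdays"
        else:
            key = "sundays"
        hourly[key][t.hour] += 1
    return by_day, hourly

# Made-up visits: 7 January 2013 was a Monday, 12 and 13 January a weekend.
visits = [datetime(2013, 1, 7, 9), datetime(2013, 1, 7, 18), datetime(2013, 1, 8, 9),
          datetime(2013, 1, 12, 11), datetime(2013, 1, 13, 15)]
by_day, hourly = place_time_series(visits)
```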

7.2 Characteristics

Characteristics of places are obtained by means of discrete spatial or spatio-temporal
aggregation of movement or event data, as described in Sect. 3.8. One of the
results of the aggregation is a set of local TS referring to the places.

7.2.1 Visualization of Time Series

Andrienko and Andrienko (2010) discuss various methods of visual representation
of aggregated movement data on static maps with diagrams, animated maps,
and time graphs. Several examples of such representations occur in this book
(Figs. 1.22, 1.24, 3.9c, and 5.44). Figure 4.4 demonstrates representation of local
TS in a space–time cube. Figures 5.34 and 5.35 show results of data aggregation
in a transformed space: absolute positions of movers were transformed into rela-
tive positions in an abstract space, this space was partitioned into cells, and the
transformed data were aggregated by these cells. Other possible methods for visu-
alization of spatial TS can be found in the book by Aigner et al. (2011).
Representation of local TS in spatial displays, such as maps and space–time
cubes, offers only very limited opportunities for temporal analysis. Therefore, it is
necessary to combine spatial displays with temporal displays, such as time graphs,
and computational methods for TS analysis.
The time graph, also known as a line chart, is perhaps the best known and most
popular type of plot used for representing time-variant numeric data. The data are
represented as polygonal lines, where line segments connect points representing
attribute values at different time units. An example is shown in Fig. 7.5a. In this
example, we have summarized the trajectories of the Flickr users in Switzerland
by the places resulting from the territory tessellation shown at the bottom of
Fig. 7.2 and by monthly time intervals. One hundred and forty-six TS of monthly
counts of visitors in the places are shown by grey lines. The thick black line con-
nects the consecutive positions of the monthly mean values and, hence, allows
comparing the values in any TS with the mean values. A TS can be selected by
clicking on its line in the time graph or on the place it refers to in a map display.
Selected TS are highlighted (shown in black) in the time graph, as demonstrated in
Fig. 7.5; the corresponding places are highlighted in the map.
It should be admitted that the time graph is hardly readable due to overplotting.
Only the highlighted line and a few lines with the highest values can be seen rela-
tively clearly. The lines with the highest values correspond to places in the central

Fig. 7.5  a Time series are represented in a time graph (line chart) by polygonal lines. b The data
are shown in a summarized form. The polygons shaded in alternating light and dark grey repre-
sent the deciles (i.e. 0, 10, 20,…, 90, and 100th percentiles) in each time step. In (a) and (b), the
thick black line connects the mean values for consecutive time intervals and the thin black line
corresponds to a selected TS. c The data are summarized by value intervals (classes) and
represented in the form of a segmented time histogram

parts of Zurich, Basel, and Geneva. The highlighted line corresponds to the north-
ern part of Geneva. The remaining lines are not discernible.
To obtain an overall view of the temporal variation, two types of summarized
representations can be used (Andrienko and Andrienko 2006): by quantiles and by
value intervals, or classes. The first type is illustrated in Fig. 7.5b; it can be called
a quantile graph. The polygons shaded in alternating light and dark grey represent
the deciles (i.e. 0, 10, 20,…, 90, and 100th percentiles) in each time step. The polygon
boundaries are obtained by connecting the positions of the corresponding
deciles in consecutive time steps. The boundary lines themselves are not drawn,
to reduce the display clutter. On top of the shaded polygons, the mean line and the
line of one selected TS are shown as in Fig. 7.5a.
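
The decile positions underlying such a quantile graph can be computed independently for each time step. The sketch below assumes that all TS have the same length and uses linear interpolation between the sorted values, which is one of several common percentile definitions; the function names are hypothetical.

```python
def percentile(sorted_values, p):
    """Percentile p (0..100) of an ascending list, with linear interpolation."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def decile_bands(series):
    """For each time step, return the 0th, 10th, ..., 100th percentiles."""
    bands = []
    for column in zip(*series):          # one column = all values of a time step
        values = sorted(column)
        bands.append([percentile(values, 10 * d) for d in range(11)])
    return bands

# Eleven made-up TS with two time steps each.
series = [[v, 2 * v] for v in range(11)]
bands = decile_bands(series)
```

Connecting the d-th entries of consecutive time steps gives the boundary between two neighbouring shaded polygons.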
Figure 7.5c illustrates summarization by value classes. The value range is interac-
tively divided into intervals, or classes. For each time interval and each value interval,
the number of fitting data values is counted. The resulting counts are represented in
the form of a temporal histogram. Each bar corresponds to one time interval (month in
our example) and is divided into segments proportionally to the value counts in the
value classes. The segments are painted in colours assigned to the value classes. In our
example, we use one of the ColorBrewer colour scales (Harrower and Brewer 2003).
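
The counts behind the segmented bars can be obtained with a simple two-way tally. In the sketch below, the class breaks are passed as an ascending list, a value equal to a break is assigned to the higher class (an arbitrary convention chosen for this sketch), and the function name is hypothetical.

```python
from bisect import bisect_right

def segmented_histogram(series, breaks):
    """Count, for each time step, how many values fall into each value class.

    With breaks [b0, b1, ...] the classes are (-inf, b0), [b0, b1), ..., [b_last, inf).
    """
    n_classes = len(breaks) + 1
    n_steps = len(series[0])
    counts = [[0] * n_classes for _ in range(n_steps)]
    for ts in series:
        for step, value in enumerate(ts):
            counts[step][bisect_right(breaks, value)] += 1
    return counts

# Three made-up TS with two time steps and four value classes.
counts = segmented_histogram([[1, 5], [12, 30], [7, 3]], breaks=[5, 10, 20])
```

Each inner list corresponds to one bar; its entries give the segment heights from the lowest to the highest value class.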
Both summarized views show us a periodic character of the overall temporal
variation: the number of visitors in many places increases in the summer months,
especially in August, and decreases in late autumn and winter months. However,
this does not equally refer to all places. For example, the highlighted TS line has
periodic increases in March.

7.2.2 Transformations of Time Series

Besides the periodic variation, we see in Fig. 7.5 an increasing trend from the
beginning of the time period of the data until the summer of 2010, after which
the numbers of visitors decrease. The increasing trend at the beginning can be
explained by a gradual growth of the popularity of Flickr and the number of Flickr
users since the beginning of 2005, when the numbers of Flickr users and published
photographs were rather low. The decreasing trend at the end can be explained by
the delays between taking photographs and publishing them. Besides, there may
be a decrease in the number of Flickr users caused by the appearance of many
other photograph-sharing web sites.
In TS analysis, it is often desirable to disregard the trend, especially when
it is necessary to focus on periodic changes or on detecting extreme rises and/or
drops. In our example, the trend is irrelevant to the analysis goals
since it reflects the specific dynamics of Flickr use, while our interest is the
dynamics of the place visits. We suggest an interactive procedure for removing
trends from TS using a trend reconstruction tool built into the time graph display.
The easiest way to model a trend of a TS is to represent it as a straight line
described by the equation y  = A · t  + B, where A and B are constant numbers,
variable t represents the temporal component of the data, and variable y represents
the attribute value. This kind of trend is called linear trend. Let v(ti) be the actual
attribute value corresponding to time step ti, where 1 ≤ i ≤ N, N being the length
of the TS, that is, the number of time steps. For a given TS, the values of A and B
are computed as follows: A = (v(tN) − v(t1))/N; B = v(t1).
For multiple TS represented in a time graph, a common trend line can be built
based on the TS of mean or median values. However, a straight line may general-
ize the temporal variation too much. Thus, in our example, the overall linear trend
is completely flat and represents neither increase nor decrease. The trend recon-
struction tool allows the user to divide the time period into arbitrary intervals by
mouse-clicking on positions below the plot area corresponding to different time
steps. The trend reconstruction tool then computes a linear trend individually for
each interval. The overall trend is composed from the linear pieces corresponding
to the different intervals. By selecting and deselecting time steps, the user strives

Fig. 7.6  a A piecewise linear trend line is built by interactively dividing the time span into inter-
vals. b The TS have been transformed by removing the individual trends. c, d The transformed
TS are summarized by deciles (c) and by value intervals (d)

at achieving the best match between the resulting piecewise trend line and the
trend perceived from the mean or median line (Fig. 7.6a).
After reconstructing the overall trend line, it is possible to transform the data
by removing the trends. For each TS, the tool reconstructs its individual trend
line consisting of linear pieces corresponding to the user-specified time intervals.
The trend is removed from the data in the following way. Let ymax be the maximal
trend value for a given TS {v(ti) | 1 ≤ i ≤ N} and y(ti) be the trend value corre-
sponding to the time step ti. Then, each value v(ti) is replaced by the value v′(ti) = 
v(ti) + (ymax − y(ti)). Figure 7.6b presents the result of removing the trends from
our example set of TS, and Figs. 7.6c and d represent the transformed TS in two
summarized forms: by deciles (quantile graph) and by value intervals (temporal
histogram).
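
The trend removal can be sketched as follows. The code is an illustration under assumptions rather than the tool described above: the user-selected time steps are given as list indices, and each linear piece is taken as the straight segment joining the TS values at the two ends of its interval, with the slope computed over the number of steps between them. The function names are hypothetical.

```python
def piecewise_linear_trend(values, breakpoints):
    """Trend made of straight pieces joining the values at the chosen time steps.

    `breakpoints` are indices of interior time steps; the first and the last
    time step are always used as ends of the trend line.
    """
    if len(values) < 2:
        return list(values)
    knots = sorted({0, len(values) - 1, *breakpoints})
    trend = [0.0] * len(values)
    for start, end in zip(knots, knots[1:]):
        slope = (values[end] - values[start]) / (end - start)
        for i in range(start, end + 1):
            trend[i] = values[start] + slope * (i - start)
    return trend

def remove_trend(values, trend):
    """Apply v'(t) = v(t) + (y_max - y(t)), where y is the trend."""
    y_max = max(trend)
    return [v + (y_max - y) for v, y in zip(values, trend)]

# A made-up TS that rises and then falls; one breakpoint at the peak.
values = [0, 2, 4, 6, 4, 2, 0]
trend = piecewise_linear_trend(values, breakpoints=[3])
detrended = remove_trend(values, trend)
```

Because the shift uses the maximal trend value, the transformed values are never lower than the original ones.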
The de-trended TS tell us how many Flickr users would have visited the places
of interest if the use of Flickr had been uniform over time. The periodic character of the
overall temporal variation is better visible in the summarized representations of
the transformed TS (Fig. 7.6c, d), although the increases in the years 2005–2006
and 2011–2012 are not as high as in the middle of the time period. In Fig. 7.6d,
it is possible to notice that besides the summer rises, there are also winter rises,
although not as high as in summer. The lowest numbers of visitors are in October
and November.
There are many other useful transformations that can be applied to numeric TS.
A number of transformations are described by Andrienko and Andrienko (2006),
among them temporal smoothing, comparison to a reference time step, object, or
value, normalization, scale transformations, and others. Figure 7.7 shows an exam-
ple of transforming values in TS to standardized deviations from the local mean

Fig. 7.7  a The values in the TS have been transformed into normalized deviations from the local
mean values. b, c The transformed TS are summarized in two ways

values. The transformation is done as follows. For each TS, its own mean value μ
and standard deviation σ are computed. Then, each value v(ti) is replaced by the
value v′(ti) = (v(ti) − μ)/σ. In Fig. 7.7, this transformation has been applied to the
TS resulting from another transformation, specifically, to the de-trended TS of the
counts of place visitors.
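
A minimal sketch of this transformation is given below. The text does not say whether the population or the sample standard deviation is meant; the sketch assumes the population form, and it maps a constant TS to zeros to avoid division by zero. The function name is hypothetical.

```python
from statistics import mean, pstdev

def standardize(ts):
    """Replace each value by its deviation from the TS mean in units of sigma."""
    mu = mean(ts)
    sigma = pstdev(ts)          # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0] * len(ts)  # a constant TS has no deviations
    return [(v - mu) / sigma for v in ts]

# Made-up values with mean 5 and population standard deviation 2.
z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
```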
The transformed data show when the numbers of the visitors were higher than
the mean (the transformed values are positive) and lower than the mean (the transformed
values are negative) and how much higher or lower. The results of the
transformation are best viewed in the temporal histogram (Fig. 7.7c), where
the negative values are represented by shades of blue and the positive values by shades of
red. The darker the shade, the larger the deviation from the mean. Light yellow
corresponds to values close to the mean (±0.25σ). The decile graph (Fig. 7.7b)
is also quite informative. In particular, it shows us that in August, 70–90 % of
the places have more visitors than on average, and in November, 80–90 % of the
places have fewer visitors than on average.

7.2.3 Clustering of Time Series

Summarized representations of multiple TS allow us to see the general character
of the overall temporal variation, but, as we noted earlier, the variation character
may not be the same in all places. It would be daunting to investigate each TS
individually and compare it with all others. A common approach to investigat-
ing characteristics of multiple objects is to group (cluster) them by similarity of
their characteristics and then consider summarized characteristics of the groups.
We have already used clustering for investigating characteristics of trajectories
(Sect. 5.1.2). Clustering can also be applied to TS of numeric values. In the case
of TS referring to places, clustering is expected to group together places with
similar temporal variations in attribute values.
A number of specific distance functions for assessing similarity between
numeric TS have been developed in the areas of statistics and machine learning;
Ding et al. (2008) describe and compare nine of them. Some distance functions may
translate, stretch, and/or shrink the TS for finding similar segments. Ding et al.
call them “elastic distance measures”. These functions are not suitable for group-
ing of places based on their local TS because they distort the actual variation in
attribute values and do not respect temporal cycles. Only the distance functions
that compare the value for each time step ti in one TS with the value for the same
time step ti in the other TS are suitable for the comparison of local TS in different
places. Ding et al. call such functions “lock-step measures”. The most straight-
forward similarity measure that can be used for TS is the Euclidean distance or
its variants, for example, Manhattan distance. Ding et al. note that this measure
is “surprisingly competitive with other more complex approaches”, especially for
large numbers of TS.
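
Lock-step measures are straightforward to express in code, because the value at each time step is compared only with the value at the same time step of the other TS. The following sketch assumes TS of equal length; the function names are ours.

```python
from math import sqrt

def euclidean(ts_a, ts_b):
    """Lock-step Euclidean distance between two TS of equal length."""
    if len(ts_a) != len(ts_b):
        raise ValueError("lock-step measures require TS of equal length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(ts_a, ts_b)))

def manhattan(ts_a, ts_b):
    """Lock-step Manhattan distance between two TS of equal length."""
    if len(ts_a) != len(ts_b):
        raise ValueError("lock-step measures require TS of equal length")
    return sum(abs(a - b) for a, b in zip(ts_a, ts_b))

d_euclidean = euclidean([0, 0], [3, 4])
d_manhattan = manhattan([0, 0], [3, 4])
```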

Since the popular partition-based clustering algorithm k-means employs the
Euclidean distance as the similarity measure, it can be used for clustering of
numeric TS. Other generic clustering methods can also be used; thus, Schreck
et al. (2009) and Andrienko et al. (2010) apply self-organizing map (SOM), also
known as Kohonen map. Here, we give an example of using the k-means method.
We shall apply it to our TS of the monthly place visitor counts in Switzerland.
We run the k-means method with different values of k (number of clusters)
to find the most suitable grouping. The results of the clustering are immediately
shown on the time graph and the map display by painting lines in the graph and
areas in the map in different colours assigned to the clusters (Fig. 7.8b, c).
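
For readers who wish to experiment, a plain k-means for TS of equal length can be sketched as below. This is a textbook version (random initial centres, squared Euclidean distance, a fixed iteration limit), not the clustering tool used for Fig. 7.8; the parameter names max_iter and seed are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(series, k, max_iter=100, seed=0):
    """Group TS of equal length into k clusters; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(ts) for ts in rng.sample(series, k)]   # k distinct TS as start
    labels = None
    for _ in range(max_iter):
        # Assignment step: each TS goes to the nearest cluster centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(ts, centres[c]))
            for ts in series
        ]
        if new_labels == labels:
            break                                          # converged
        labels = new_labels
        # Update step: each centre becomes the time-step-wise mean of its members.
        for c in range(k):
            members = [ts for ts, lab in zip(series, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Made-up TS: three with low, flat values and two with strong periodic peaks.
series = [[1, 1, 1, 1], [1, 2, 1, 2], [2, 1, 2, 1],
          [10, 50, 10, 50], [12, 48, 11, 52]]
labels, centres = kmeans(series, k=2)
```

Running the function for several values of k and comparing the resulting centres corresponds to the parameter exploration described above.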
A good choice of colours for the clusters can facilitate understanding of clus-
tering results. Suitable colours can be chosen automatically based on the rela-
tive distances between the cluster centres in the space of attribute values, so

Fig. 7.8  K-means clustering is applied to TS. a Cluster centres are projected onto a two-dimen-
sional colour space for assigning colours to the clusters. b The TS lines on a time graph are
coloured in the colours of the clusters. c The places in the map are coloured according to their
cluster membership

that similarity of cluster colours means closeness of the cluster centres. For this
purpose, the cluster centres are projected on a two-dimensional colour space
(Andrienko et al. 2010). To allow the user to judge the distances between the clus-
ters in the attribute space and control the assignment of colours to the clusters, a
display of the colour space with the projected positions of the cluster centres is
created (Fig. 7.8a).
The projection display and time graph allow us to examine the changes in the
clustering results occurring when the value of the parameter k is increased or
decreased. Thus, in Fig. 7.8a, there is a dense concentration of cluster centres in
the upper part of the projection display (blue area). The respective clusters consist
of TS with low amplitudes of value variation. We observe that increasing the value
of the parameter k results in further refinement of these clusters and appearance of
new cluster centres in the dense area, since the differences between the clusters are
small. It is therefore not meaningful to use large values of k.
After testing the impact of the clustering parameter and choosing the suitable
value, we review the resulting clusters one by one to judge their internal homo-
geneity. If some cluster has high internal variability, it should be subdivided into
smaller clusters. We do this by means of progressive clustering, that is, applying
the clustering tool to members of one or a few chosen clusters (this idea was introduced
in Sect. 5.1.2.3 in application to trajectories). For example, the area in the north of
Geneva has a peculiar TS with high peaks occurring in March of different years
(this TS is highlighted in Figs. 7.5, 7.6, 7.7). The k-means algorithm groups it
together with several other TS, which do not have peaks in March. The internal
variation in this group is high. Therefore, we select this group by means of filter-
ing and apply the clustering algorithm to the group members. With k = 2, the divi-
sion is not good enough, but with k = 3, the area with the peculiar TS is separated
from the others, that is, we obtain a cluster containing only this TS. This is cluster
12; its centre is on the right of the projection display in Fig. 7.8a, in the violet
area. The cluster is quite far from the other clusters in the projection space, which
means that it is, indeed, different.
After achieving satisfactory clustering results with regard to the internal varia-
tion in the clusters, we can now investigate the specifics of the temporal variation
in each cluster and compare those between the clusters. For intercluster compari-
sons, it is convenient to look at the TS of the cluster mean or median values in a

Fig. 7.9  The time graphs show the time series of the mean (left) and median (right) values by
the k-means clusters

time graph (Fig. 7.9), where it is possible to switch off the lines of some clusters
so that the variations in the remaining clusters can be compared more easily.
The patterns of temporal variation can not only be explored visually but also
represented in an explicit form suitable for communication and utilization in fur-
ther analyses. This can be done using methods for TS modelling. The use of these
methods can be supported by interactive visual interfaces.

7.2.4 Time Series Modelling

TS analysis and modelling is a well-established area in statistics. The existing
variety of methods and tools can be applied to spatial TS. However, analysing
and modelling each spatial TS independently from others ignores relationships
and similarities that may exist among the places or objects described by the TS.
Clustering and interactive grouping allow related places or spatial objects to be
analysed together.
Hence, we suggest the following way to employ TS modelling techniques for
modelling of spatio-temporal phenomena represented by place-related numeric
TS. First, the set of TS is divided into groups based on similarity of the tempo-
ral variations in the attribute values. This is supported by tools for clustering and
interactive regrouping. The results are controlled using the time graph display and
the map display, as described in the previous section. Second, the analyst visu-
ally inspects the temporal variation in the attribute values in each group of TS by
means of a time graph and thereby gains an understanding of the character of the

Fig. 7.10  For time series modelling, the analyst creates a representative TS for a cluster (thick
blue line), sets the time interval from which to take the input values for the model (the vertical
green and red lines mark the beginning and the end of the interval, respectively), and selects a
suitable modelling method

temporal variation. Then, the analyst externalizes this understanding by building a
curve that expresses the generic characteristics of the temporal variation within the
group. The equation describing this curve is a formal model representing the gen-
eral character of the temporal variation in the group.
The process of curve building and model evaluation can be supported by inter-
active visual tools, as described in detail by Andrienko and Andrienko (2013).
A possible appearance of such tools is shown in Fig. 7.10; the TS are from one of the
clusters (cluster 2) represented in Fig. 7.8. The tools provide an interface to a library
of modelling methods, from which the analyst selects a suitable modelling method.
The sequence of input values for the modelling method is created, according to the
user’s choice, from the median or mean values taken from all time steps, or from
arbitrary percentiles (e.g. 60th percentiles). When choosing to use the mean values,
the analyst may decide to exclude a certain percentage of the highest and/or lowest
values in each time step. This diminishes the impact of outliers on the model input
values. The generated input sequence, further called representative TS, as it repre-
sents the temporal variation in the group, is shown on the time graph (the thick blue
line in Fig. 7.10).
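
The construction of a representative TS from the median or from a trimmed mean can be sketched as follows. The sketch omits the option of arbitrary percentiles, assumes TS of equal length, and expresses the excluded shares as fractions between 0 and 1; the function and parameter names are hypothetical.

```python
from statistics import mean, median

def representative_ts(series, method="median", trim_low=0.0, trim_high=0.0):
    """Build one TS that summarizes a group of TS, time step by time step.

    For method="mean", the given fractions of the lowest and highest values
    are excluded in each time step before averaging.
    """
    representative = []
    for column in zip(*series):              # all group values at one time step
        values = sorted(column)
        if method == "median":
            representative.append(median(values))
        elif method == "mean":
            n = len(values)
            low = int(n * trim_low)
            high = n - int(n * trim_high)
            if low >= high:
                raise ValueError("trimming leaves no values")
            representative.append(mean(values[low:high]))
        else:
            raise ValueError("method must be 'median' or 'mean'")
    return representative

# Four made-up TS; the last one contains an outlier in the first time step.
group = [[1, 10], [2, 20], [3, 30], [100, 40]]
by_median = representative_ts(group)
by_trimmed_mean = representative_ts(group, method="mean", trim_high=0.25)
```

Excluding the upper quarter of the values removes the outlier from the first time step, which illustrates how trimming diminishes the impact of extreme values on the model input.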
The analyst can limit the temporal extent of the representative TS, that is, use
only a fragment of the TS as an input to the modelling method. This is demon-
strated in Fig. 7.10: the vertical green and red lines mark the beginning and the
end of the chosen subinterval. This is meaningful, in particular, for the TS gener-
ated from the Flickr data, where the values at the beginning and at the end of the
time period are too low due to the specifics of Flickr use.
Periodic variation in attribute values, as in our example TS related to places
in Switzerland, can be modelled by means of triple exponential smoothing, or
Holt–Winters method. For this method, it is necessary to specify the length of
the period, that is, the number of time steps in one period (cycle). Andrienko and
Andrienko (2013) suggest a way to use this method also for modelling of data var-
iation with respect to two temporal cycles, such as daily and weekly.
The representative TS or its fragment from the user-specified time interval
is passed to the selected modelling method for building a model. If the method
requires specific information (such as period length for triple exponential smooth-
ing), it is also sent to the method. The method tries to find the best fitting model
(according to statistical criteria such as minimal mean-squared error) by varying
the model parameters. After the model is created, the sequence of model-predicted
values for the same time steps as in the original data plus several further time steps
is obtained and shown on the time graph so that the predicted values can be com-
pared with the representative TS and with the individual TS. The automatically
selected parameters of the model are shown to the analyst.
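
To make the procedure more concrete, the following sketch implements the additive variant of triple exponential smoothing and a coarse grid search over its three smoothing parameters with the mean-squared error as the criterion. Both the initialization (level from the first period, trend from the first two periods) and the grid search are assumptions of this sketch; library implementations typically use more elaborate optimization, and the function names are hypothetical.

```python
def holt_winters_additive(ts, period, alpha, beta, gamma, horizon=0):
    """Return one-step-ahead fitted values for ts followed by `horizon` forecasts."""
    if len(ts) < 2 * period:
        raise ValueError("at least two full periods are needed for initialization")
    level = sum(ts[:period]) / period
    trend = (sum(ts[period:2 * period]) - sum(ts[:period])) / period ** 2
    season = [ts[i] - level for i in range(period)]
    fitted = []
    for t, y in enumerate(ts):
        s = season[t % period]
        fitted.append(level + trend + s)               # prediction before seeing y
        new_level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % period] = gamma * (y - new_level) + (1 - gamma) * s
        level = new_level
    forecasts = [level + (h + 1) * trend + season[(len(ts) + h) % period]
                 for h in range(horizon)]
    return fitted + forecasts

def fit_holt_winters(ts, period, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick (alpha, beta, gamma) from a coarse grid by minimal mean-squared error."""
    best_mse, best_params = None, None
    for alpha in grid:
        for beta in grid:
            for gamma in grid:
                fitted = holt_winters_additive(ts, period, alpha, beta, gamma)
                mse = sum((y - f) ** 2 for y, f in zip(ts, fitted)) / len(ts)
                if best_mse is None or mse < best_mse:
                    best_mse, best_params = mse, (alpha, beta, gamma)
    return best_params, best_mse

# A made-up, perfectly periodic representative TS with a period of four steps.
ts = [1, 2, 3, 4] * 3
params, mse = fit_holt_winters(ts, period=4)
predicted = holt_winters_additive(ts, 4, *params, horizon=4)
```

The returned sequence covers the time steps of the input plus the requested further steps, so it can be drawn on the same time graph as the representative TS.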
It is not guaranteed that the automatic selection gives the best possible
result. First, the model building algorithm may be trapped in a local optimum.
Second, not only the statistical criteria of fitness are important. In particular,
the model needs to represent the variation in a group of TS rather than sin-
gle TS and hence should have a sufficient degree of generality, which is not
achieved by the automatic parameter selection. Therefore, the analyst is given

Fig. 7.11  A model built with automatically selected parameter values (a) has been amended by
modifying the parameter values (b) and further improved by changing the time range of the input
time series (c)

the possibility to iteratively modify the parameters of the model, rerun the mod-
elling method, and immediately see the result as a curve on the time graph.
The analyst may also modify the input TS or the time interval from which the
input values are taken. When the curve corresponds well to the analyst’s idea of
the general characteristics of the temporal variation in the group, the model is
stored for further use.
An example of interactive model improvement is demonstrated in Fig. 7.11.
The initially built model with automatically selected parameter values fitted poorly
(a). We managed to improve it by modifying the parameter values, but still the
fit was not good enough (b). In particular, the small peaks occurring in Januaries
were shifted to the previous months in the model. This undesirable feature was
removed after we had changed the input time interval for the model (c).
To evaluate the quality of the model, the analyst examines the model residuals,
that is, the differences between the actual values and the model-predicted values.
The temporal distribution of the residuals is explored by means of the time graph
display and the spatial distribution by means of the map display. The absence of
clear temporal and spatial patterns in the distribution of the model residuals (in
other words, the distributions appearing as random noise) signifies that the model

Fig. 7.12  Images b and d represent the residuals of the models shown in images a and c, respec-
tively. The model shown in a has been modified as shown in c to correct the bias towards overes-
timating the values

captures well the general features of the spatio-temporal variation. If this is not so,
the model needs to be modified or the group subdivided.
An example of model evaluation and modification is shown in Fig. 7.12. Note
that the representation of the individual TS has been replaced by a summarized
representation by quartiles, that is, each boundary between a dark grey polygon
and a light grey polygon is built from the positions of one of the quartiles. Image
a shows the model we have earlier built for cluster 2 (the same as in Fig. 7.11c).
Image b shows the residuals of this model, also in a summarized form. It is notice-
able that parts of the polygon enclosing the values between the second and third
quartiles often fall below the zero line. This means that negative values occur
more frequently among the residuals than positive values, that is, the model has
a tendency to predict higher values than the actual values. To correct this bias,
we change the way in which the representative TS is built. Originally, it was built
from the mean values after excluding 5 % of the highest values and 5 % of the
lowest values. When we exclude the upper 10 %, we obtain the model shown in
image c; the respective residuals are shown in image d. We see that the relative
positions of the quartiles with respect to the zero line have improved (we do not
take into account the last time interval of the data, where the values significantly
decrease reflecting the specifics of using Flickr).
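
A numeric counterpart of this visual check can be sketched as follows. Residuals are computed as actual minus predicted values, and for each time step the share of negative residuals across the group is reported; shares persistently above one half hint at overestimation. The function names are hypothetical, and the sketch does not replace the inspection of the temporal and spatial distributions.

```python
def residuals(actual, predicted):
    """Differences between the actual and the model-predicted values of one TS."""
    return [a - p for a, p in zip(actual, predicted)]

def share_negative(residual_series):
    """For each time step, the fraction of TS whose residual is negative."""
    return [sum(r < 0 for r in column) / len(column)
            for column in zip(*residual_series)]

# Three made-up TS of one group and their predicted values.
actual = [[5, 6, 7], [4, 4, 9], [6, 5, 8]]
predicted = [[6, 6, 8], [5, 5, 8], [7, 4, 9]]
group_residuals = [residuals(a, p) for a, p in zip(actual, predicted)]
negative_shares = share_negative(group_residuals)
```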
After the models are built and stored, they can be used for prediction, for
example, for estimating how many Flickr users can be expected in each place in a given
month. According to our framework, one TS model is built for a group (cluster)
of similar TS associated with different places. Being applied straightforwardly,
the model would predict the same values for all places/objects of the group. The
statistical distribution of the predicted values would differ from the distribution
of the original values. To avoid these undesired effects, the model-based prediction
is individually adjusted for each object based on the basic statistics (quartiles)
of the distribution of its original TS values. The statistics are computed
at the stage of model building and stored in the model description file together
with the information about the group membership of the places or objects.
Besides that, the statistics of the model-predicted values for the same time steps
as in the original data are computed and stored together with the description of
each model.
The adjustment of the predicted values is done in the following way. Let Q1i,
Mi, and Q3i be the first quartile, median, and third quartile, respectively, of the
value distribution in the original TS for the object/place i. Let Q1, M, and Q3 be
the first quartile, median, and third quartile, respectively, of the distribution of the
model-predicted values for the group containing the object/place i. We introduce
level shift S and two amplitude scale factors Flow and Fhigh as
$$S = M_i - M; \quad F_{low} = \frac{M_i - Q1_i}{M - Q1}; \quad F_{high} = \frac{Q3_i - M_i}{Q3 - M}.$$

Let $v^t$ be the model-predicted value for an arbitrary time step t (this value is
common for all group members). The individual value $v_i^t$ for the object/place i and
time step t is computed according to the formula:

$$v_i^t = \begin{cases} M + F_{low} \cdot (v^t - M) + S, & \text{if } v^t < M \\ M + F_{high} \cdot (v^t - M) + S, & \text{otherwise} \end{cases}$$

The adjustment according to this formula preserves the quartiles of the original
value distribution for each object/place. In the model evaluation step, model resid-
uals are computed as the differences between the original values and the individu-
ally adjusted predicted values.
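
The adjustment can be illustrated with a short Python sketch. The function name
`adjust_prediction` and the tuple layout of the quartiles are assumptions made for
this illustration and are not taken from the described tool. The sketch also assumes
that the group quartiles differ from the group median, so that both scale factors
are defined.

```python
def adjust_prediction(v_t, group_quartiles, object_quartiles):
    """Adjust a group-level prediction v_t for one object/place.

    group_quartiles  = (Q1, M, Q3) of the model-predicted values of the group
    object_quartiles = (Q1_i, M_i, Q3_i) of the original TS of the object
    Assumes Q1 < M < Q3; otherwise the scale factors are undefined.
    """
    q1, m, q3 = group_quartiles
    q1_i, m_i, q3_i = object_quartiles
    shift = m_i - m                    # level shift S
    f_low = (m_i - q1_i) / (m - q1)    # amplitude scale factor below the median
    f_high = (q3_i - m_i) / (q3 - m)   # amplitude scale factor above the median
    factor = f_low if v_t < m else f_high
    return m + factor * (v_t - m) + shift


# Made-up numbers: quartiles of the group predictions and of one place.
group = (10.0, 20.0, 40.0)
place = (3.0, 5.0, 9.0)
adjusted = [adjust_prediction(v, group, place) for v in group]
```

With these made-up numbers, the group values Q1, M, and Q3 are mapped onto the
quartiles of the place, which is the quartile-preserving property stated above.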

Fig. 7.13  Previously built models have been used to predict time series values without adding
noise (a) and with adding Gaussian noise (b)

In our example with place-related TS from Switzerland, we have built a model
for each of the 13 clusters obtained in our previous analysis (Fig. 7.8). Then, we
have used the models to predict the monthly counts of visitors in all places for
the time period starting from January 2010 and ending with December 2012. Two
variants of prediction are shown in Fig. 7.13: without adding noise (a) and with
adding Gaussian noise (b). The lines in the time graphs have the colours of the
clusters (the same as in Fig. 7.8). Note that different values are predicted for dif-
ferent members of the same cluster even when no noise is added.
In this section, we have described an approach to modelling spatio-temporal phe-
nomena represented by place-related TS, such as TS of presence of movers. Spatio-
temporal variation is modelled through grouping places by similarity of their TS and
applying existing TS modelling methods for representing general patterns of tem-
poral variation in each group. Since grouping divides the territory into regions with
crisp boundaries, the approach is particularly suitable for modelling spatially abrupt
phenomena, that is, such that neighbouring places or objects can substantially differ
by their characteristics. This character of spatial distribution is quite usual for peo-
ple, vehicles, or animals, whose presence may change abruptly from place to place.
From the visual analytics perspective, our approach is a way to represent results of
interactive visual analysis in an explicit form, which can be not only reviewed and
communicated but also used in further analysis and for making predictions.

7.2.5 Event Extraction from Time Series

Spatial TS representing dynamics of presence of moving objects may contain
peaks signifying that in some time units, there were more objects in a place
than before and after that. These changes in presence may be related to essen-
tial features of the behaviours of the movers and/or to events and phenomena
occurring in the spatio-temporal context. Hence, it may be important for com-
prehensive movement analysis to detect and investigate peaks in object pres-
ence in places of interest. When the places of interest are few, peaks can be
detected by visual inspection of the TS represented on a time graph. When the
places are numerous, it is reasonable to apply computational methods of peak
detection.
A number of algorithms for detection of various features in TS have been
developed in the fields of statistics and data mining. We use a modified variant
of the peak detection method suggested by Billauer (2010). The algorithm iden-
tifies abrupt peaks (increase followed by decrease) or pits (decrease followed by
increase) within the given time window. It has two parameters: minimal amplitude
$\delta$ and maximal width of a time window $w$. The algorithm will identify a sample
$x[n]$ of the TS as a peak if it is a local maximum in the interval
$w_n := [n - w/2, n + w/2]$, and there are samples of value less than or equal to
$x[n] - \delta$ both before and after $x[n]$ within $w_n$. A sample is a pit if it is a
local minimum in $w_n$, and there are samples of value at least $x[n] + \delta$ around it.
The algorithm outputs the amplitude of the peak/pit and the sample number $n$:

$$A_{POS}[n] := x[n] - \min_{k \in w_n} x[k] \quad \text{and} \quad A_{NEG}[n] := \max_{k \in w_n} x[k] - x[n]$$

The pseudo-code given below includes only the part of the algorithm that
detects peaks.

The algorithm needs only one pass through the TS. However, in order to deter-
mine whether a local maximum max X is a peak, the algorithm goes through all
samples in the window (in lines 8–9) to verify that the definition given above
holds. Therefore, the algorithm complexity is O(w|x|).
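
The peak definition can also be sketched in Python. The sketch below checks the
definition directly for every sample. It is written for illustration only and is
not the pseudo-code referred to in the text, so the line numbers cited above do not
apply to it. The function name `detect_peaks`, the truncation of the window at the
ends of the series, and the use of integer division for w/2 are assumptions.

```python
def detect_peaks(x, delta, w):
    """Return (n, amplitude) for every sample x[n] that satisfies the peak definition.

    delta: minimal amplitude; w: maximal width of the time window.
    The window [n - w//2, n + w//2] is cut off at the ends of the series.
    """
    half = w // 2
    peaks = []
    for n in range(len(x)):
        lo = max(0, n - half)
        hi = min(len(x) - 1, n + half)
        window = x[lo:hi + 1]
        # x[n] must be a local maximum within the window.
        if x[n] < max(window):
            continue
        before = x[lo:n]
        after = x[n + 1:hi + 1]
        # There must be sufficiently low samples on both sides of x[n].
        if (before and after
                and min(before) <= x[n] - delta
                and min(after) <= x[n] - delta):
            peaks.append((n, x[n] - min(window)))  # amplitude A_POS[n]
    return peaks
```

For example, `detect_peaks([0, 1, 12, 2, 0, 0, 3, 0], delta=10, w=5)` returns
`[(2, 12)]`: the sample with value 3 is a local maximum, but it does not exceed its
neighbours by the minimal amplitude. Since every sample is compared with at most
w + 1 window members, the sketch has the same O(w|x|) complexity.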
An extended version of the algorithm allows detection of peaks or pits from
normalized values. The values within each TS are transformed to z-scores, that
is, the deviations from the mean of the TS divided by the standard deviation.
Accordingly, the value of the parameter “minimal amplitude” is specified as a
number of standard deviations. This modification allows detection of interesting
peaks or pits in TS with relatively low values in comparison with other TS.
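
The z-score transformation can be sketched as follows. The function name `z_scores`
is hypothetical, and the use of the population standard deviation (rather than the
sample standard deviation) is an assumption, since the text does not say which
variant is used. A constant TS is mapped to zeros to avoid division by zero.

```python
from statistics import mean, pstdev


def z_scores(ts):
    """Transform the values of one TS to deviations from its mean in units of its std."""
    mu = mean(ts)
    sigma = pstdev(ts)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in ts]
    return [(v - mu) / sigma for v in ts]
```

After this transformation, a minimal amplitude of 2, for instance, means that a peak
must exceed the surrounding values by at least two standard deviations of its own TS.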

Fig. 7.14  a Results of peak extraction are represented on a time graph. Each cross symbol rep-
resents a peak. b The matrix shows the temporal distribution of the peaks by the years (rows) and
months (columns). The degree of darkness is proportional to the number of peaks. The vertical
bar on the right represents the totals for the rows, and the horizontal bar below the matrix repre-
sents the totals for the columns

As with any computational method, the results obtained by the peak detection
method depend on the parameter settings. Andrienko et al. (2012) address the issue of
parameter sensitivity and suggest computational and visual techniques supporting the
investigation of parameter impact and informed choice of suitable parameter values.
To give an example of peak detection and investigation, we shall use the TS of
the presence of Flickr users over Switzerland transformed by removing the trends,
as was shown in Fig. 7.6. We apply the peak detection algorithm with the maxi-
mal time window width of five time steps and minimal peak amplitude 10, that is,
it will search where and when the number of place visitors increased by at least
10 compared to at least one of the two preceding time steps and at least one of
the two following time steps. The results of the peak detection are represented on
a time graph as shown in Fig. 7.14a. Each peak is represented by a cross sym-
bol positioned on the line of the TS in which it has been detected; the horizontal
position corresponds to the time step in which the peak occurred. When a line is
highlighted, the corresponding peak symbols change their colour from yellow to
green, to be better distinguishable in the display. The time graph view additionally
contains a linear event bar (below the plot)—a sequence of rectangles that show
the counts of the peaks for the time steps by the darkness of shading, darker is
more. A display called “periodicity chart” (Fig. 7.14b) shows the distribution of
the peaks with respect to temporal cycles, in our case, the yearly cycle composed
of months. Each row corresponds to one year and each column to one month of
the year. The vertical bar on the right of the display represents the totals for the
rows, and the horizontal bar at the bottom represents the totals for the columns.
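
The counts underlying such a periodicity chart can be obtained by a simple
aggregation, sketched below. The function name `periodicity_counts` and the input
format (one `(year, month)` pair per detected peak) are assumptions made for the
illustration.

```python
from collections import Counter


def periodicity_counts(peak_months):
    """Count peaks per (year, month) cell and compute the row and column totals."""
    cells = Counter(peak_months)   # (year, month) -> number of peaks
    row_totals = Counter()         # year -> number of peaks
    col_totals = Counter()         # month -> number of peaks
    for (year, month), count in cells.items():
        row_totals[year] += count
        col_totals[month] += count
    return cells, row_totals, col_totals


# Made-up peaks, each given by the year and month in which it occurred.
cells, rows, cols = periodicity_counts(
    [(2010, 8), (2010, 8), (2010, 7), (2011, 4), (2011, 8)]
)
```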
These visualizations tell us that peaks of the presence occurred in many places
in August, the maximum being 63 places in August 2010. Quite many peaks, but
fewer than in August, occurred in July, with the maximum of 26 places in June
2010. A quite probable reason may be that especially many visitors come in these
months to Switzerland for recreation. Although we know that many visitors come
also in winter, they may be quite busy with skiing and other winter sports and

Fig. 7.15  Peak events extracted from spatial time series are represented on a map (top) and in a
space–time cube by circles with the sizes proportional to the peak amplitudes

therefore not take many photographs. A surprisingly high number of peaks (38)
occurred in April 2011, which was, probably, related to the Easter holidays.
Each peak refers to a certain place (the area described by the TS in which the peak
occurred) and a certain time interval, which means that peaks can be treated as spatial
events. They can be visualized and analysed in the same ways as any other spatial
events. In particular, all methods described in the previous chapter may be applied
to peaks and, more generally, to any kind of time-limited features (e.g. sudden rises,
drops, or pits) extracted from spatial TS. Figure 7.15 demonstrates the representation
of the peak events extracted from the Swiss TS on a map and in a space–time cube.
The events are represented by circles (in red) with the sizes proportional to the peak
amplitudes. On the map, the circles for multiple events that occurred in the same

Fig. 7.16  Summarized photograph titles for the peak events are represented in a text cloud dis-
play (bottom); the colours are propagated from the map (top)

place are overlaid. Since the circles are drawn in a semitransparent mode, the bright-
ness of the resulting colouring conveys the frequency of the event occurrences.
Furthermore, it is possible to obtain summarized texts of the photograph titles
for the peak events in the same way as was used in the previous chapter for spa-
tio-temporal clusters of elementary events. For each peak event, a database query
extracts all photograph records fitting within its spatial and temporal boundaries.
The titles of the photographs are summarized as described in Sect. 6.2.3. The text
summaries can be viewed using a text cloud display.
Figure 7.16 (bottom) shows the most frequent words and combinations occur-
ring in the titles of the photographs corresponding to the peak events. Many of the
words and combinations refer to real-world events attracting attention of many
people, including Flickr users: Gurtenfestival (a music festival in Bern), street
parades in Zurich, Arrancabirra (a kind of general public marathon in Courmayeur,
Aosta Valley, Italy), Auto Salon in Geneva, Tuning World Bodensee (an interna-
tional exhibition), World Economic Forum Annual Meeting in Davos, and many
others. Note that some of these real-world events did not show up when we sum-
marized photograph titles for spatio-temporal clusters of photograph-taking events
in Sect. 6.2.3. This reflects the fact that spatial events may have different scale in
space and time. The spatio-temporal clustering in Sect. 6.2.3 discovered concentrations
of photograph-taking events within tight spatial and temporal limits (250 m
distance in space and 30 min in time). Such concentrations are likely to occur
during local short-term events, such as a street parade, but not during
events extended in space and/or time, such as the World Economic Forum annual
meeting or an international exhibition lasting for several days.
This section shows that local TS may be investigated through extraction of spa-
tial events, similarly to extraction of movement events from trajectories (Sect. 3.5).
The extracted events, in turn, may be investigated using the methods suggested in
Chap. 6 and other suitable methods for gaining new knowledge about the move-
ment and its spatio-temporal context.

7.2.6 Interpretation of Personal Places

Semantic information about places extracted from movement data can be obtained
by comparing the coordinates of the places with locations of pre-defined places of
interest or objects from a geographical database (Parent et al. 2013). This informa-
tion can then be used for interpreting movement behaviours. For example, a daily
trajectory that starts and ends at a hotel, visits one or more tourist attractions, and
stops at a restaurant can be classified as a “tourist’s trajectory”. This approach,
however, is not suitable for interpreting personal POIs such as home, work, child’s
school or kindergarten, and regularly visited grocery.
In Sect. 7.1.4, we said that personal places can be characterized by several TS
derived from movement data: TS of visit counts by days and TS of visit counts by
times of the day (e.g. by hours) for the work days, Saturdays, and Sundays. These TS,
which can be called “temporal signatures” of the places, can allow inferring the likely
meanings of the places. Thus, places visited only in the work days from the morning
till the afternoon are, very probably, the workplaces. Places where a person appears on
all days of the week but on the work days mostly in the mornings and in the evenings
are, most likely, the home places. Places visited in the late afternoons or evenings of
the work days and in the daytime on Saturday may be places for daily shopping. It
may also be possible to identify lunch places and places of regular activities.
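
The hourly temporal signatures of one place can be sketched as follows. This is a
simplified illustration: the function name `hourly_signatures` is hypothetical, the
work days are assumed to be Monday to Friday, and every position record is counted
as one visit, which may differ from the aggregation described in Sect. 7.1.4.

```python
from datetime import datetime


def hourly_signatures(timestamps):
    """Count the records of one place by hour of the day, separately per day type."""
    signatures = {
        "workday": [0] * 24,
        "saturday": [0] * 24,
        "sunday": [0] * 24,
    }
    for ts in timestamps:
        weekday = ts.weekday()  # Monday = 0, ..., Sunday = 6
        if weekday == 5:
            key = "saturday"
        elif weekday == 6:
            key = "sunday"
        else:
            key = "workday"
        signatures[key][ts.hour] += 1
    return signatures


# Made-up records: two on a Monday morning, one on a Saturday afternoon.
sig = hourly_signatures([
    datetime(2013, 1, 7, 9, 15),
    datetime(2013, 1, 7, 9, 40),
    datetime(2013, 1, 5, 15, 5),
])
```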
For making illustrations, we shall use a subset of georeferenced Twitter data
(Sect. 2.10.6) referring to a selected urban area in the USA and a time period of
two months. We have constructed trajectories of the Twitter users from the posi-
tions of the messages posted on the selected territory. We have discarded the tra-
jectories of the individuals who were present on this territory for less than 10 days.
The remaining 2,558 trajectories are regarded as belonging to residents of the
selected area, who are likely to have repeatedly visited personal places.
We have extracted 4,245 personal places of these people by means of density-based
clustering with the spatial distance threshold of 100 m and minimum five neighbours
for a core point. The trajectories have been aggregated by the extracted places as
described in Sect. 7.1.4. Based on the TS of place visits by days, we have computed
for each place the number of different days it was visited. There are 1,488 personal
places (35.1 % of all) that were visited in 10 or more different days. In our experiment
on place interpretation, we focus on these places.
The Twitter data are episodic movement data, where the time gaps between the
position records may be quite large. For some time intervals by which the data are
aggregated, there may be no records; hence, the places where a person was present
in these intervals are unknown. Therefore, the values in the TS need to be consid-
ered as the lower bounds of the actual presence counts. A value decrease in a place
TS does not necessarily mean that the person moved somewhere else. The person
could stay in the place but not post new messages from this place in the next one
or more time intervals. This needs to be taken into account in interpreting the TS.
Figure  7.17 gives an example of temporal signatures of personal places of
one person. Four time graphs show the TS of place visits by hours on the work
days (a), Saturdays (b), and Sundays (c) and the TS of visits by days (d). The line
coloured in blue demonstrates a typical TS for a workplace: the person was pre-
sent there only on work days from hour 5 till hour 14. There are two lines col-
oured in red. The one reaching higher values may correspond to the home place

Fig. 7.17  The time graphs show temporal signatures of personal places of one individual. a, b, c
time series of place visits by hours of the day for the work days (a), Saturdays (b), and Sundays
(c); d time series of place visits by days

of the person, since the person is present there in the afternoons and evenings of
the work days and at any times in the weekend. The second red-coloured line has
a similar temporal distribution, but the values are much lower; hence, this is not
likely to be a home place. The two lines coloured in orange may correspond to
shopping places: they are visited in the afternoons and evenings of the work days
and in different times of the day on the weekend.
To be able to analyse a large set of places without considering the temporal sig-
natures of each place one by one, two approaches are possible: similarity analysis
of the TS and clustering of the TS. For both cases, a suitable distance function for
assessing the similarity of two TS is needed. In this case, the Euclidean distance
may not work well enough. It is reasonable to apply a more sophisticated distance
function that can perform moderate transformations of the TS: shifting, stretch-
ing/shrinking, and scaling. This is needed, in particular, to account for different
working times and, consequently, for different times of coming home and other
activities taking place on work days. For our illustrations, we estimate the similar-
ity between TS by means of the Euclidean distance; however, before computing
the distances, we apply temporal smoothing (Andrienko and Andrienko 2006) and
then transform the absolute values to normalized deviations from the means, as
described in Sect. 7.2.2.
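
The distance computation used for the illustrations can be sketched as below. The
sketch rests on several assumptions: temporal smoothing is taken to be a centred
moving average, the normalization is taken to be the deviation from the mean divided
by the population standard deviation, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def smooth(ts, radius=1):
    """Centred moving average; the window is truncated at the ends of the TS."""
    result = []
    for i in range(len(ts)):
        window = ts[max(0, i - radius): i + radius + 1]
        result.append(sum(window) / len(window))
    return result


def normalize(ts):
    """Deviations from the mean of the TS in units of its standard deviation."""
    mu, sigma = mean(ts), pstdev(ts)
    return [(v - mu) / sigma if sigma else 0.0 for v in ts]


def signature_distance(a, b, radius=1):
    """Euclidean distance between two smoothed and normalized TS of equal length."""
    return math.dist(normalize(smooth(a, radius)), normalize(smooth(b, radius)))
```

Because of the normalization, two TS with the same shape but different magnitudes
have a distance close to zero. The smoothing only partly compensates for small
temporal shifts; it does not replace a distance function that handles shifting and
stretching explicitly.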
For similarity analysis, the analyst selects a place with previously assigned
semantic interpretation and uses the distance function to compute the distances
between the TS of this place and those of all other places. Then, the analyst
applies interactive dynamic filtering by the distances and looks at the time graphs
to select a subset of sufficiently similar TS. The places these TS belong to can
be given the same interpretation as the exemplar place. An illustration is given in
Fig.  7.18. The time graphs in the upper row show the selected TS similar to the
workplace TS from Fig. 7.17. The TS belong to 154 different places, not counting
the exemplar place. In the lower row, the selected TS are similar to the home place
TS from Fig. 7.17. These TS belong to only 5 different places.
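
The computational part of this procedure can be sketched as follows; the interactive
filter is replaced here by a fixed distance threshold. The function name
`similar_places` and the data layout (a dictionary from place identifiers to TS that
have already been smoothed and normalized) are assumptions.

```python
import math


def similar_places(exemplar_id, signatures, max_distance):
    """Return (place, distance) pairs for places similar to the exemplar place.

    signatures: dict mapping place identifiers to TS of equal length.
    The exemplar place itself is not included in the result.
    """
    exemplar_ts = signatures[exemplar_id]
    ranked = sorted(
        (math.dist(exemplar_ts, ts), place)
        for place, ts in signatures.items()
        if place != exemplar_id
    )
    return [(place, d) for d, place in ranked if d <= max_distance]


# Made-up signatures of three places.
signatures = {"p1": [0, 5, 9, 1], "p2": [0, 5, 8, 1], "p3": [9, 0, 0, 9]}
matches = similar_places("p1", signatures, max_distance=2.0)
```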
Figure  7.19 demonstrates selected results of clustering (we used progressive
clustering by k-means, as described in Sect. 7.2.3). The cluster presented in the

Fig. 7.18  By means of similarity analysis, places with temporal signatures similar to selected
ones have been found. Upper row likely workplaces; lower row likely home places

Fig. 7.19  Temporal signatures of personal places have been clustered by similarity. The images
show three selected clusters

upper row (197 members) consists mostly of TS characteristic for workplaces. The
cluster in the middle row can be interpreted as a cluster of likely home places (74
members). The cluster in the lower row, probably, includes places of stops of pub-
lic transport (134 places), where people mostly appear in early morning hours of
the work days. It should be noted, however, that the clusters are not very “clean”.
Thus, the “work” cluster includes also a few TS with relatively high presence in
the evenings. The TS in the other clusters we have obtained are more difficult to
interpret, that is, the meanings of the places for the individuals cannot be guessed
from the shapes of the TS.
Another way of using information about the times of place visits for seman-
tic interpretation of personal places is suggested by Ahas et al. (2010). Personal
places extracted from mobile phone calls data are classified as home or work-
places based on the frequency of the person’s calls from each place, their average
time of the day, and the standard deviation of the time of the day. For this purpose,
the authors have developed a set of ad hoc classification rules.
In any case, temporal and statistical information about place visits alone may be
sufficient for deriving semantic interpretations of only a small proportion of personal
places. For the remaining places, it may be necessary to look at their relative
geographical positions with respect to the other personal places and the relative times
of visiting with respect to the times of visiting the other personal places. For exam-
ple, a place visited on the way from home to work may be a place of child’s school
or kindergarten. Such more sophisticated analyses are not yet supported by visual
and computational techniques. There is still much to be done in this direction.

7.3 Relations

Discrete spatial and spatio-temporal aggregation of movement data produces not
only attributes characterizing places in terms of the presence of movers but also
data expressing and characterizing connectedness and flow relations between the
places. As defined in Sect. 2.8.2, a connectedness relation exists between two or
more places if the sets of their visitors overlap. The number of common visitors
can be taken as a measure of the strength of the connectedness relation. A flow
relation exists when two or more places are visited in a particular temporal order.
The number of movers that visit the places in this order or the number of times the
places are visited in this order is taken as the measure of the strength of the flow
relation, which is often called flow magnitude. The strengths of the relations may
change over time. These changes can be represented by TS of values of dynamic
attributes.
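
For the binary case (pairs of places), the two strength measures can be sketched as
follows. The function names and the input format (for each mover, the sequence of
visited places in temporal order) are assumptions made for the illustration.

```python
from collections import Counter
from itertools import combinations


def connectedness(visits):
    """Number of common visitors for each unordered pair of places."""
    strength = Counter()
    for places in visits.values():
        for a, b in combinations(sorted(set(places)), 2):
            strength[(a, b)] += 1
    return strength


def flow_magnitudes(visits):
    """Number of moves and number of different movers for each ordered pair of places."""
    moves = Counter()
    movers = Counter()
    for places in visits.values():
        pairs = [(a, b) for a, b in zip(places, places[1:]) if a != b]
        moves.update(pairs)        # every move counts
        movers.update(set(pairs))  # every mover counts once per ordered pair
    return moves, movers


# Made-up visit sequences of three movers.
visits = {"u1": ["A", "B", "A", "B"], "u2": ["A", "C"], "u3": ["B", "A"]}
common = connectedness(visits)
moves, movers = flow_magnitudes(visits)
```

Here places A and B have two common visitors, while the flow from A to B consists of
two moves made by a single mover, which shows that the two measures of flow
magnitude can differ.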
Spatial and spatio-temporal aggregation as described in Sect. 3.8 reveals only
binary connectedness and flow relations between places. When there is at least
one move from place pi to place pj, a new object is created, called connection, or
link. Each link is characterized by values of thematic attributes: number of moves
between its origin and destination, number of different movers, average time of
the move, average speed, etc. Aggregation by time intervals produces TS of these
attributes.

7.3.1 Analysis of Binary Links Between Places

Binary links between places and their attributes for one time interval or totals for
the whole time can be represented on flow maps. We have already seen many examples
of flow maps throughout the book. Flow maps can represent movements of
a single object (Fig. 1.10) or multiple objects (Figs. 1.23, 1.25, 1.26, 3.9d, 3.10,
5.3, etc.). In Sect. 3.8, we have discussed the differences between flow maps sum-
marizing quasi-continuous and episodic movement data. In the latter, there are
usually many intersections between the flow symbols, which clutter the display
heavily and require the use of filtering.
Besides flow maps, binary links between places and attributes of the links can
be visually represented in an origin–destination matrix (Figs. 1.27 and 4.8) and
in a space–time cube (Fig. 4.5). The latter allows representation of time-variant
flows, but suffers from occlusions and intersections of symbols.
TS of attribute values associated with links can be visualized and analysed in the
same ways as TS associated with places. Figure 7.20 demonstrates a combination
of a flow map with time graphs. The images have been produced using the Milan
car data. The territory of Milan has been tessellated into 385 spatial compart-
ments. Since the trajectories are quasi-continuous, it is possible to represent them
by moves between neighbouring compartments. For the temporal aggregation, the

Fig. 7.20  A flow map (a) representing links between places in a geographical context is com-
bined with time graphs representing time series of attribute values associated with the links: flow
magnitudes (b) and average movement speeds (c)

time span of the data (one week) has been divided into 168 hourly intervals. For
each ordered pair of neighbouring cells and each time interval, the aggregation
tool has computed the number of moves from the first to the second cell, the num-
ber of different cars, the average speed of the movement, and the average duration
of the move. Besides the TS, the tool has computed the total counts of moves and
cars and the average speeds and move durations for the whole time period.
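
This kind of aggregation can be sketched as below. The sketch is not the aggregation
tool itself: the function name `aggregate_links` and the record format are
assumptions, each move is assumed to be already assigned to an hourly interval with
an index from 0 to 167, and move durations are omitted for brevity.

```python
from collections import defaultdict


def aggregate_links(moves, n_intervals=168):
    """Build TS of move counts, car counts, and average speeds for each link.

    moves: iterable of (origin, destination, car_id, interval_index, speed).
    """
    cells = defaultdict(lambda: {"moves": 0, "cars": set(), "speed_sum": 0.0})
    for origin, destination, car_id, interval, speed in moves:
        cell = cells[(origin, destination, interval)]
        cell["moves"] += 1
        cell["cars"].add(car_id)
        cell["speed_sum"] += speed

    series = {}
    for (origin, destination, interval), cell in cells.items():
        ts = series.setdefault((origin, destination), {
            "moves": [0] * n_intervals,
            "cars": [0] * n_intervals,
            "avg_speed": [None] * n_intervals,  # undefined when there are no moves
        })
        ts["moves"][interval] = cell["moves"]
        ts["cars"][interval] = len(cell["cars"])
        ts["avg_speed"][interval] = cell["speed_sum"] / cell["moves"]
    return series


# Made-up moves between two neighbouring cells.
links = aggregate_links([
    ("c1", "c2", "car1", 8, 30.0),
    ("c1", "c2", "car2", 8, 50.0),
    ("c1", "c2", "car1", 9, 20.0),
])
```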
The flow map on the left of Fig. 7.20 represents the total counts of the moves by
widths of the flow symbols; the flows with magnitudes below 100 are hidden. The
total number of links is 2,155, and 1,250 of them (58 %) have the total counts of
moves at least equal to 100. On the right of Fig. 7.20, there are two time graphs show-
ing the TS of move counts (b) and speeds (c) summarized by deciles. The vertical
lines separate the days; their positions correspond to the midnights. The first day in
the dataset is Sunday, and the last is Saturday. The thick black polygonal lines on the
time graphs represent the TS of the mean values. One link, which has been selected in
the map, is highlighted in black (on the west of the territory). The corresponding TS
are represented by thin black polygonal lines in the time graphs. This simultaneous
highlighting allows the variation in the attribute values for any link to be explored and
compared with the value distribution over the entire set of links. For the selected link
in Fig. 7.20, we observe dramatic drops of the average speed in the morning of the
work days (days 2–6). On Thursday (day 5), the period of very low average speed was
much longer than in the other days, specifically, from 5 till 16 o’clock.
A more scalable way of exploring characteristics of the links is through cluster-
ing by the similarity of their TS, analogously to clustering of places. An exam-
ple is presented in Fig. 7.21. The links between the places in Milan have been

Fig. 7.21  A flow map (a) represents links between places clustered by the similarity of the TS
of the move counts and average speeds. The flows are coloured according to their cluster mem-
bership. The colours are assigned to the clusters by means of the projection of the cluster centres
onto a two-dimensional colour space (b). The scatterplot (c) shows the value combinations of
move counts and average speeds occurring in different clusters. The time graph (d) shows the
envelopes of the clusters of the TS of move counts. The frequency histogram (e) shows the statis-
tical distribution of the average speed values for the clusters

clustered by similarity of their TS of move counts and average speeds (i.e. two
time-dependent attributes have been used together). The clustering has been done
by k-means with k = 15. The clusters have been assigned different colours by pro-
jecting their centres to a colour space, as explained in Sect. 7.2.3. The projection
plot is shown in Fig. 7.21b. It shows that clusters 4, 5, and 9 are far apart from all
others, which means that their TS differ much from the TS in the other clusters. In
the flow map (a), we see that the links from these three clusters, coloured in shades
of blue, are all located on the motorways. Generally, the patterns of the spatial
distribution of the cluster colours on the map are clear and easy to interpret. Thus,
prevailing shades of orange (clusters 1 and 13) differentiate the city centre from
the surrounding areas, where shades of yellow and light green dominate.
The scatterplot in Fig. 7.21c represents the combinations of values of the move
counts (horizontal dimension) and average speeds (vertical dimension) occurring
in different clusters. In the centre, the move counts are relatively high, while the
speeds are low. As could be expected, high values of both attributes are attained
on the links located along the motorways, which belong to the blue clusters.
However, the maximal speed values occur not in these clusters but in yellow and
light green clusters when the move counts equal 1. The bulk of the speed values
for the links in these clusters are much lower, which can also be seen in the fre-
quency histogram in Fig. 7.21e. The histogram also shows that average speeds
higher than 160 km/h are very rare. Hence, it is very probable that the high speed
values just result from errors in the data.
The frequency histogram reveals a prominent bimodal distribution of the aver-
age speeds. It clearly shows that the links on the motorways are very different
from the links in the remaining parts of the city with ordinary streets. Each subset
of links has, in fact, its own distribution of speeds. The two different distributions
make together the particular histogram shape with two “hills”.
Colouring according to cluster membership can also be applied to the lines rep-
resenting TS on a time graph. However, such a time graph is not readable due to
overplotting. It can be useful only in combination with filtering, when the clusters
are selected one by one. In Fig. 7.21d, the lines are replaced by semitransparent
polygons. Each polygon encloses a group of lines belonging to one cluster and is
painted in the colour of the cluster. This technique makes the graph more readable,
but occlusions still exist and complicate visual inspection. It is advisable
to select pairs of clusters for comparison by means of filtering.
Various transformations can be applied to link-related TS, analogously to place-
related TS. It is also possible to apply TS analysis and modelling methods, as
described in Sect. 7.2.4. Thus, Fig. 7.22 presents an example of building a model

Fig. 7.22  a The periodic variation in the move counts according to the daily cycle is represented
by a Holt–Winters model; the weekly cycle is ignored. b The variation is represented by a combi-
nation of Holt–Winters models with a separate model for each day of the week

to represent the variation in the hourly move counts over the week for a cluster of
links. In this case, the links have been clustered only based on the TS of the move
counts.
Like in the example considered in Sect. 7.2, the variation in the link charac-
teristics in the Milan data is periodic; hence, it is necessary to apply modelling
methods that can represent periodic variation, such as triple exponential smoothing
(Holt–Winters method). However, for the Milan data, two temporal cycles need to
be taken into account: daily and weekly. To our knowledge, there is no TS model-
ling method that can deal with two or more time cycles, or, at least, there is no
such method in the openly available libraries we have investigated. We suggest
two possible approaches to dealing with two cycles:
1. Ignore the larger (outer) cycle and build a model assuming that the data vary
only according to the smaller (inner) cycle. In our example, we would ignore
the weekly variation and consider only the daily variation.
2. Build a combination of models with a separate model for each position of the
smaller cycle within the larger cycle. Thus, for the case of the daily and weekly
cycle, separate models are built for Mondays, Tuesdays,…, Sundays, that is, the
variation is represented by a combination of seven models.
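
The data preparation for the second approach can be sketched as follows: the hourly
TS is split into seven series, one per position of the day within the week, and each
of them can then be passed to a method for a single (daily) cycle. The function name
`split_by_day_of_week` is hypothetical, and the TS is assumed to start at the
beginning of a day.

```python
def split_by_day_of_week(hourly_ts, steps_per_day=24, days_per_week=7):
    """Split an hourly TS into one series per day of the week.

    Series 0 collects the values of the first day of the data and of every
    seventh day after it, series 1 those of the second day, and so on.
    """
    per_day = [[] for _ in range(days_per_week)]
    for start in range(0, len(hourly_ts), steps_per_day):
        day_index = (start // steps_per_day) % days_per_week
        per_day[day_index].extend(hourly_ts[start:start + steps_per_day])
    return per_day


# Two weeks of made-up hourly values give 48 values for each day of the week.
per_day = split_by_day_of_week(list(range(14 * 24)))
```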
In Fig. 7.22a, the first approach is used. As can be seen, the predicted daily variation
on Sunday (the first 24 time steps) does not fit the real variation well. Furthermore,
the prediction of the future values (i.e. beyond the time span of the data) is not
plausible. Figure 7.22b shows a model obtained with the second approach. The
correspondence to the input TS and the prediction of the future values are much better
than with the first approach.
One note should be made here. For building a model representing cyclic vari-
ation in the data, it is necessary to have an input TS with at least two full cycles.
For example, to build a model for each day of the week, we need data from at least
two Mondays, two Tuesdays, and so on. If the time span of the available data is
shorter, a longer input TS for the modelling method can be constructed either by
doubling the representative TS of the cluster or by concatenating several specific
TS selected from the cluster. In the latter case, the tool selects the TS that have
the closest values to the representative TS. The number of the specific TS to be
selected can be specified by the user. In Fig. 7.22b, 10 selected TS are represented
by blue-coloured lines.
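
The two ways of constructing a longer input TS can be sketched as below. The
function name `build_input_series` is hypothetical, and closeness to the
representative TS is measured here by the Euclidean distance, which is an assumption
since the text does not name the measure used by the tool.

```python
import math


def build_input_series(representative, cluster, k=0):
    """Construct a longer input TS for a modelling method.

    k == 0: the representative TS of the cluster is doubled.
    k > 0:  the k TS of the cluster closest to the representative TS are concatenated.
    """
    if k == 0:
        return representative + representative
    closest = sorted(cluster, key=lambda ts: math.dist(ts, representative))[:k]
    return [value for ts in closest for value in ts]


# Made-up cluster of three short TS and its representative TS.
representative = [1, 2, 3]
cluster = [[1, 2, 4], [5, 5, 5], [1, 2, 3]]
```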
Like in the case of place-related TS, the quality of the models built is assessed
by analysing the residuals, that is, the differences between the real values and the
model-predicted values.

7.3.2 Relations Between Link Attributes

When we compare the time graphs representing the TS of flow magnitudes
(Fig. 7.20b) and average speeds (Fig. 7.20c), we see that the speeds tend to decrease
at the times when the flow magnitudes increase. This kind of relationship is typical
for constrained movement, where objects can move only through certain channels
with limited capacities, such as streets in a street network. The more objects wish
to move through a channel, the slower they are able to move. When data describing
constrained movement are spatially aggregated, each link between two places repre-
sents one or more real channels used for getting from the first to the second place.
Therefore, links can be treated as generalized channels. Their properties are similar
to properties of real channels; in particular, there is also a dependency between the
number of moving objects and their speed. This dependency can be captured by sta-
tistical models, such as linear or polynomial regression models.
For this modelling task, we do an additional transformation of the data. Two
time-dependent attributes A and B defined for the same time steps are transformed
into series of values of B corresponding to different value intervals of A. For this
purpose, the value range of attribute A is divided into suitable intervals. For each
interval and each object/place, the transformation algorithm finds all time steps in
which the values of A belong to this interval and collects the values of B attained
in these time steps. From the collected values of B, the algorithm finds the mini-
mum, maximum, mean, and quartiles. In this way, a family of attributes is derived:
minimum of B, mean of B, first quartile of B, median of B, and so on. For each of
the derived attributes and each object (i.e. link between places in our current case),
there is a sequence of values corresponding to the chosen value intervals of attrib-
ute A. These sequences are similar to TS except that the steps are based not on
time but on values of attribute A. We shall call these sequences dependency series
(DS) since they are meant to express the dependency between attributes A and B.
In this transformation, attribute A is treated as the independent variable and B as
the dependent variable. Owing to the structural similarity between TS and DS, a
time graph display is also suited to representing DS.
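The transformation of two time-dependent attributes into dependency series can be sketched for a single link as follows. This is a minimal illustration, not the tool's implementation: the function name, the half-open interval convention, and the quartile method are assumptions.

```python
from statistics import mean, median, quantiles

def dependency_series(a_values, b_values, width, n_intervals):
    """Summarise attribute B within consecutive value intervals of attribute A.

    Interval i covers A values from i*width up to, but excluding, (i+1)*width.
    Returns one dict of statistics per interval, or None for intervals that
    contain no time step.
    """
    buckets = [[] for _ in range(n_intervals)]
    for a, b in zip(a_values, b_values):
        i = int(a // width)
        if 0 <= i < n_intervals:
            buckets[i].append(b)

    series = []
    for values in buckets:
        if not values:
            series.append(None)  # no value of A from this interval was reached
            continue
        if len(values) > 1:
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]
        series.append({
            "min": min(values), "q1": q1, "median": median(values),
            "mean": mean(values), "q3": q3, "max": max(values),
        })
    return series
```

Taking one statistic, for example `"max"`, from every interval gives one DS of the link; with integer move counts and `width=3`, the intervals correspond to 0–2, 3–5, 6–8, and so on.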
The time graphs in Fig. 7.23 represent the DS of the maximal average speed
depending on the number of moves per hour (a) and of the maximal number of
moves per hour depending on the average speed (b). To obtain the first set of DS,
we have divided the value range of the number of moves (from 0 to 69) into
intervals of length 3: 0–2, 3–5, 6–8, and so on; 23 intervals in total. To obtain the
second set of DS, we have divided the value range of the average speed (from 0 to
190; the higher values have been discarded as outliers) into intervals of length 5:
0–5, 6–10, 11–15 km/h, and so on; 38 intervals in total. The lines on the graphs
have the colours of the clusters the links belong to.

Fig. 7.23  The graphs represent dependencies between two attributes of the links:
a dependencies of the maximal average speed on the number of moves; b dependencies
of the maximal number of moves on the average speed

Fig. 7.24  Examples of building models to express the dependency of the maximal average speed
on the number of moves per hour

Fig. 7.25  Examples of building models to express the dependency of the maximal number of
moves per hour on the average speed

Modelling of dependencies is done in almost the same way as modelling of
temporal variations except that temporal cycles are not involved. The analyst is
expected to select the range of the values of the independent variable that will be
used for building the model. This needs to be done separately for each cluster.
Examples are shown in Figs. 7.24 and 7.25. The green and red vertical lines mark
the beginning and end of the selected subrange, respectively.
As can be seen from the examples, not all lines representing the DS have the
same horizontal extent. This is because there are many links where high values
of the independent variable are not reached, and hence, there are no correspond-
ing values of the dependent variable. Therefore, to build a dependency model for
a group, the analyst needs to select a subsequence of values of the independent
variable for which there are enough values of the dependent variable. An addi-
tional reason for limiting the value range for model building is the reliability of
the data. Thus, for the dependency of the speed on the number of moves, the first
value interval of the independent variable is from 0 to 2. The corresponding aver-
age speeds have been computed from movements of at most two cars; hence, the
values cannot be sufficiently reliable. It may be reasonable to ignore these values
in the course of dependency analysis and modelling, that is, exclude the first value
interval of the independent variable.
Modelling methods suitable for dependency modelling are linear regression and
polynomial regression. When polynomial regression is chosen, the analyst needs
to specify the order of the polynomial that will be generated. In our examples, the
dependencies of the speeds on the move counts are modelled by polynomials of
order 4 and the inverse dependencies by polynomials of orders 5 and 6.
Like TS models, dependency models are evaluated by analysing model residu-
als (Andrienko and Andrienko 2013).
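The following sketch illustrates fitting a polynomial dependency model to a DS over a selected subrange and computing the residuals. It is an assumption-laden illustration: the independent variable is taken to be the interval index, gaps in the DS are marked by None, and the least-squares fit is solved through the normal equations in plain Python. In practice, an established routine such as numpy.polyfit would be numerically more robust, especially for higher orders.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares fit; returns [c0, c1, ...] of c0 + c1*x + c2*x**2 + ..."""
    n = order + 1
    # Normal equations (X^T X) c = X^T y for the Vandermonde matrix X.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / A[i][i]
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def fit_dependency_model(ds, order, first, last):
    """Fit a model to the DS values with interval index in [first, last].

    Intervals without a value (None) are skipped. Returns the coefficients
    and the residuals (real value minus model-predicted value).
    """
    points = [(i, v) for i, v in enumerate(ds)
              if first <= i <= last and v is not None]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    coeffs = fit_polynomial(xs, ys, order)
    residuals = [y - evaluate(coeffs, x) for x, y in points]
    return coeffs, residuals
```

Setting `first=1` excludes the first value interval, which corresponds to ignoring the unreliable values computed from very few cars.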

Fig. 7.26  Time series of predicted maximal average speeds (top) are compared with the TS of
actual average speeds (bottom). The TS are shown in a summarized form by deciles

Figure 7.26 demonstrates the use of dependency models for prediction. The
set of models of the maximal average speeds depending on the move counts has
been used to predict the maximal average speeds based on the actual flow magnitudes
by hours in the original TS. The predicted values also form TS defined for
the same sequence of time steps as the original TS. The TS are shown in a summarized
form by deciles in the upper part of Fig. 7.26. As can be seen, the character
of the temporal variation in the predicted values is the same as in the original
TS of the average speeds (Fig. 7.26 bottom), while the fluctuations have been
reduced. The predicted values are generally higher than the original values. This is
explainable since the models have been built on the basis of the maximal average
speeds. We have also repeated the model building experiment for the means and
medians of the average speeds. The resulting models, when applied to the original
TS of flow magnitudes, also convey well the character of the temporal variation;
however, the predicted values are lower than in the original TS of average speeds.
The choice of the suitable attributes for dependency modelling may depend on the
analyst’s goal. Thus, the dependency models based on the maximal average speeds
can be used for estimating the required travel time for a given route depending on
the current or predicted traffic conditions.
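The travel time estimation mentioned above can be sketched as follows, assuming that each link of a route has a polynomial speed model (coefficients in ascending order of power) and a known length. The data layout, the function names, and the lower bound on the predicted speed are assumptions introduced for illustration.

```python
def predict_speed(coeffs, moves_per_hour, floor=1.0):
    """Predicted speed in km/h for a given flow magnitude (Horner's scheme).

    The floor (an assumption) prevents non-positive speeds when the model is
    applied outside the value range it was built for.
    """
    speed = 0.0
    for c in reversed(coeffs):
        speed = speed * moves_per_hour + c
    return max(speed, floor)

def route_travel_time_hours(route, models, loads):
    """Sum the link travel times along a route.

    route:  list of (link_id, length_km)
    models: link_id -> polynomial coefficients of the speed model
    loads:  link_id -> current or predicted number of moves per hour
    """
    return sum(length / predict_speed(models[link], loads[link])
               for link, length in route)
```

For example, a 10-km link with the hypothetical model speed = 60 − 0.5 · moves and a load of 20 moves per hour is traversed at 50 km/h, that is, in 0.2 h.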
Dependency models can also be used for traffic simulation. The current prac-
tice is based on the use of generic dependencies defined in traffic theory. With our
approach, it is possible to reconstruct specific local dependencies from historical
traffic data, which can potentially give more accurate predictions.

7.3.3 Relations Between Several Places

Flow maps are convenient for visualization of quasi-continuous movement data
aggregated into flows between neighbouring places. Aggregation of episodic
movement data, however, usually generates flows between non-neighbouring
places. The flows intersect and overlap on a map, making it unreadable. Even in
a case of quasi-continuous movement data, the analyst may be particularly inter-
ested in relations between non-neighbouring places of interest. Thus, in investigat-
ing the personal driving data, the analyst may wish to analyse the driver’s trips
between the personal places: home, work, places of regular shopping, places of
sports and recreation, etc. In such a case, the data are aggregated only by these
places. Since the relative spatial positions of these places may be arbitrary, the
aggregation may result in intersecting flows and in flow maps that are hard to use.
Besides problems with intersecting flows, a limitation of a flow map is that it
can show only binary links between places. It would be useful to be able
to look at relations involving more than two places.
When the number of places of interest is small and the places have expressive
labels denoting their meanings or geographical positions, it can be effective to vis-
ualize the places and links (flows) between them in an abstract space rather than
geographical space. One display dimension is used in this case for representing
the set of places. The other display dimension can be used for showing temporal
or ordering relations between the places in terms of the moves between them. This
idea can be implemented in various ways. One possible implementation is the droplet
map, which is described below. The term “map” is used in an abstract sense and
not in the sense of a cartographical map.

Fig. 7.27  The design of the droplet map display. Image courtesy of David Spretke,
University of Konstanz, Germany

The design of a droplet map consists of two or more parallel vertical axes and
circles or other shapes that are put on these axes one below another. The latter
elements of the design are metaphorically called “droplets”. Droplets are used to
represent places of interest. Flows between the places are represented in the following
way. The origin places of the flows are put on one vertical axis and the
destination places on the next vertical axis. The same place may appear on both
axes if it is the origin for one flow and the destination for another flow. The order-
ing of the places on the axes is not necessarily the same. The origin and destina-
tion places are connected by lines, which may have different widths conveying the
flow magnitudes. The droplets representing the places can also differ in sizes to
represent place-related numeric attributes, such as the number of visits or the total
magnitude of the in- and/or outgoing flows. The design is schematically explained
in Fig. 7.27. The schematic drawing exemplifies the possible kinds of information
that can be perceived from this representation: (1) the transition from place a to
place b is highly frequented; (2) from place b, there are transitions to places c and
d, but place b itself is only reached from place a; (3) place a is frequently reached
from place c and less frequently from place d; and so on.
There are different possibilities for ordering the places along the axes. One pos-
sibility is to arrange them according to the droplet sizes, that is, place attributes.
Another possibility is to arrange the droplets in a way that minimizes the intersections
between the connecting lines. This can be done using, for example, the algorithm of
Sugiyama et al. (1981). The schematic figure shows a combination of these ordering
techniques: the places on the from-axis are ordered by the number of outgoing transi-
tions and then the places on the to-axis are ordered to minimize line crossings.
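The combined ordering can be sketched as follows. The from-axis is ordered by the total outgoing flow, and the to-axis is ordered by the weighted barycentre of the positions of the connected origins, which is the kind of heuristic used in layered graph drawing in the tradition of Sugiyama et al. (1981). This is an illustration under these assumptions, not necessarily the procedure used to produce the figure.

```python
from collections import defaultdict

def order_axes(flows):
    """flows: dict mapping (origin, destination) to the flow magnitude.

    Returns (origins, destinations) in top-to-bottom order.
    """
    out_total = defaultdict(float)
    for (origin, _), weight in flows.items():
        out_total[origin] += weight
    # From-axis: largest total outgoing flow first, names break ties.
    origins = sorted(out_total, key=lambda p: (-out_total[p], p))
    position = {p: i for i, p in enumerate(origins)}

    # To-axis: weighted mean position (barycentre) of the connected origins.
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for (origin, dest), weight in flows.items():
        numerator[dest] += weight * position[origin]
        denominator[dest] += weight
    destinations = sorted(numerator,
                          key=lambda p: (numerator[p] / denominator[p], p))
    return origins, destinations

def count_crossings(flows, origins, destinations):
    """Number of pairs of connecting lines that intersect."""
    pos_o = {p: i for i, p in enumerate(origins)}
    pos_d = {p: i for i, p in enumerate(destinations)}
    edges = [(pos_o[o], pos_d[d]) for (o, d) in flows]
    return sum(
        1
        for i in range(len(edges))
        for j in range(i + 1, len(edges))
        if (edges[i][0] - edges[j][0]) * (edges[i][1] - edges[j][1]) < 0
    )
```

The barycentre heuristic does not guarantee the minimum number of crossings; `count_crossings` can be used to compare alternative orderings.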
The number of axes in a droplet map display is not limited to two. By using
multiple axes, it is possible to represent temporal and ordering relations between
more than two places, or, in other words, the variation in the flows between the
places over time and the sequences of visiting the places. Respectively, there are
two possible meanings that can be assigned to multiple axes.
First, each axis may correspond to a certain moment in time. The axes are
arranged in the temporal order. The spaces between the axes correspond to time
intervals between the time moments represented by the axes. The connecting lines
within each space represent the transitions that occurred during the respective time
intervals. The appearance of the droplets can be modified for representing changes
over time: the droplets can be vertically split into two parts so that the left part
represents the total amount of incoming moves in the previous interval and the right
part, the total amount of outgoing moves in the next interval.

Fig. 7.28  The droplet map display shows the links between the personal places of the car owner
over a day. The axes correspond to different times of the day. Image courtesy of David Spretke,
University of Konstanz, Germany

An example of a droplet map display where the axes correspond to time
moments is given in Fig. 7.28. The droplet map technique is applied to aggregated
data derived from the personal driving dataset (Sect. 2.10.1). The data have been
divided into daily trajectories (Sect. 3.2). The time references in the trajectories
have been transformed to the daily cycle (Sect. 3.3). In the introductory chap-
ter, we described how we extracted and interpreted the personal places of the car
owner. The transformed daily trajectories have been aggregated by the personal
places and 2-h time intervals using the breaks 6:30, 8:30, 10:30, and so on.
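The aggregation by places and 2-h intervals can be sketched as follows. The input format is hypothetical, and assigning each move to the interval that contains its start time is an assumption.

```python
from collections import Counter

FIRST_BREAK = 6 * 60 + 30  # 6:30, in minutes after midnight
STEP = 120                 # 2-h intervals

def interval_start(minute_of_day):
    """Start of the 2-h interval (breaks 6:30, 8:30, ...) containing the time."""
    index = (minute_of_day - FIRST_BREAK) // STEP
    return FIRST_BREAK + index * STEP

def fmt(minutes):
    """Format minutes after midnight as h:mm."""
    return f"{minutes // 60}:{minutes % 60:02d}"

def aggregate_moves(moves):
    """moves: iterable of (origin, destination, minute_of_day, is_weekend).

    Returns a Counter keyed by (interval start, origin, destination, day type).
    """
    flows = Counter()
    for origin, dest, minute, is_weekend in moves:
        day_type = "weekend" if is_weekend else "work day"
        flows[(fmt(interval_start(minute)), origin, dest, day_type)] += 1
    return flows
```

Each key of the resulting counter corresponds to one connecting line between two neighbouring axes, with the count giving the line width and the day type giving its colour.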
In the droplet map display, the axes correspond to the time breaks. The places
are represented by droplets having the form of half-circles. The connecting lines
representing the flows are differentiated by two colours: cyan is used for the flows
on the work days and orange for the flows on the weekend. The figure shows very
frequent transitions in the morning (8:30–10:30) and evening (16:30–20:30) from
home to work and from work back to home, respectively. The visiting patterns
to shops 1 and 2 emerge in this representation, showing that the visits during the
working days occurred mainly in the interval 16:30–18:30 and on the weekends
from 10:30 to 12:30. On the working days, the shops were visited after work
and before going home. Shop 1 was occasionally visited in the morning before
work. On the weekend, the person went to the shops from home, more often in the
interval 10:30–12:30 than before 10:30. Either shop 1 or shop 2 was visited
first; then the person often visited the other shop before going home. Sport was
carried out only in the interval 8:30–12:30. The sport place was always reached
from home. It was followed mostly by going either to work (on the work days) or
back home (both on the work days and on the weekend). There are also other
relations that can be seen in this display, which gives an easily readable yet
informative visual summary of the routine personal movement behaviour. Even
more information can be obtained by interacting with the display. Clicking on a
droplet highlights all paths leading to and from the respective place. Clicking on a
connector highlights all paths in which it appears.

Fig. 7.29  The droplet map display shows the ordering relations between the visits of the personal
places. The axes correspond to ordinal numbers of the places in the sequences. Image courtesy
of David Spretke, University of Konstanz, Germany

The second possible use of multiple axes in a droplet map display is to represent
the sequence of place visits, irrespective of the time. In this case, each
axis corresponds to an ordinal number in a sequence: 0, 1, 2, and so on. The num-
ber of the axes equals the number of the visited places in the longest existing
sequence. This variant of the display is demonstrated in Fig. 7.29. It was also gen-
erated from the daily personal car trajectories.
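The data behind this variant of the display can be derived as in the following sketch, which counts the moves between consecutive ordinal positions. The input, one list of visited places per daily trajectory, is a hypothetical format.

```python
from collections import Counter

def ordinal_transitions(sequences):
    """Count moves between consecutive ordinal positions.

    Returns (counts, n_axes): counts maps (position, origin, destination) to
    the number of sequences containing that move at that position; n_axes is
    the number of places in the longest sequence.
    """
    counts = Counter()
    for seq in sequences:
        for pos in range(len(seq) - 1):
            counts[(pos, seq[pos], seq[pos + 1])] += 1
    n_axes = max((len(seq) for seq in sequences), default=0)
    return counts, n_axes
```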
The display tells us that almost all place sequences (i.e. person’s trips) start
from home; the few occasional starts from the work can be explained by incom-
pleteness of the data, where initial parts of some daily trajectories are missing.
Home–work–home is the most frequent sequence, which occurs on the work days.
The sequences home–work–shop 1–home, home–work–shop 2–home, and home–
sport–work–home are also quite prominent. Longer sequences are much less fre-
quent. After coming home, the person rarely went somewhere else. The longest
sequence consists of seven places and six moves. It can be traced by clicking on
the single droplet on axis 6. The unusual sequence home–shop 2–shop 1–home–
park–unknown–home occurred on a work day, but the person did not appear in the
workplace on that day. It was also not a public holiday, since the shops were, evi-
dently, open. It can be concluded that the person was on vacation.

Fig. 7.30  The personal places are represented by differently coloured segments of the ver-
tical bars; the heights are proportional to the counts of the place visits. The relations between
the places are represented by connecting bands with the widths proportional to the counts of the
moves (transitions). The colours of the connecting bands change gradually from the colour of the
origin place to the colour of the destination place

We have earlier mentioned that the idea of using one display dimension for rep-
resenting the set of places of interest and the other for representing time or sequen-
tial order can be implemented in various ways. We shall briefly describe another
possible implementation based on the method dynamic categorical data view
(DCDV) suggested by Bremm et al. (2011) and von Landesberger et al. (2012) for
visualization of changes in group membership or categorical attribute values over
time. The method is illustrated in Fig. 7.30.
The horizontal dimension of the DCDV display represents time. Similarly to
a droplet map display having vertical axes, a DCDV has vertical bars for differ-
ent time moments. The time moments can be selected from a larger set inter-
actively by the user or using one of several semiautomatic methods depending
on the current task. The selection of time moments will be discussed in more
detail in the next chapter. The vertical bars are divided into differently col-
oured segments representing different categories or groups. In our application,
the segments represent different places of interest. Hence, unlike the drop-
let map display exploiting place labels, DCDV relies on colour-coding of the
places. The heights of the segments are proportional to the counts of the place
visits. The segments of neighbouring bars corresponding to linked places are
connected by bands representing flows between the places. The widths of the
bands are proportional to the flow magnitudes, that is, counts of the moves.
In colouring of the bands, the colour of the origin place on the left is gradu-
ally transformed into the colour of the destination place on the right. Bar seg-
ments and connecting bands can have one of two possible states: plain (pastel
colour) or highlighted (bright colour). Mouse-clicking on a segment highlights
this segment, parts of segments on the other bars corresponding to places visited
before and after the selected place, and parts of the connecting bands. The sizes
of the highlighted parts are proportional to the counts of visits and transitions
that occurred before and after visiting the selected place at the selected time. In
a similar way, clicking on a connecting band highlights all related places and
transitions. By means of these interactions, temporal relations involving several
places can be explored.
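The proportions shown by the highlighting can be computed as in the following sketch. It assumes that the data are available as one list of places per day, with one entry per selected time moment; this format and the function name are hypothetical.

```python
from collections import Counter

def highlight_shares(days, time_index, place):
    """Shares of places at every time moment, restricted to the days on
    which the person was in `place` at the moment `time_index`.

    Returns one dict (place -> share) per time moment, or an empty list if
    no day matches the selection.
    """
    selected = [day for day in days if day[time_index] == place]
    if not selected:
        return []
    shares = []
    for t in range(len(selected[0])):
        counts = Counter(day[t] for day in selected)
        shares.append({p: c / len(selected) for p, c in counts.items()})
    return shares
```

With made-up data for the moments 9:30, 9:45, and 10:00, selecting the workplace at 10:00 yields the share of the selected days on which the person was already at work at the earlier moments.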
The example display in Fig. 7.30 is generated from the same data as the droplet
map displays considered before, except that the personal places have slightly dif-
ferent labels and one of the shopping places reflected in the droplet map display
was divided into two smaller places called Hit and Aldi/DM. The DCDV display
shows only the data subset from the work days. The selected time moments are
those when the largest numbers of transitions between the places occurred. The
highlighting corresponds to the selection of the segment corresponding to being in
the workplace (represented by the blue colour) at the time 10:00. The highlighting
shows that when the person was in the workplace at 10:00, in about 75 % of the
cases, he (she) was there already at 9:45 and in about 30 % of the cases already at
9:30, while coming before 9:00 was very rare. The person came to work either
directly from home or after visiting (or passing by) the post office. In about 20 %
of the cases, the person left work before 18:15, but in more than 50 % of the
cases, the person stayed at least until 18:30. From work, the person most often
went directly home, quite often visited one of the shops before going home and
occasionally visited two shops.
Generally, the droplet map and DCDV displays can convey the same kind of
information. The droplet map is, possibly, more intuitive, while the DCDV display
may better support the perception of quantitative information, that is, the amount
of place visits and transitions. The droplet map can support the comparison of two
sets of relations, such as relations on the work days and relations on the weekend.
The relations are represented in the same display using different colouring. This
would be impossible in a DCDV display. Two sets of relations can only be com-
pared by creating two DCDV displays. Hence, the designs are not fully equivalent;
some tasks are better supported by one of them than by the other.
In summary, when places of interest are few, temporal and ordering relations
between them can be explored by means of an interactive two-dimensional display
where one dimension is used for arranging places and the other dimension repre-
sents time or linear order. The relations are conveyed by connecting lines or bands
drawn between the representations of the places at different times or at different
positions in a sequence. This approach is suitable, in particular, for the investiga-
tion of relations between personal places of one or more individuals.

7.3.4 Discovery of Frequent Sequences

When places of interest are more numerous, ordering relations between them can
be investigated by means of sequence mining, also known as sequential pattern
mining (Mabroukeh and Ezeife 2010) or motif discovery (Ciriello and Guerra
2008); the latter term is mostly used in bioinformatics. Sequence mining algo-
rithms discover frequent subsequences, or motifs, in a database where all records
are sequences of ordered items. Some algorithms can deal with time-referenced
item sequences. Sequence mining is applied, for example, to DNA sequences,
sequences of customer purchases, and web log data. It can also be applied to
sequences of visited places. For this purpose, trajectories of moving objects need
to be transformed into sequences of place identifiers or labels. An algorithm will
treat the places just as symbols, that is, their geographical positions and neigh-
bourhood relations are not taken into account.
We shall demonstrate a possible application of sequence mining to move-
ment data using the Flickr data from Switzerland and the algorithm TEIRESIAS
(Rigoutsos and Floratos 1998), which was originally developed for the discovery
of frequent subsequences in biological sequences. To make the discussion of
sequence mining results more convenient, we would like to use places of interest
having expressive labels. Since places of interest obtained from movement
data by the methods presented in Sect. 1.1 receive only automatically generated
identifiers, we take a different approach. We use the positions of the major cit-
ies in Switzerland and around as generating seeds for Voronoi tessellation (Okabe
et al. 2000). In addition, we have created several generating points in regions
where there are no cities and given them the names of the regions. The labels of the
points (i.e. the names of the cities and regions) have been attached to the generated
Voronoi polygons (Fig. 7.31).
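Since a Voronoi polygon contains exactly the locations that are closer to its generating point than to any other generating point, a position can be labelled without constructing the polygons, simply by finding the nearest seed. The sketch below illustrates this with great-circle distances; the seed coordinates are rounded approximations included only for illustration, and the distance measure used for the original tessellation is not stated in the text.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# Approximate (latitude, longitude) of a few generating seeds.
SEEDS = {
    "Zurich": (47.37, 8.54),
    "Bern": (46.95, 7.45),
    "Lucerne": (47.05, 8.31),
    "Geneva": (46.20, 6.14),
}

def voronoi_label(lat, lon, seeds=SEEDS):
    """Label of the Voronoi cell containing the point, i.e. the nearest seed."""
    return min(seeds, key=lambda name: haversine_km(lat, lon, *seeds[name]))
```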
The goal of our example study is to investigate the typical movement behav-
iours of tourists in Switzerland, taking the Flickr users as representatives.
Specifically, we want to find frequently occurring sequences of visited places and
look at their characteristics, such as frequency, length, and presence of revisited
places. We are interested in sequences of at least three visited places, since binary
relations between places can be more easily investigated using flow maps and ori-
gin–destination matrices. We assume that tourists usually take photographs almost
every day of their stay in Switzerland. While several days of break may occur, for
example, due to bad weather conditions, we assume that a break with the length
of one week or more may mean that a person was not in Switzerland during this
time. Hence, we divide the movement tracks of the Flickr users in Switzerland by
a temporal gap of 7 days or more (Sect. 3.2) and obtain 116,941 trajectories of
34,141 persons. As we are interested in movements between quite large regions
and in sequences consisting of at least three items, we select only the trajectories
consisting of at least 3 points with the bounding rectangle diagonal length of at
least 25 km. Only 16,602 trajectories belonging to 7,927 persons satisfy these con-
ditions; the trajectories are represented by dark blue lines in Fig. 7.31.
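The division of a track by temporal gaps and the subsequent selection can be sketched as follows. The record format (timestamp, latitude, longitude) and the use of the great-circle distance for the bounding rectangle diagonal are assumptions.

```python
import math
from datetime import timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

def split_by_gap(points, max_gap=timedelta(days=7)):
    """Split a time-ordered track of (time, lat, lon) records.

    A new trajectory starts whenever the gap to the previous record is
    max_gap or longer.
    """
    trajectories = []
    for point in points:
        if trajectories and point[0] - trajectories[-1][-1][0] < max_gap:
            trajectories[-1].append(point)
        else:
            trajectories.append([point])
    return trajectories

def bbox_diagonal_km(trajectory):
    """Diagonal length of the bounding rectangle of a trajectory."""
    lats = [p[1] for p in trajectory]
    lons = [p[2] for p in trajectory]
    return haversine_km(min(lats), min(lons), max(lats), max(lons))

def is_suitable(trajectory, min_points=3, min_diagonal_km=25.0):
    """Selection criteria used in the example study."""
    return (len(trajectory) >= min_points
            and bbox_diagonal_km(trajectory) >= min_diagonal_km)
```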

Fig. 7.31  A Voronoi tessellation of the territory of Switzerland and surrounding areas has been
obtained using the positions of the main cities and region centres as generating points. The
trajectories of the Flickr users are represented by lines in dark blue drawn with 3 % opacity

Fig. 7.32  A text cloud display shows the most frequent sequences of visited places in
Switzerland and around. The texts are coloured according to the number of different places
occurring in the sequences; blue is used for two different places and red for three. One sequence
is highlighted in white (at the bottom); the highlighting is propagated to its subsequences

Using the earlier generated territory tessellation, we obtain for each trajectory
the sequence of identifiers of the visited areas. The sequences are then passed to
the TEIRESIAS algorithm. The algorithm has several parameters, including minimal
motif length (i.e. number of symbols in a subsequence, not counting wildcards)
and minimal support (i.e. number of occurrences). Given the minimal motif
length of three and minimal support of five, the method finds 3,466 repeated
sequences of length from 3 to 26, of which 1,506 do not contain wildcards, 1,778
include one wildcard, 174 include two, 7 include three, and one includes four
wildcards. A wildcard is a special symbol (dot), indicating that any symbol may
occur in the corresponding position in the sequence.
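To make the notions of motif length, support, and wildcard concrete, the following sketch enumerates contiguous subsequences by brute force and checks whether a pattern with wildcards occurs in a sequence. It is emphatically not the TEIRESIAS algorithm, which discovers patterns with wildcards far more efficiently; support is taken here as the number of sequences containing the pattern, which is an assumption.

```python
from collections import defaultdict

def frequent_subsequences(sequences, min_length=3, min_support=5):
    """Contiguous subsequences occurring in at least min_support sequences.

    Returns a dict mapping each pattern (a tuple of symbols) to its support.
    """
    containing = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + min_length, len(seq) + 1):
                containing[tuple(seq[start:end])].add(seq_id)
    return {pattern: len(ids) for pattern, ids in containing.items()
            if len(ids) >= min_support}

def matches(pattern, seq, wildcard="."):
    """True if the pattern occurs contiguously in seq.

    The wildcard symbol matches any single symbol of the sequence.
    """
    n = len(pattern)
    return any(
        all(p == wildcard or p == s for p, s in zip(pattern, seq[i:i + n]))
        for i in range(len(seq) - n + 1)
    )
```

The enumeration is quadratic in the sequence length and is suitable only for small examples.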
Although some of the discovered sequences are quite long, the maximal num-
ber of different elements in a sequence, not counting wildcards, is only four. 2,165
sequences (62.5 %) contain only two different elements (i.e. places). The most fre-
quent sequences have the form A–B–A, that is, represent movement from place A to
B and then back to A, for example, Lugano—Gravedona—Lugano (occurs in 155
trajectories), Lausanne—Montreux—Lausanne (137 trajectories), Gravedona—
Lugano—Gravedona (136 trajectories), Zurich—Lucerne—Zurich (129 trajecto-
ries), and so on. In Fig. 7.32, the most frequent sequences are represented in a text
cloud display. The font size of a text is proportional to the number of trajectories
in which the sequence occurs. Among the visible sequences, the highest number
of trajectories is 155 and the lowest is 40. The texts are coloured according to the
number of different elements (places) in them: blue is used for sequences with two
different places and red for sequences with three different places. There are 1,290
sequences (37.2 %) with three different places, but they are much less frequent than
the sequences with two different places. Only 11 sequences (0.3 %) include four
different places, and these sequences occur in 5–7 trajectories.
To facilitate the investigation of the sequences in the spatial context, we trans-
form them to trajectories where the positions are the areas whose identifiers appear
in the sequences. All trajectories receive the same start time and equal time inter-
vals between the positions. This approach works sufficiently well for sequences
without wildcards, but it is unclear what spatial position could represent a wild-
card. Our current provisional solution is duplicating the previous position. The
trajectories can be represented on a map (Fig. 7.33) or in a space–time cube.
Characteristics of the sequences can be visually encoded by line widths and col-
ours. In Fig. 7.33, the line widths are proportional to the number of trajectories in
which the sequences occur and the colours encode the numbers of different places:
blue is used for two, red for three, and yellow for four different places.
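The transformation of a discovered sequence into a trajectory-like construct can be sketched as follows. The representative positions of the areas (for example, the generating points), the common start time, and the time step of one day are assumptions chosen for illustration; a wildcard duplicates the previous position, as described above.

```python
from datetime import datetime, timedelta

def sequence_to_trajectory(pattern, positions, start,
                           step=timedelta(days=1), wildcard="."):
    """Turn a place sequence into a list of (time, place, x, y) records.

    positions maps a place label to a representative (x, y) position.
    All records are equally spaced in time, beginning at `start`.
    """
    trajectory = []
    previous = None
    for i, symbol in enumerate(pattern):
        place = previous if symbol == wildcard else symbol
        if place is None:
            raise ValueError("a sequence must not start with a wildcard")
        x, y = positions[place]
        trajectory.append((start + i * step, place, x, y))
        previous = place
    return trajectory
```

In a space–time cube, the duplicated positions produce the vertical segments that mark the wildcards.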
To reduce overplotting in spatial displays of place sequences, the results of
sequence mining can be investigated in portions with the help of interactive filtering.
By means of a spatial filter, a subset of sequences including particular places
can be selected. For example, in Fig. 7.34a, b, the map and space–time cube show
the sequences including Bern. The cube is rotated to be viewed from the east. In
Fig. 7.34c, d, the sequences including Bern and Zurich are shown in the same way.
The cube is viewed from the south-east. The vertical segments of the trajectories
in the space–time cube correspond to wildcards. The spatial filtering also affects
other displays. Thus, the text cloud display in Fig. 7.35 shows only the sequences
including Bern and Zurich.
In the space–time cubes (Fig. 7.34b, d), as well as in the text cloud display
(Fig. 7.35), we see that there are many chains of various lengths consisting
of alternating moves Zurich–Bern and Bern–Zurich. Thus, the highlighted
sequence in Fig. 7.35 consists of four repetitions of the pair Bern–Zurich.
There are also many sequences with Bern and Zurich including wildcards. These
sequences represent the cases when people moved from one of the cities somewhere
else and returned before or after moving to the other city.

Fig. 7.33  The discovered frequent sequences of visited places are transformed to trajectories,
which are shown on a map by lines with the width proportional to the number of different
trajectories in which the sequences occur and coloured according to the number of different places

Fig. 7.34  The set of discovered frequent sequences of visited places is explored with the help of
spatial filtering. a, b: the map (a) and space–time cube (b) show the sequences including Bern.
c, d: the map (c) and space–time cube (d) show the sequences including Bern and Zurich

Fig. 7.35  The text cloud display shows the frequent place sequences including Bern and Zurich

The example with Bern and Zurich shows quite typical patterns pertaining to
the whole dataset. Frequently occurring sequences of visited places include two
or at most three different places. It is typical for the Flickr users to move from one
place to another and then return. Frequent sequences with three or more different
places most often connect neighbouring places. The most connected are the
areas of Geneva, Lausanne, and Montreux, which are most often visited in this or
the opposite sequence (54 and 44 trajectories, respectively).
By this example, we have demonstrated how ordering relations between places
can be discovered by means of sequence mining and explored using visual dis-
plays and interaction techniques. An advantage of this analytical procedure is that
it is applicable to data with a large number of places of interest. An important fea-
ture of the procedure is that the sequences discovered by a sequence mining algo-
rithm can be represented by spatio-temporal constructs analogous to trajectories.
This gives the analyst an opportunity to view the sequences in the spatial context;
otherwise, they could only be analysed as texts.
In our example, we have applied sequence mining to place sequences generated
from episodic movement data, where consecutively visited places are not necessar-
ily neighbours in space. Like the original data, the trajectories generated from the
frequent sequences are very hard to visualize and explore due to unavoidable over-
plotting and numerous line intersections in spatial displays. To alleviate the prob-
lem, we have used interactive filtering. Sequence mining can also be applied to place
sequences generated from quasi-continuous movement data. In this case, consecutive
places in a sequence are typically neighbours, and spatial displays of sequence mining
results look much better and are more convenient for analysis. We experimented
with applying TEIRESIAS to the Milan car trajectories (Sect. 2.10.2) generalized
into place sequences using a fine tessellation of the Milan territory. The algorithm
was able to find much longer place sequences than in the case of the Flickr
data; however, the results were not very interesting since all frequent sequences
occurred on the motorways surrounding the city and on the major radial roads. In
fact, the results did not uncover new information in comparison with what we learned
through clustering the car trajectories according to route similarity (Sect. 5.1.2.2).
Perhaps, sequence mining can be specifically recommended for analysis of episodic
movement data, for which trajectory clustering often does not work.

7.4 Recap

Movement goes on in space. An important part of movement analysis is concerned
with answering questions about the relations of the movement to the space containing
it: How much movement occurs where at different times? What are the
movement characteristics in different locations and how do they change over time?
How strongly does the movement connect different locations? In what order are
the locations visited?
While the space where movement goes on is usually continuous, there are seri-
ous practical reasons for limiting space-centred analysis to a discrete set of places
of interest: neither visual nor computational methods can deal with all existing
locations of continuous space. Places of interest may be selected by the analyst
from the locations occurring in the available data or may be defined by dividing
the space into compartments or by grouping and generalizing locations selected
from the available data. In particular, places of interest may be defined based on
locations of certain movement events. In studying movement behaviours of indi-
viduals, places of major interest are those where the individuals repeatedly appear.
When places of interest are selected or derived from movement data, they are
characterized by aggregating the related movement data, that is, counting the place
visits and visitors and computing the statistics of movement characteristics. This is
done for the whole time span of the data and by time intervals, where the time may
be treated as linear or cyclic. The aggregation also generates links between places
and characterizes them by the counts of moves and movers and statistics of move-
ment characteristics for the whole time span and by time intervals.
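The aggregation step summarized above can be sketched as follows. The record layout (mover, place, time stamp), the monthly intervals, and the function name are assumptions chosen for illustration; the sketch computes only counts of visits and visitors, not statistics of movement characteristics.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_visits(visits, cyclic=False):
    """visits: iterable of (mover_id, place_id, datetime).
    Returns {(place, interval): (n_visits, n_visitors)}.
    Linear time: interval = (year, month); cyclic time: interval = month of year."""
    counts = defaultdict(int)
    movers = defaultdict(set)
    for mover, place, t in visits:
        interval = t.month if cyclic else (t.year, t.month)
        counts[(place, interval)] += 1
        movers[(place, interval)].add(mover)
    return {key: (counts[key], len(movers[key])) for key in counts}

# Hypothetical visit records
records = [
    ("u1", "Bern", datetime(2009, 7, 3)),
    ("u1", "Bern", datetime(2009, 7, 9)),
    ("u2", "Bern", datetime(2010, 7, 1)),
    ("u2", "Zurich", datetime(2010, 8, 2)),
]
linear = aggregate_visits(records)
cyclic = aggregate_visits(records, cyclic=True)
```

In the cyclic variant, the July visits of 2009 and 2010 fall into the same interval, which is what makes seasonal patterns visible.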
The TS of attribute values resulting from movement aggregation by places of
interest and time intervals can be analysed using interactive visual techniques supported
by computational transformations of the data. A valuable visual tool for TS
exploration is the time graph. A large number of local TS can be represented in a time
graph in summarized forms.
Clustering of the TS by similarity and presenting cluster membership on a map
not only uncover different patterns of local temporal variation but also allow the
analyst to see their spatial distribution. The set of local TS can be analysed by
means of established methods for TS analysis and modelling. The methods can
be applied to clusters of similar TS rather than separately to each TS. Interactive
visual interfaces facilitate model building, evaluation, and refinement.
Local TS may have abrupt rises (peaks) or drops, which may reflect important
features of the behaviours of the movers and/or impacts of the spatio-temporal
context on the movement. Computational methods can be used to extract peaks
or drops from a large number of TS. Since the TS are associated with spatial loca-
tions, the extracted features can be represented as spatial events and further ana-
lysed with the use of visual and computational tools suitable for spatial events.
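A minimal sketch of extracting peaks from many local TS and representing them as spatial events is given below. It uses a simple local-maximum test with a height threshold, which is an assumption for illustration and not the peak detection method referenced in this chapter; the function names and toy values are hypothetical.

```python
def extract_peaks(series, min_height):
    """Return indices of local maxima whose value is at least min_height.
    A plateau is reported once, at its first time step."""
    peaks = []
    for i in range(1, len(series) - 1):
        if series[i] >= min_height and series[i - 1] < series[i] >= series[i + 1]:
            peaks.append(i)
    return peaks

def peaks_as_events(local_ts, min_height):
    """local_ts: {place: [value per time step]}.
    Returns spatial events as (place, time_index, value) tuples."""
    return [
        (place, i, ts[i])
        for place, ts in local_ts.items()
        for i in extract_peaks(ts, min_height)
    ]

# Hypothetical local time series of two places
local_ts = {"A": [1, 8, 2, 3, 9, 1], "B": [2, 2, 3, 2, 2, 2]}
events = peaks_as_events(local_ts, min_height=5)
```

Drops can be extracted in the same way by applying the function to the negated series with a correspondingly negated threshold.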
Analysis of local TS related to personal places of interest, in particular, the var-
iation in the presence with respect to temporal cycles, can uncover the meanings
of the most important places of a person, such as home, workplace, and places of
regular shopping and recreation. Clustering and similarity analysis of TS help the
analyst to deal with personal places of multiple individuals.
Each object moving from place A to place B links place A to place B. The more
objects move from A to B, and the more times they do this, the stronger the link
between A and B is. As the movement varies over time, so does the link strength.
Binary links between places resulting from movement are characterized by TS of
the counts of the movers and their moves as well as by TS of summarized move-
ment characteristics, such as average speed and move duration. The links and their
characteristics can be visualized in flow maps combined with time graphs and
other displays. Clustering of the link-related TS by similarity plays the same role
as in the analysis of place-related TS. TS analysis and modelling methods sup-
ported by interactive visual interfaces can also be applied to link-related TS.
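The derivation of binary links from movement can be sketched as below, assuming that each mover is described by a temporally ordered list of visited places. The data layout and names are hypothetical; a time-dependent version would add the time interval of each move to the key.

```python
from collections import defaultdict

def aggregate_links(place_sequences):
    """place_sequences: {mover_id: [place, place, ...]} in temporal order.
    Returns {(origin, destination): (n_moves, n_movers)}."""
    moves = defaultdict(int)
    movers = defaultdict(set)
    for mover, places in place_sequences.items():
        for a, b in zip(places, places[1:]):
            if a != b:  # staying in the same place is not a move
                moves[(a, b)] += 1
                movers[(a, b)].add(mover)
    return {link: (moves[link], len(movers[link])) for link in moves}

# Hypothetical place sequences of two movers
links = aggregate_links({
    "u1": ["A", "B", "A", "B"],
    "u2": ["A", "B", "B", "C"],
})
```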
In some kinds of spaces, movements between places can only be done through
channels with limited capacities. Examples of such spaces are a street network in
a city, a road network on a larger spatial scale, and a system of doors and cor-
ridors in a building. When movement is constrained by the capacities of existing
channels, dependencies emerge between the number of moving objects and their
movement characteristics, in particular, speed and time needed to get from place
to place. Dependencies between attributes of links can be revealed by transforming
the TS of values of two attributes into DS of one attribute with respect to the other.
The dependencies can be represented by regression models. As in modelling TS,
the links are first clustered by similarity of the DS. This not only reduces
the workload but also compensates for incompleteness and errors in the data and
facilitates generalization and abstraction from unimportant data fluctuations.
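The transformation of two TS of a link into a dependency of one attribute on the other can be sketched as follows. The binning of the first attribute, the straight-line model, and all names and values are assumptions for illustration; the regression models used in practice need not be linear.

```python
from collections import defaultdict

def dependency_series(load_ts, speed_ts, bin_width):
    """Pair the two TS by time step and average the speed per bin of load.
    Returns a sorted list of (bin_start, mean_speed)."""
    bins = defaultdict(list)
    for load, speed in zip(load_ts, speed_ts):
        bins[int(load // bin_width) * bin_width].append(speed)
    return sorted((b, sum(v) / len(v)) for b, v in bins.items())

def fit_line(points):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical hourly values for one link: number of cars and mean speed
loads = [5, 15, 25, 35, 8, 18, 28, 38]
speeds = [100, 80, 60, 40, 100, 80, 60, 40]
ds = dependency_series(loads, speeds, bin_width=10)
intercept, slope = fit_line(ds)
```

In this toy case the mean speed falls by two units for each additional unit of load, which is the kind of dependency that emerges when channel capacity constrains movement.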
Besides binary relations, ordering and temporal relations involving more than
two places may be of interest. When places of interest are few, these relations
can be revealed and investigated by means of interactive visual displays, where
the places are represented in an abstract space. When places are more numerous,
frequently occurring sequences of visited places can be discovered by means of
sequence mining algorithms, also known as motif discovery. For these algorithms,
trajectories of movers are transformed into sequences of identifiers or labels of
visited places. The algorithms analyse these merely as sequences of symbols and
return frequently occurring subsequences. To enable interpretation and exploration
of the results in the spatial context, an opposite transformation is performed:
the subsequences are transformed into trajectories. Sequence mining may be more
useful in analysis of episodic movement data than for quasi-continuous data.
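The opposite transformation can be sketched as follows, assuming that each place has a representative point and that a wildcard is written as a dot. The coordinates are approximate longitude and latitude values used only for illustration, and the treatment of wildcards (skipped, but still advancing the step counter) is an assumption.

```python
def sequence_to_trajectory(pattern, centroids, wildcard="."):
    """Map a frequent place sequence to a list of (x, y, step) points.
    Wildcard positions produce no point but still advance the step counter."""
    return [
        (*centroids[place], step)
        for step, place in enumerate(pattern)
        if place != wildcard
    ]

# Approximate, illustrative coordinates of two places
centroids = {"Bern": (7.45, 46.95), "Zurich": (8.54, 47.37)}
trajectory = sequence_to_trajectory(["Bern", ".", "Zurich"], centroids)
```

The step value plays the role of the temporal dimension, so that the result can be shown on a map or in a space-time cube like an ordinary trajectory.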
As stated in Sect. 3.8, movement data summarized in the form of spatial TS
can be viewed from two perspectives: as a set of spatially distributed local TS and
as a temporal sequence of spatial distributions representing spatial situations. This
chapter relates to the local TS perspective. The next chapter will deal with the spa-
tial situation perspective.
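The two perspectives correspond to two ways of grouping the same table of aggregated values. A minimal sketch, assuming the spatial TS is stored as a dictionary keyed by (place, time); the names and values are hypothetical:

```python
def local_time_series(table):
    """{(place, time): value} -> {place: {time: value}}"""
    result = {}
    for (place, time), value in table.items():
        result.setdefault(place, {})[time] = value
    return result

def spatial_distributions(table):
    """{(place, time): value} -> {time: {place: value}}"""
    result = {}
    for (place, time), value in table.items():
        result.setdefault(time, {})[place] = value
    return result

# Hypothetical presence counts for two places and two time units
presence = {("Bern", 1): 4, ("Bern", 2): 6, ("Zurich", 1): 7, ("Zurich", 2): 3}
by_place = local_time_series(presence)
by_time = spatial_distributions(presence)
```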

References

Ahas, R., Silm, S., Järv, O., Saluveer, E., & Tiru, M. (2010). Using mobile positioning data to
model locations meaningful to users of mobile phones. Journal of Urban Technology, 17(1),
3–27.
Aigner, W., Miksch, S., Schumann, H., & Tominski, C. (2011). Visualization of time-oriented
data. Berlin: Springer.
Andrienko, G., & Andrienko, N. (2010). A general framework for using aggregation in visual
exploration of movement data. The Cartographic Journal, 47(1), 22–40.
Andrienko, G., Andrienko, N., Bremm, S., Schreck, T., von Landesberger, T., Bak, P., et al.
(2010). Space-in-time and time-in-space self-organizing maps for exploring spatiotemporal
patterns. Computer Graphics Forum, 29(3), 913–922.
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., & Wrobel, S. (2011). From movement
tracks through events to places: Extracting and characterizing significant places from mobil-
ity data. In Proceedings of IEEE Visual Analytics Science and Technology (VAST 2011) (pp.
161–170).
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., & Wrobel, S. (2013). Scalable analy-
sis of movement data for extracting and exploring significant places. IEEE Transactions on
Visualization and Computer Graphics, 19(7), 1078–1094.
Andrienko, G., Andrienko, N., Mladenov, M., Mock, M., & Pölitz, C. (2012). Identifying
place histories from activity traces with an eye to parameter impact. IEEE Transactions on
Visualization and Computer Graphics, 18(5), 675–688.
Andrienko, G., Andrienko, N., Rinzivillo, S., Nanni, M., Pedreschi, D., & Giannotti, F. (2009).
Interactive visual clustering of large collections of trajectories. In Proceedings of the IEEE
Symposium on Visual Analytics Science and Technology (VAST 2009) (pp. 3–10). New York:
IEEE Computer Society Press.
Andrienko, N., & Andrienko, G. (2006). Exploratory analysis of spatial and temporal data: A
systematic approach. Berlin: Springer.
Andrienko, N., & Andrienko, G. (2011). Spatial generalization and aggregation of massive move-
ment data. IEEE Transactions on Visualization and Computer Graphics, 17(2), 205–219.
Andrienko, N., & Andrienko, G. (2013). A visual analytics framework for spatio-temporal analysis
and modelling. Data Mining and Knowledge Discovery, 27(1), 55–83.
Billauer, E. (2010). Peakdet: Peak detection using MATLAB. Online
http://www.billauer.co.il/peakdet.htm. Retrieved Feb 26, 2010.
Bremm, S., von Landesberger, T., Andrienko, G., Andrienko, N., & Schreck, T. (2011). Interactive
analysis of object group changes over time. In Proceedings of the International Workshop on
Visual Analytics EuroVA 2011, Euro Graphics (pp. 41–44).
Ciriello, G., & Guerra, C. (2008). A review on models and algorithms for motif discovery in pro-
tein–protein interaction networks. Briefings in Functional Genomics and Proteomics, 7(2),
147–156.
Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., & Keogh, E. (2008). Querying and mining
of time series data: Experimental comparison of representations and distance measures. In
Proceedings of the VLDB Endowment (Vol. 1(2), pp. 1542–1552).
Harrower, M., & Brewer, C. A. (2003). Colorbrewer.org: An online tool for selecting colour
schemes for maps. The Cartographic Journal, 40(1), 27–37.
von Landesberger, T., Bremm, S., Andrienko, N., Andrienko, G., & Tekusova, M. (2012). Visual
analytics methods for categoric spatio-temporal data. In Proceedings of the IEEE Conference
on Visual Analytics Science and Technology (VAST 2012) (pp. 183–192). New York: IEEE
Computer Society Press.
Laurini, R., & Thompson, D. (1992). Fundamentals of spatial information systems. London:
Academic Press.
Mabroukeh, N. R., & Ezeife, C. I. (2010). A taxonomy of sequential pattern mining algorithms. ACM
Computing Surveys, 43(1), 3:1–3:41, (Article 3).
Okabe, A., Boots, B., Sugihara, K., & Chiu, S. N. (2000). Spatial tessellations: Concepts and
applications of Voronoi diagrams (2nd edn). London: Wiley.
Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., et al.
(2013). Semantic trajectories modeling and analysis. ACM Computing Surveys (Vol 45(4))
(accepted).
Rigoutsos, I., & Floratos, A. (1998). Combinatorial pattern discovery in biological sequences: the
TEIRESIAS algorithm. Bioinformatics, 14(1), 55–67.
Schreck, T., Bernard, J., von Landesberger, T., & Kohlhammer, J. (2009). Visual cluster analysis
of trajectory data with interactive Kohonen maps. Information Visualization, 8(1), 14–29.
Sugiyama, K., Tagawa, S., & Toda, M. (1981). Methods for visual understanding of hierarchical
system structures. IEEE Transactions on Systems, Man, and Cybernetics, 11(2), 109–125.
Chapter 8
Visual Analytics Focusing on Time

[Diagram with the labels: Movers, Trajectories, Locations, Local time series, Times, Spatial distributions, Spatial events, Movement data, Spatial event data, Spatial time series]

Fig. 8.1  This chapter addresses analysis tasks focusing on movement-specific characteristics
of time units and their relations to the context. Characteristics of time units are represented by
movement data in the form of spatial distributions (cf. Fig. 3.13)

Abstract  This chapter presents methods and procedures that can support movement
analysis tasks focusing on time units (Fig. 8.1). Spatial situations characterize time
units in terms of the spatial positions and movement characteristics of the existing
moving objects. Spatial situations can be represented in an aggregated form by spa-
tial presence and flow distributions. When the number of spatial situations is large,
clustering by similarity is a suitable way to reduce the analytical workload. For
each cluster, a representative spatial situation is constructed or selected. A comple-
mentary method for analysing characteristics of spatial situations is extraction of
local features, such as local maxima or minima, and representing them by spatial
events, which may then be analysed by means of methods suitable for spatial events.
Quantitative changes between spatial situations can be analysed with the help of
change maps. Changes of object positions (displacements) can be visualized on flow
maps or by origin–destination matrices. A Dynamic Categorical Data View (DCDV)
display enables exploring changes of object positions over multiple selected time units
when the number of different places is small or the places can be grouped into a
small number of place categories.

G. Andrienko et al., Visual Analytics of Movement,
DOI: 10.1007/978-3-642-37583-5_8, © Springer-Verlag Berlin Heidelberg 2013

8.1 Characteristics

As stated in Sect. 2.3, time units may be characterized by spatial situations,
which consist of the spatial positions and movement characteristics of the existing
moving objects. Being quite complex constructs, spatial situations are hard
to analyse in full detail. They are usually dealt with in aggregated form. Section
3.8 describes spatio-temporal aggregation, the results of which can be viewed
in two complementary ways: as a collection of local time series of attribute val-
ues in different locations (this view has been addressed in the previous chap-
ter) and as a sequence of spatial distributions in different time units. Spatial
distributions are aggregated representations of spatial situations. Instead of
the spatial positions of all existing moving objects in a given time unit, a spa-
tial distribution includes counts of object presence in different places and flow
magnitudes, that is, counts of objects that moved between different places and
counts of their moves. Movement characteristics of the objects are aggregated
into further attributes of places and connections (links) between places, such as
average speed, average stop duration, and average time taken for getting from place
to place.
Since spatio-temporal aggregation of movement data produces aggregate
attributes associated with two types of objects, places and links, a spatial situation
is represented by two complementary types of distribution, presence distribution
and flow distribution. A presence distribution is composed of the presence counts
and values of other aggregate attributes of the places. A flow distribution is com-
posed of the flow magnitudes and values of other aggregate attributes of the links.
These distributions can be viewed as two different aspects of spatial situations.
Spatial situations characterizing different time units can be visualized in
animated maps or small multiple maps, where one animation frame or one small
map represents the situation in one time unit. Two aspects of a spatial situation can
be represented separately in different maps or together in the same map; however,
a map representing both aspects may be cluttered and difficult to read. Therefore,
it is often reasonable to visualize and analyse presence distributions and flow dis-
tributions separately. Figure 1.22 gives an example of small multiple maps repre-
senting presence distributions. The presence counts are represented by symbols of
proportional sizes. Other possible representations are colour coding and diagrams,
which can be used to represent values of two or more place-related attributes. The
small multiple flow maps in Figs. 1.23 and 1.26 represent flow distributions. The
flow magnitudes are represented by widths of flow symbols; it is also possible to
use colour coding. Flow distributions can also be represented by animated or jux-
taposed origin–destination matrices (Fig. 1.27).
Both animated displays and small multiple displays have quite limited analytical
power. Animated displays do not adequately support comparisons between spatial
situations in different time units. Small multiples can be effectively used for quite
a small number of time units. To study and compare characteristics of many time
units, we apply a universal approach: clustering.

8.1.1 Clustering of Times by Similarity of Spatial Situations

When time units are grouped (clustered) by similarity of the spatial situations, the
analyst can study and compare representative situations characterizing the groups
instead of all situations characterizing all time units. Hence, clustering can reduce
a large set of spatial situations to a manageable size. As with any use of clustering,
the analysis results may be affected by the parameters of the chosen clustering
method. The sensitivity of the results to the clustering parameters can be checked
by running the clustering method several times with different parameter values.
In clustering, the dissimilarity between the situations in two different time units
can be represented by the Euclidean or Manhattan distance between the values
referring to the same places or links in the two situations. A representative spatial
situation for a group of similar spatial situations can be constructed by taking for
each place or link the mean or median of the values referring to this place or
link in all situations within the group. Another approach is to select one of the situ-
ations included in a group as a representative situation. This should be the “cen-
tral” situation of the group, that is, the one with the smallest average dissimilarity
to all other group members.
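The dissimilarity measures and the two ways of obtaining a representative situation can be sketched as follows. A spatial situation is assumed to be a list of values in a fixed order of places or links; the function names and toy values are hypothetical.

```python
from statistics import mean, median

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def constructed_representative(group, summary=mean):
    """Place-wise mean (or median) over all situations of a group."""
    return [summary(values) for values in zip(*group)]

def central_situation(group, distance=euclidean):
    """Group member with the smallest average dissimilarity to the other members."""
    if len(group) == 1:
        return group[0]

    def avg_dissimilarity(s):
        others = [o for o in group if o is not s]
        return sum(distance(s, o) for o in others) / len(others)

    return min(group, key=avg_dissimilarity)

# Hypothetical group of three situations over two places
group = [[0, 0], [3, 4], [6, 8]]
rep_mean = constructed_representative(group)
rep_median = constructed_representative(group, summary=median)
central = central_situation(group)
```

A constructed representative may not coincide with any situation that actually occurred, whereas the central situation is always one of the observed ones.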
In the previous chapter, we used examples of spatially and temporally aggregated
movement data for explaining and illustrating space-focused analysis methods. The
data were treated as local time series associated with places and links. The same
data can be treated as spatial distributions associated with time intervals. We shall
demonstrate the application of clustering of spatial situations (represented by spatial
distributions) using the examples of the aggregated Flickr data from Switzerland and car
tracks from Milan.
Figure  8.2 gives an example of clustering of spatial presence distributions.
The clustering has been applied to the aggregated Flickr data from Switzerland.
We have already applied clustering to these data in Sect. 7.2.3. In that case, the
clustering was applied to the local time series, and we obtained groups of places,
such that the local time series of the places within the groups were similar. Now,
we apply the same clustering method (k-means) to the presence distributions,
where each distribution consists of the set of values (presence counts) associated
with different places in one time unit, in this example, one month. As a result, we
obtain groups of time units (months), such that the spatial distributions character-
izing the time units within a group are similar. Figure 8.2 presents the results of
k-means clustering with k = 11. Increasing the parameter value subdivides the
clusters where all values in the distributions are low. This does not change the
observed patterns.
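Clustering of time units by their presence distributions can be sketched with a plain version of k-means. The initialization with the first k vectors is a simplification chosen to keep the sketch deterministic, and the monthly values are hypothetical; this is not the implementation used to produce Fig. 8.2.

```python
def kmeans(vectors, k, iterations=100):
    """Lloyd's algorithm with squared Euclidean distance.
    Returns (labels, centres)."""
    centres = [list(v) for v in vectors[:k]]  # assumed deterministic start
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centres[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical presence counts in three places for four months
monthly = {
    "2009-01": [2, 1, 0],
    "2009-07": [30, 25, 40],
    "2010-01": [3, 1, 1],
    "2010-07": [28, 27, 38],
}
labels, centres = kmeans(list(monthly.values()), k=2)
cluster_of_month = dict(zip(monthly, labels))
```

Each vector describes one time unit, so the resulting clusters are groups of months, and the cluster centres are place-wise means that can serve as representative situations. Running the function with several values of k is a simple way to check the sensitivity of the results.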
The small multiple maps show the representative spatial situations (in the form
of presence distributions) for the time clusters. The representative situations have
been constructed by computing the mean value for each place over the situa-
tions included in each cluster. These values are represented by proportional sizes
of circle symbols. The backgrounds of the captions above the maps are painted
in the colours assigned to the clusters by projecting the cluster centres onto a
two-dimensional colour space, as was also done in the previous chapter. The
projection display at the bottom of Fig. 8.2 beside the map shows the assignment
of the colours to the clusters as well as the relative distances between the cluster
centres.

Fig. 8.2  The monthly spatial presence distributions from the aggregated Flickr data from
Switzerland have been clustered by similarity. The small multiple maps show the representative
presence distributions of 11 clusters. The projection display with the background colouring
shows the relative distances between the cluster centres and the assignment of the colours to the
clusters. The time mosaic display in the lower right corner shows the distribution of the clusters
over the years (rows) and months (columns)

By looking at the maps of the representative spatial situations, we mostly see
quantitative differences between them. The clusters coloured in shades of cyan
(1, 4, 6, and 9) are characterized by low presence values in all places. The lowest
values are in cluster 4 and the highest (but still low) values in clusters 1 and 9. In
the projection, these four clusters are located in the lower left corner, quite far from
the other clusters. Cluster 10 has noticeably higher values and is located close to
the centre of the projection plot, where it has got its greyish cyan colour. Cluster
11 (bright red) has the highest presence values among all clusters and is located in
the upper right corner of the projection plot, that is, it has the most distant position
from the cyan-coloured clusters with low values. The remaining clusters (2, 3, 5,
7, and 8) have higher presence values than in cluster 10 and lower than in cluster
11. In the projection plot, they are located between cluster 10 and cluster 11. The
differences between these intermediate clusters are difficult to see, as well as the
differences between clusters 1 and 9. We shall return to this problem a bit later.
To understand when the spatial situations from each cluster occurred, we can
use temporal displays. The distribution of the clusters with respect to temporal
cycles can be explored with the help of a time mosaic display, such as the one in
the lower right corner of Fig. 8.2. The display is composed of rectangular pixels
representing the time units (months in our example). The pixels are arranged in
rows and painted in the colours of the clusters the time units belong to. Each row
of pixels in our example (except for the last one) consists of 12 pixels representing
consecutive months; hence, each row represents one year. The last row is incom-
plete since the time span of our data set ends in August 2012. The columns of the
time mosaic display correspond to the 12 months of a year. In a general case, the
columns correspond to positions within some time cycle (daily, weekly, or yearly)
and the rows to consecutive time intervals of the length of one cycle (i.e. days,
weeks, or years). In such a layout, vertical alignments of pixels with similar col-
ours reveal periodic repetitions of similar spatial situations. In our example, we
see that the months 4–6 (April–June) of the years 2009–2011 were characterized
by similar spatial situations belonging to the same cluster 8. The situations in July
and August of the years 2009 and 2010 are from the same cluster 11. The overall
periodicity in the sense of occurrence of very similar spatial situations in the same
months of many years is not very strong. This is because the periodic variation
interacts with temporal trends, as we observed in Sect. 7.2.2.
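The layout of a time mosaic can be sketched as a mapping from time units to rows and columns. The function names are hypothetical, and the cluster labels below are a small illustrative excerpt, not the full clustering result.

```python
def time_mosaic(cluster_by_month):
    """cluster_by_month: {(year, month): cluster_id}.
    Returns {year: [cluster id for months 1..12, None where there is no data]}."""
    rows = {}
    for (year, month), cluster in sorted(cluster_by_month.items()):
        rows.setdefault(year, [None] * 12)[month - 1] = cluster
    return rows

def periodic_months(rows, min_years=2):
    """Months (columns) that hold the same cluster in every year with data."""
    result = {}
    for m in range(12):
        column = [row[m] for row in rows.values() if row[m] is not None]
        if len(column) >= min_years and len(set(column)) == 1:
            result[m + 1] = column[0]
    return result

# Illustrative excerpt of cluster labels per month
labels = {
    (2009, 4): 8, (2009, 7): 11,
    (2010, 4): 8, (2010, 7): 11,
    (2011, 4): 8, (2011, 7): 7,
}
rows = time_mosaic(labels)
stable = periodic_months(rows)
```

A column with a single cluster identifier corresponds to a vertical alignment of pixels of the same colour in the display.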
The time clusters can also be represented on other temporal displays. On the
time graph in Fig. 8.3, the clusters are represented by background colouring. In
the graph area, the colours are desaturated, and in the label area below the graph,
the unmodified cluster colours are used. Here, we easily see that the cyan clusters
with low presence values occurred at the beginning and at the end of the time span
of the data, the bulk of the greyish cyan cluster with somewhat higher values is
positioned from mid-2006 to mid-2007, and the bright red cluster with the high-
est presence values occurred in summers of 2008, 2009, and 2010. The remaining
clusters are distributed within the time period from mid-2007 to August 2011. The
quantitative differences between the clusters that we perceived from Fig. 8.2 are
consistent with the values in the time series visible in the time graph.

Fig. 8.3  The colours of the clusters of the spatial situations from Fig. 8.2 are propagated to a
time graph display

When discussing the small multiple maps with the representative situations of
the clusters, we said that it is hard to compare situations when quantitative differ-
ences between them are not very prominent. Thus, we can see that the presence
values in the clusters 2, 3, 5, 7, and 8 are intermediate between cluster 10 (greyish
cyan) and 11 (bright red), but the differences between these five clusters are not
clear. To facilitate comparisons, we can turn the original maps of the situations
into difference maps showing the differences of each situation from a selected ref-
erence situation. An example is shown in Fig. 8.4, where the representative situa-
tions of all clusters are compared to that of cluster 8, which has wine red colour.
For each cluster and each place, the arithmetic difference between the presence
value in the representative situation of this cluster and the presence value in the
representative situation of cluster 8 has been computed. The differences are represented
by circle symbols in two colours: red for positive differences and cyan
for negative differences. The sizes of the circles are proportional to the absolute
values of the differences. For cluster 8, we do not see any circle on the map since
all differences are zero.

Fig. 8.4  Small multiple maps show the differences of all representative situations to the representative
situation of cluster 8 (wine red). The circles in red and cyan show, respectively, positive
and negative differences. The sizes of the circles are proportional to the absolute values of the
differences. Absence of a circle corresponds to a zero value

The difference maps greatly facilitate comparisons with the selected situation.
Not only overall quantitative differences can be easily seen (e.g. we see that the values
in cluster 8 are overall higher than in the cyan clusters and lower than in the bright
red cluster) but also differences in spatial distributions of the values. Thus, cluster 7
(orange) has lower values than cluster 8 in the cities and flat areas but higher values
in the mountainous areas, in particular, in the Wallis (Valais) region in the south-west
and in the Grisons (Graubünden) region in the east. From the temporal displays, we
see that the situations of cluster 7 occurred in July and August of 2007 and 2011.
Evidently, in these periods, many people spent their vacations in the mountains. In
the years from 2008 to 2010, cluster 7 was replaced by cluster 11 (bright red), in
which the value differences in the mountainous areas are higher than in the other
places. This also tells us that many people came to these areas in July and August.
The situations in the purple and violet clusters (2 and 5), which occurred mostly in
winter months and in March, also have higher values than in cluster 8 in mountain-
ous areas. Hence, many people come to these areas also in winter, most probably, for
skiing and other winter sports; however, the values are not as high as in the summer.
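The computation behind a difference map can be sketched as follows. The encoding follows the description of Fig. 8.4 (red for positive, cyan for negative, size proportional to the absolute difference, no symbol for zero); the function name and the presence values are hypothetical.

```python
def difference_map(situation, reference):
    """Return {place: (colour, size)} for all places with a non-zero difference.
    colour: 'red' for positive, 'cyan' for negative differences;
    size: absolute value of the difference."""
    symbols = {}
    for place, value in situation.items():
        diff = value - reference.get(place, 0)
        if diff != 0:  # no circle is drawn for a zero difference
            symbols[place] = ("red" if diff > 0 else "cyan", abs(diff))
    return symbols

# Hypothetical representative presence values of two clusters
cluster_8 = {"Zurich": 50, "Bern": 30, "Zermatt": 10}
cluster_7 = {"Zurich": 40, "Bern": 30, "Zermatt": 25}
symbols = difference_map(cluster_7, cluster_8)
```

Applied to the reference situation itself, the function returns an empty result, which corresponds to the empty map of the reference cluster.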
In a similar way, we can compare all clusters to any other selected clus-
ters. This allows us to find, for example, that cluster 1, which occurred from
November 2011 till February 2012, differs from cluster 9, spanning from mid-2005
to mid-2006, by higher presence values in and around Zurich and at Lake
Geneva. Clusters 2 and 5 do not noticeably differ in terms of the spatial distribution
patterns, but the values in cluster 2 are somewhat higher than in cluster 5.
Clustering of spatial situations can also be done using other clustering
algorithms. Self-organizing maps are used by Andrienko et al. (2010). Köthur et
al. (2013) apply hierarchical clustering (this paper deals with climate data rather
than movement data, but the approach is also applicable to spatial situations in
movement). The hierarchy of clusters can be interactively explored by drilling
down and rolling up.
Clustering can be applied not only to presence distributions but also to flow
distributions. For the flows from the Flickr data, most of which have very low
magnitudes, we could not find interesting clusters and temporal patterns. We shall
briefly demonstrate clustering of flow distributions using the example of the aggregated
Milan data.
Figure  8.5 shows the results of clustering of the hourly flow distributions in
Milan. We again applied k-means. After experimenting with different values of k,
we have chosen k = 14. Increasing the value of k does not significantly change
the results. The small multiple flow maps show the representative situations, in the
form of flow distributions, for the clusters. The representative situations have been
generated by computing the mean flow magnitudes (counts of moves) for the links
from the values occurring in the clusters. The mean flow magnitudes are repre-
sented by proportional widths of the flow symbols. Only the symbols representing
the mean flow magnitudes of five or more are visible in the maps.

Fig. 8.5  The hourly spatial flow distributions from the aggregated Milan cars data have been
clustered by similarity. The small multiple flow maps show the representative situations of the
clusters. The projection display with the background colouring shows the relative distances
between the cluster centres and the assignments of colours to the clusters. The time mosaic display
shows the distribution of the clusters over hours and days within the week

At the bottom of Fig. 8.5, beside the last map, there is a projection display
showing the relative distances between the cluster centres and the colour assign-
ment to the clusters. As in the previous examples, the colours of the clusters are
used for background painting of the map captions. On the right of the projec-
tion display, there is a time mosaic display where the pixels represent the hourly
intervals by which the data are aggregated. The pixels are arranged in rows of the
length 24, that is, each row represents 24 h of one day. There are seven rows cor-
responding to the days of the week, from Sunday (top) to Saturday (bottom).
The time mosaic display reveals periodic temporal variation of the hourly
spatial situations over the work days and differences between the work days and
the weekend. The clusters coloured in shades of blue and violet occur in the night
hours and are characterized by low flow magnitudes. The mornings of all work
days (hours 5–7) are very similar. In the later hours, there is more variability.
As with presence distributions, comparison of representative flow distributions
can be facilitated by computing differences between the distributions. For example,
the small multiple flow maps in Fig. 8.6 show the differences of all representative

Fig. 8.6  Small multiple flow maps show the differences of all representative flow distributions
to the representative distribution of cluster 9 (raspberry red). The flow symbols in dark violet and
green show, respectively, positive and negative differences. The widths of the symbols are pro-
portional to the absolute values of the differences

situations to that of cluster 9 coloured in raspberry red. The situations from this
cluster mostly occur in the mornings of the work days. More precisely, there are two
clusters with similar colours occurring in hours 6 and 7 of the work days: cluster 3
(bright red) and cluster 9. Cluster 3 occurs on Monday and Tuesday and cluster 9 on
the other work days, except hour 7 on Thursday, when cluster 3 occurs again. The
similarity of the colours and the closeness of the cluster centres in the projection plot
(Fig. 8.5) indicate that clusters 3 and 9 have similar flow values.
The differences to cluster 9 are represented in the small multiple maps by flow
symbols in two colours, dark violet for positive differences and green for negative
differences. The widths of the symbols are proportional to the absolute values of
the differences. Only the symbols for differences with absolute values of five
or more are visible in the maps. The maps show us that cluster 3 differs from cluster 9
by higher flows towards the north and north-east on the belt motorways and
lower flows to the south. Later in the morning (cluster 10, dark orange), there is more
movement from the north towards the city centre. In the afternoons (cluster 11), there
is more movement out of the city centre and towards the north-west and north-east
than in the morning.
The flow distribution in hour 16 on Tuesday stands apart from the other flow
distributions and makes a separate cluster (cluster 12) coloured in green, which
is located far from the other clusters in the projection plot. This distribution has
higher flow magnitudes than the other distributions almost everywhere, except
for the flows along the northern motorway towards the east. This can be more eas-
ily seen after subtracting the flow distribution of cluster 12 from the representative
flow distributions of the other clusters. In an attempt to explain this unusually high
traffic, we searched the web for events that could have happened in Milan on Tuesday,
3 April 2007 and found that this was the day when a Champions League quarter-final
football match between AC Milan and Bayern Munich took place in Milan at
the stadium Giuseppe Meazza, also known as San Siro. The match was attended
by 77,700 spectators. We are not quite sure that this explains the unusual traffic
in hour 16, which may be 18 in local time, assuming that the times in the data
are Greenwich mean times. We checked whether the event is somehow reflected
in the presence distributions. Although we noticed an increase of car presence in
two areas close to the stadium in hour 17, clustering of the spatial presence dis-
tributions does not separate the whole distribution of this hour from the other
distributions.
We shall not demonstrate and discuss in detail the clustering of the presence
distributions from the Milan data. The analysis procedure is the same as was
applied to the presence distributions from the Flickr data and to the flow distribu-
tions from the Milan data. The clustering results reveal periodic temporal patterns
consistent with the patterns for the flow distributions.
Andrienko et al. (2012) describe an example where clustering of spatial pres-
ence and flow distributions derived from the same data reveals different temporal
patterns, which give complementary information for understanding of collective
movement behaviours. In that example, the data about visitors of car races (intro-
duced in Sect. 2.10.10) have been aggregated temporally by 30-min time intervals
and spatially by the places where the measurements were taken. The time period
of the data is two full days; hence, there are 96 time intervals characterized by spa-
tial presence and flow distributions. Figure 8.7 presents two time mosaic displays
where the pixels representing the time intervals are arranged linearly. The mosa-
ics show the results of the clustering of the time intervals by the similarity of the
respective presence distributions (upper row) and flow distributions (lower row).
A white vertical line separates the two days during which the data were collected.
By coincidence, in the two sets of clusters, shades of blue correspond to early
morning, late evening, and night times when there were no or very few people
in the race area. In these times, both the presence counts and the flow magni-
tudes were very low. In the upper row, we see pixels coloured in shades of orange
located near the middle of days 1 and 2. At these times, the presence distributions
were very dissimilar to those at all other times during the two days. These were the
times of the qualifying race on day 1 (from 14:00 till 15:00) and main race on day
2 (from 13:30 till 15:30). The corresponding pixels in the lower mosaic are painted
in shades of blue, like in the night, which means that the movements of the people
during these times were minimal.
Figure 8.8 shows the representative spatial situations for selected clusters of
presence distributions (upper row) and flow distributions (lower row). Images 3
and 4 in the upper row correspond to the times of the qualifying race on day
1 and main race on day 2, respectively. We see high presence of people in the
places from which the races could be watched and low presence in the other
places. Images 2 and 3 in the lower row represent the flow distribution dur-
ing the races. Image 2 corresponds to the time of the qualifying race on day 1
and the first 30 min of the main race on day 2, and image 3 corresponds to the
remaining time of the main race. We see that the flow magnitudes were very
low during the qualifying race and even lower during the main race. As could be
expected, very few visitors moved around during the races; most visitors pre-
ferred to watch the races.
In the upper time mosaic in Fig. 8.7, we see a kind of symmetry in both days
with respect to the times of the races, that is, the presence distributions after the
races were similar to the presence distributions before the races. Image 2 in the
upper row corresponds to the times before and after the races on both days. There
were fewer visitors on the tribunes and more visitors in the information centre
and at other attractions. The situations represented by image 1 occurred chrono-
logically before and after the situations represented by image 2. At those times,
the visitors were more evenly spread over the places of interest and the presence
counts in all of them were not very high.
The colour patterns in the lower time mosaic in Fig. 8.7 are asymmetric,
which means that the flow distributions before and after the races substantially dif-
fered. Image 1 in the lower row of Fig. 8.8 represents one of the clusters of flow
distributions that occurred before the races (directly before the main race on day 2
and 30 min before the qualifying race on day 1), and image 4 represents the cluster
of flow distributions that occurred directly after the races on both days. It is easy
to see that the clusters are characterized by opposite directions of the major flows.

Fig. 8.7  Two time mosaic displays with linearly arranged pixels represent the time clusters
obtained by clustering of the presence distributions (upper row) and flow distributions (lower
row) derived from the data about visitors of car races. The white vertical line separates two days

Fig. 8.8  The maps show representative situations for selected clusters of presence distributions
(upper row) and flow distributions (lower row). Images 3 and 4 in the upper row correspond to
the times of the qualifying race on day 1 and main race on day 2, respectively. Image 2 in the
lower row corresponds to the times of the qualifying race and first 30 min of the main race and
image 3 in the lower row to the remaining time of the main race

Before the races, the strongest flows were directed towards the tribunes. After the
races, the people mostly moved to the exits and parking places.
It can be noticed from Fig. 8.7 that the collective movement behaviour before
the qualifying race (day 1) was more variable than that before the main race
(day 2): there are three different clusters of flow distributions before the qualify-
ing race and only one cluster before the main race. Also, the movements after
the races are represented by two clusters on day 1 and one cluster on day 2. A
detailed examination of the differences between the clusters is described in the
paper by Andrienko et al. (2012). As we found out, the racing drivers practised on
the racing circuit before the qualifying race, and, evidently, some of the visitors
watched the practice. After the qualifying race on day 1, there were quite
a few people moving between the places of exhibition, information, and shopping,
while after the main race on day 2, the visitors mostly moved to the exits
and parking places.
Hence, by combining information obtained through clustering of presence
and flow distributions, we can build a complete story describing the collective
movement behaviour of the visitors of the car races. In a general case, presence
distributions and flow distributions provide complementary information for char-
acterization of movement behaviour.

8.1.2 Event Extraction from Spatial Situations

In a spatial presence distribution, there may be places where values of a numeric
attribute, such as presence counts, are significantly higher or lower than the values
in all places around them or than the average values in their neighbourhoods.
Some of these local features may exist always or almost always; others may
appear only in some time units. Spatial features that have limited time of existence
can be treated as spatial events. Such events can be extracted from a sequence of
spatial distributions and then explored and analysed using any methods applicable
to spatial events.
Extraction of spatial events from spatial distributions is conceptually analo-
gous to extraction of spatial events from local time series (Sect. 7.2.5). In the case
of local time series, we were interested in values significantly higher or lower
than other values in their temporal neighbourhood. In the case of spatial distribu-
tions, we are interested in values that are significantly higher or lower than other
values in their spatial neighbourhood. In the case of local time series, the tem-
poral neighbourhood of a time unit t is defined by a given width of time win-
dow around t. In the case of spatial distributions, the spatial neighbourhood of a
place p may be defined as the places located within a given spatial distance from
p or, when dealing with compartments of a space tessellation, as the places having
common borders with p. For a spatial tessellation, spatial neighbourhood can be
extended to include second, third, or even higher-order neighbours. Second-order
neighbours of place p are the places neighbouring to the immediate neighbours of p.
Third-order neighbours are the places neighbouring to the second-order neigh-
bours, and so on.
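The two kinds of spatial neighbourhood can be sketched in Python as follows. The sketch assumes that places are identified by names, that the distance-based variant works on hypothetical centre coordinates of the places, and that the tessellation is given as an adjacency mapping from each cell to the cells sharing a border with it. The function names are illustrative only.

```python
import math

def neighbours_by_distance(centres, place, radius):
    """Places whose centre lies within `radius` of the centre of `place`."""
    origin = centres[place]
    return {q for q, c in centres.items()
            if q != place and math.dist(origin, c) <= radius}

def neighbours_by_order(adjacency, place, max_order):
    """Neighbours of `place` in a tessellation up to the given order.

    Order 1 gives the cells bordering on `place`, order 2 adds the cells
    bordering on those, and so on. The place itself is never included.
    """
    visited = {place}
    frontier = {place}
    for _ in range(max_order):
        frontier = {n for p in frontier for n in adjacency.get(p, ())} - visited
        visited |= frontier
    return visited - {place}

# Toy tessellation: four cells in a chain a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
centres = {"a": (0, 0), "b": (1, 0), "c": (3, 0), "d": (4, 0)}
```

For the chain above, the first-order neighbourhood of cell a contains only b, the second-order extension adds c, and the third-order extension adds d.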
Extraction of events from a single spatial presence distribution is based on
comparing the attribute value in each place p to a certain statistical summary of
the attribute values from the spatial neighbourhood of this place p. The statistical
summaries that may be reasonable to use include minimum, maximum, median,
average, and weighted average. In computing the median or average, the value in
place p itself may be included or excluded, while in finding the minimum or maxi-
mum, the value in place p is not taken into account. In computing the weighted
average, the contribution of each place is weighted by the distance to place p: the
smaller the distance, the higher the weight.
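The statistical summaries can be illustrated by a small Python sketch. Note that the text only requires the weight to grow as the distance shrinks; the inverse-distance weighting used below is an assumption made for the sake of the example, as are the function name and the parameter names.

```python
import statistics

def neighbourhood_summary(values, neighbours, place, how="average",
                          include_self=False, distances=None):
    """Summarize the attribute values in the spatial neighbourhood of `place`.

    values:     {place: attribute value} for one spatial situation
    neighbours: {place: list of neighbouring places}
    distances:  {(place, neighbour): distance}, needed only for how="weighted"
    """
    members = list(neighbours[place])
    if how in ("min", "max"):
        # The value in the place itself is never taken into account here.
        data = [values[n] for n in members]
        return min(data) if how == "min" else max(data)
    if how == "weighted":
        # Assumed weighting scheme: weight = 1 / distance to the place.
        weights = [1.0 / distances[(place, n)] for n in members]
        return sum(w * values[n] for w, n in zip(weights, members)) / sum(weights)
    if how not in ("median", "average"):
        raise ValueError(f"unknown summary: {how}")
    if include_self:
        members.append(place)
    data = [values[n] for n in members]
    return statistics.median(data) if how == "median" else statistics.fmean(data)

values = {"p": 10, "a": 2, "b": 4, "c": 6}
neighbours = {"p": ["a", "b", "c"]}
distances = {("p", "a"): 1.0, ("p", "b"): 2.0, ("p", "c"): 2.0}
```

With these toy values, the plain average of the neighbourhood of p is 4, whereas the weighted average is 3.5 because the nearest neighbour a, which has the lowest value, contributes most.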
An event may be generated when the difference between the value in place p
and the summary value from the neighbourhood is not less than a chosen positive
difference threshold dMin+, if we are interested in local maxima, or when it is
not higher than a chosen negative difference threshold dMax−, if we are inter-
ested in local minima. The spatial position of the event is place p, and the tempo-
ral position is the time reference of the spatial situation from which the event is
extracted.
However, some places may always or almost always have higher or lower val-
ues than in their neighbourhood. When each spatial situation is considered sepa-
rately from all others, spatial events for these places will be generated from all or
almost all spatial situations. Strictly speaking, these extracted objects cannot be
treated as events since they represent permanently existing features. Hence, we are
interested in extracting only those local maxima or minima that occur in a rela-
tively small subset of spatial situations. This can be achieved using the following
approach. We choose the maximal proportion (percentage) pMax of the spatial
situations in which the spatial maxima or minima may occur. For each place, we
compute the (100 − pMax)th percentile of the differences between the values in this
place and the statistical summaries from the place neighbourhood over all situations.
For example, if the chosen maximal proportion is 5 %, the 95th percentile
is computed. In extracting local maxima events, an event in place p in time unit ti
is generated only when the value difference to the neighbourhood in this time unit
is not less than dMin+, while the (100 − pMax)th percentile is lower than dMin+.
Analogously, in extracting local minima events, the pMax-th percentile is used: an
event in place p in time unit ti is generated only when the value difference to the
neighbourhood in this time unit is not more than dMax−, while the pMax-th
percentile is higher than dMax−. This approach ensures that only events occurring
in not more than pMax % of the spatial situations are extracted.
The algorithm for extracting spatial events from a temporal sequence of spatial
situations can be formally described by the pseudo-code given below.

The pseudo-code above describes the extraction of local maxima. For extract-
ing local minima, line 9 is replaced by the line

and line 11 is replaced by the line

In line 12, the algorithm generates a spatial event with the spatial position pi
and temporal position tj. The difference of the value in the place pi to the summary
value from the neighbourhood is attached to the event as an attribute value, which
may be treated as the event magnitude.
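The extraction of local maxima described in this section can also be illustrated by a minimal Python sketch. It is not a transcription of the pseudo-code; it follows the textual description under several assumptions: the neighbourhood summary is the plain average excluding the place itself, the percentile is computed by the nearest-rank method, and all function and parameter names are hypothetical.

```python
import math

def nearest_rank_percentile(values, q):
    """q-th percentile by the nearest-rank method (an assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def extract_local_maxima(situations, neighbours, d_min, p_max):
    """Extract spatial events of local maxima.

    situations: chronologically ordered list of (time, {place: value})
    neighbours: {place: list of neighbouring places}, for the places of interest
    d_min:      positive difference threshold (dMin+ in the text)
    p_max:      maximal percentage of situations in which a maximum may occur
    Returns a list of events (place, time, magnitude).
    """
    # Differences of each place to the average of its neighbourhood.
    diffs = {}
    for time, values in situations:
        for place, nbrs in neighbours.items():
            average = sum(values[n] for n in nbrs) / len(nbrs)
            diffs.setdefault(place, []).append((time, values[place] - average))

    events = []
    for place, series in diffs.items():
        limit = nearest_rank_percentile([d for _, d in series], 100 - p_max)
        if limit >= d_min:
            continue  # high values are a permanent feature of this place
        events.extend((place, time, d) for time, d in series if d >= d_min)
    return events

# Toy data: place p has a single peak at time 3; place q is always high.
situations = [(t, {"a": 0, "b": 0, "p": 6 if t == 3 else 0, "q": 8})
              for t in range(10)]
neighbours = {"p": ["a", "b"], "q": ["a", "b"]}
events = extract_local_maxima(situations, neighbours, d_min=5, p_max=10)
```

In the toy data, only the peak of place p at time 3 is returned as an event with magnitude 6; place q is skipped because its high values occur in all situations. Local minima could be handled with the same function by negating all values and passing the negated threshold dMax− as d_min.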
We shall demonstrate the work of the algorithm by example using the Flickr
data. For this example, we limit the area under analysis to the territory of Zurich.
However, we use data from a larger territory, to ensure that each place within the
area of interest is surrounded by neighbours. We tessellate the territory into com-
partments of variable sizes, as described in Sect. 7.1.1, using a sample of the data.
Then, we aggregate the trajectories of the Flickr users by the cells of the tessella-
tion and time intervals of the length of one week (7 days). The time intervals are
specified, so that each interval begins on Monday and ends on Sunday. The aggre-
gation gives us 401 weekly presence distributions consisting of the weekly counts
of the place visitors, where the places are the cells of the territory division.

We define the spatial neighbourhood of each cell as the set of its first-order
neighbours, that is, the cells bordering on this cell. We want to extract spatial
events based on the differences between the presence counts in the cells and the
average presence counts in their neighbourhoods; that is, we apply the statistical
operator that computes the average presence value over the place neighbourhood,
excluding the place itself. We compute the differences to the neighbourhoods for
all places and time units; the resulting values range from −4 to 17.4.
We want to extract the spatial events where the presence counts in places
exceed the average presence counts in the neighbourhoods by at least 5; that is,
we set the positive difference threshold dMin+ to 5. We set the maximal propor-
tion threshold pMax to 10; that is, only the places where the high differences to
the neighbourhood occur in not more than 10 % of the time units are taken into
account.
With these settings, the event extraction algorithm extracts 45 spatial events of
local maxima that occurred on the territory of Zurich. Of the 186 cells covering the
territory of Zurich, 27 cells contain extracted events. In Fig. 8.9,
the events are shown on a map (left) and in a space–time cube (right). In the map
representation, the sizes of the circle symbols representing the events are propor-
tional to the event magnitudes, that is, the differences of the values in the places
from the average values in their neighbourhoods. The largest event magnitude is 12.
The earliest event occurred in the week starting from 2 July 2007 and the latest in
the week starting from 16 October 2011.
Figure  8.10 presents examples of two spatial situations from which spatial
events of local maxima have been extracted. The maps show only the central part
of Zurich. The places (i.e. cells of the territory division) are painted in shades of
brown proportionally to the counts of place visitors. White colour corresponds to

Fig. 8.9  Spatial events of local maxima extracted from spatial situations are shown on a map
(left) and in a space–time cube (right)

Fig. 8.10  The maps represent two spatial situations from which spatial events of local maxima
have been extracted. The places are shaded proportionally to the counts of the visitors

zero values, and the maximal darkness corresponds to 10 visitors (although the
maximal number of place visitors over all situations is 18, we have limited the
value range represented by the shading in this illustration to make the currently
used shades better distinguishable). The extracted events are represented by cir-
cles, which are drawn with very high transparency to enable seeing the underlying
information layers. The images represent the presence distribution in the weeks of
8–14 August 2011 (left) and 10–16 October 2011 (right).
As we did in Sect. 7.2.5 for spatial events extracted from local time series, we
try to interpret the events extracted from the spatial situations using the texts from
the photograph titles. The titles of the photographs with the spatial and temporal
references lying within the spatial and temporal limits of the events are extracted
from the database and summarized by finding the frequent words and word combi-
nations. The text summaries could be obtained only for 26 spatial events out of 45.
In Fig. 8.11, the summaries are represented in a text cloud display. The texts are
ordered by the times of the event occurrence and coloured according to the years
of occurrence; the font sizes are proportional to the text frequencies in the photo-
graph titles. Note that the text cloud display may include several texts referring to
the same spatial event.
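The summarization of the photograph titles can be approximated by the following Python sketch. It is only an illustration under stated assumptions: the actual summarization procedure is not specified in detail, so the tokenization by a regular expression, the limit of two words per combination, and the minimal frequency of two are choices made for this example. Tokens containing digits are dropped, as the summarization procedure used for this analysis ignores such tokens.

```python
import re
from collections import Counter

def summarize_titles(titles, min_count=2, max_words=2):
    """Frequent words and word combinations in a collection of titles."""
    counts = Counter()
    for title in titles:
        # Lower-case word tokens; tokens containing digits are ignored.
        tokens = [t for t in re.findall(r"\w+", title.lower())
                  if not any(ch.isdigit() for ch in t)]
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]

# Hypothetical titles, invented for illustration.
titles = ["U2 Letzigrund", "U2 live at Letzigrund",
          "Street Parade 2011", "street parade"]
summary = dict(summarize_titles(titles))
```

For these invented titles, the summary contains "letzigrund", "street", "parade", and "street parade", each with frequency 2, while the token "u2" is absent because it contains a digit.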
From the summaries of the photograph titles, we find out that four spatial
events are related to the street parades that took place in Zurich in the years 2007,
2008, and 2011. In 2011, two parade-related spatial events were
extracted from two neighbouring places. These two events are visible on the left of
Fig. 8.10. All street parades took place in August. In June 2009, there was another
parade called Zurich Pride.
Other interesting events reflected in the Flickr photograph collection are
Lethargy Festival, which took place in the week 4–10 August 2008 at a site called

Fig. 8.11  Summarized photograph titles for the extracted local maxima events are represented in
a text cloud display

Rote Fabrik, a Madonna concert in the week 25–31 August 2008 in the district of Dübendorf,
a demonstration of the Bleiberecht movement for equal human rights in the week
29 December 2008–4 January 2009, the first appearance of the Airbus A380 at
Zurich airport in January 2010, concerts of the rock groups U2 and One Republic
at the Letzigrund stadium in September 2010 (the name of the group U2 was not
included in the text summaries because tokens containing digits are ignored by the
summarization procedure), a performance “Mirror of my Soul” (“Spiegel meiner
Seele” in German) by Alessandro Cipriano (a tenor singer) in August 2011, and a
protest action “Occupy Paradeplatz” in October 2011, a part of the global Occupy
movement. The extracted spatial event corresponding to the latter public event is
represented on the right of Fig. 8.10.
The text cloud display also contains the name Blanka Vlašić, a Croatian ath-
lete specializing in the high jump. The event represented by this text occurred in
the week of 3–9 September 2007 at the Letzigrund stadium. By web searching,
we find out that one of the annual international athletic events Weltklasse Zürich
(World Class Zurich) took place on 7 September 2007. Evidently, Blanka Vlašić
participated in this sport event.
The text cloud display also contains texts that do not refer to real-world pub-
lic events but rather to static spatial objects, such as Grossmünster, statue of Hans
Waldmann (a mayor of Zurich in fifteenth century), and Sheraton hotel. Hence,
increased presence of Flickr users in some place and time does not necessarily
correspond to an interesting happening but may be merely coincidental.
This example demonstrates that characteristics of spatial presence distribu-
tions can be analysed by extracting spatial events, such as local maxima or min-
ima, and analysing the spatio-temporal distribution of the events as well as other
event-related information, when available. In principle, the same procedure can
be applied to spatial flow distributions. The only difference is defining the spatial
neighbourhood of a link. A possible definition is the set of links consisting of the
incoming links of the origin place and the outgoing links of the destination place
of a given link.
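This possible definition of the link neighbourhood can be written down as a short Python sketch. Links are assumed to be represented as (origin, destination) pairs; excluding the link itself from its own neighbourhood is an additional assumption of this example.

```python
def link_neighbourhood(links, link):
    """Incoming links of the origin plus outgoing links of the destination."""
    origin, destination = link
    return {(a, b) for (a, b) in links
            if (a, b) != link and (b == origin or a == destination)}

# Toy flow graph.
links = {("A", "B"), ("B", "C"), ("X", "A"), ("C", "D"), ("B", "A")}
nbrs = link_neighbourhood(links, ("A", "B"))
```

For the link from A to B in this toy graph, the neighbourhood consists of the links X to A and B to A, which enter the origin, and the links B to C and B to A, which leave the destination.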

8.2 Relations

In Sect. 8.1.1, we compared representative spatial situations of time clusters by
subtracting place- or link-related numeric values of one situation from the corresponding
values of another situation. The resulting numeric differences were represented
on difference maps (Figs. 8.4 and 8.6). The same approach can be applied
to spatial presence and flow distributions characterizing individual time units: to
see how the presence or flows changed from time t1 to time t2, the spatial distribu-
tion for t1 is subtracted from the spatial distribution for t2. In cartography, there is
a concept of change map (Slocum et al. 2009): a change map explicitly shows the
changes that took place between two points in time. For movement data aggre-
gated into continuous density fields, Lampe et al. (2010) provide an interactive
interface for generating change maps for selected time intervals, while Scheepens
et al. (2011) apply map algebra operations. An example of change maps represent-
ing numeric changes in discrete locations is given in Fig. 8.12.
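For discrete locations, the values shown on a sequence of change maps can be computed as in the following Python sketch. It assumes that each time interval is given as a label together with a mapping from places to presence counts and that a place missing from an interval has zero presence; the names and the toy counts are hypothetical.

```python
def change_maps(presence_by_interval):
    """Changes of presence in each interval with respect to the previous one.

    presence_by_interval: chronologically ordered list of
                          (interval label, {place: presence count})
    Returns a list of (interval label, {place: change}); positive values are
    increases, negative values are decreases.
    """
    result = []
    pairs = zip(presence_by_interval, presence_by_interval[1:])
    for (_, previous), (label, current) in pairs:
        places = set(previous) | set(current)
        result.append((label, {p: current.get(p, 0) - previous.get(p, 0)
                               for p in places}))
    return result

# Invented counts for three consecutive intervals.
presence = [
    ("12:00", {"tribune": 10, "shop": 30}),
    ("12:30", {"tribune": 25, "shop": 20}),
    ("13:00", {"tribune": 25, "shop": 20}),
]
changes = change_maps(presence)
```

The first interval has no predecessor in the input and therefore produces no change map; the second interval shows an increase of 15 on the tribune and a decrease of 10 at the shop.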
From the races data set, we have generated a display with ten small multiple
change maps corresponding to the time intervals from 12:00–12:30 to 16:30–17:00
on the second day and showing the changes in the people presence with respect to
the previous intervals. The changes are shown by bar diagrams. The upward-oriented

Fig. 8.12  Small multiple change maps show numeric changes in people presence in time inter-
vals from 12:00–12:30 to 16:30–17:00 on the second day with respect to the previous time inter-
vals. Upward-oriented red bars represent increases and downward-oriented cyan bars decreases

red bars represent increases, and the downward-oriented cyan bars represent
decreases. The lengths of the bars are proportional to the amounts of the changes.
We can observe that in the two intervals starting at 12:00 and 12:30, the presence of
people on the tribunes gradually increased with respect to the previous intervals. In
the next interval, almost no changes happened. In the two intervals starting at 13:30
and 14:00, the presence of people in the places other than the spectator tribunes
substantially decreased. Evidently, people moved to places from which the race could
be observed. Surprisingly, these changes happened after the start of the main race
at 13:30, that is, some people were still moving after the race started. There were
almost no changes of the presence at the sensors installed on the tribunes. This can be
explained by the low spatial coverage of the data. There were only three Bluetooth
sensors installed on the tribunes. Each of them could sense the presence of Bluetooth
device carriers within a radius of about 20 m, while the number and total area of all
places from which the race could be observed are much bigger. The following two
intervals, starting at 14:30 and 15:00, were very quiet: no changes occurred as the
people were watching the race. In the next interval, 15:30–16:00 (immediately after
the race finished), the presence increased almost everywhere, which means that the
people began to move actively. In the next two intervals, the presence first greatly
decreased on the tribunes and then everywhere.
This kind of change maps can show changes of numeric values associated with
places or links. In analysis of movement data, it may also be important to see the
changes in terms of where the objects have moved from one time unit to another.
Displacements of individual movers from time t1 to time t2 can be shown on a map
by straight lines, arrows, or fragments of trajectory lines connecting the position
of each mover in time t1 to the position in time t2. This technique may be limited

Fig. 8.13  Coherent changes of spatial positions of multiple movers can be easily noticed on a map
8.2 Relations 327

regarding the number of movers and, more importantly, the degree of coherence
of their movements. When multiple movers change their positions in an incoher-
ent way, the map display will be cluttered by many intersecting lines. However,
spatio-temporal trends (Sect. 2.8.2), that is, coherent changes of positions of
many movers, such as movement in the same direction, convergence, divergence,
may be well visible. An example is shown in Fig. 8.13. The map has been created
using the data about the group walk (Sect. 2.10.5). By means of a time filter
(Sect. 4.2.1), we have selected a short time interval. In response, the map shows
position changes that occurred between the moments of the beginning and the end
of this time interval. Since the group members mostly walked together along the
same path, the fragments of the trajectory lines form an easily perceptible bun-
dle that clearly shows how the positions of the group as a whole and its members
changed between the selected time moments. This simple approach would be less
effective if each group member moved arbitrarily; however, it would be possible to
see the incoherent character of the changes.
Displacements of multiple movers with spatially coinciding or close origins
and destinations can be aggregated into flows and represented on a flow map.
However, this may not radically reduce the clutter since there may be many inter-
secting or overlapping flow symbols. When places of interest are not very numer-
ous and have descriptive labels, changes of object positions from one time unit to
another can be explored using an origin–destination matrix, as in Fig. 1.27.
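The aggregation of individual displacements into an origin–destination matrix can be sketched in Python as follows. The sketch assumes that the positions of the movers in each of the two time units are given as mappings from mover identifiers to place names and that movers without a known position in either time unit are left out; the names and data are hypothetical.

```python
from collections import Counter

def od_matrix(positions_t1, positions_t2):
    """Counts of movers for each (origin, destination) pair of places."""
    common = positions_t1.keys() & positions_t2.keys()
    return Counter((positions_t1[m], positions_t2[m]) for m in common)

# Invented positions of four movers in two time units.
t1 = {"m1": "parking", "m2": "parking", "m3": "shop"}
t2 = {"m1": "tribune", "m2": "tribune", "m3": "shop", "m4": "exit"}
matrix = od_matrix(t1, t2)
```

The diagonal cells of such a matrix, here ("shop", "shop"), count the movers that stayed in the same place, while the off-diagonal cells correspond to the flows that would be drawn on a flow map.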
For a small number of places, the relations between time units in terms of
changes of movers’ spatial positions can also be explored with the help of a DCDV
display, which has been introduced in Sect. 7.3.3. The technique is described in
papers by Bremm et al. (2011) and von Landesberger et al. (2012). An example can be
seen in Fig. 8.14. To recap, the main part of a DCDV display (Fig. 8.14a) consists
of vertical segmented bars connected by bands. The bars correspond to different
time units, which are chronologically ordered from left to right, and the segments to
different places or, as in Fig. 8.14, categories of places. Each place or place category
is represented by a unique colour; the legend in Fig. 8.15a explains the colours used
in Fig. 8.14. The heights of the bar segments are proportional to the numbers of the
movers that were present in the respective places in the time units corresponding to
the bars. The bands connect segments of consecutive bars. A band connecting dif-
ferently coloured segments represents the movers that were in the place represented
by the segment on the left in the time corresponding to the left bar and moved to the
place represented by the segment on the right by the time corresponding to the right
bar. A band connecting segments of the same colour represents the movers that were
in the place represented by this colour in both times corresponding to the left and
right bars. The widths of the bands are proportional to the numbers of the movers.
The bars in the main part of a DCDV display need not represent all time units for
which data are available; they may represent a selection of time units. Hence, two consecutive bars
may not necessarily represent two adjacent time units; the respective time units
may be separated by a time interval of any length. The user can compare arbi-
trary time units by interactively selecting them. This can be done by clicking in the

Fig. 8.14  A DCDV display presents an overview of the movements of the car races visitors dur-
ing two days. The colours represent different categories of places of interest; see Fig. 8.15a

Fig. 8.15  (a) The places of interest in the area of car races have been grouped into categories,
which are represented by different colours. (b) The display shows the possible selections of time
units for gaining an overview of the movements over two days. The line marked by a red frame
corresponds to the selection of time units in Fig. 8.14

lower part of the DCDV display (Fig. 8.14c), where all time units are represented
by vertical segmented bars without connecting bands between them.
The middle part of the DCDV display (Fig. 8.14b) includes a horizontal row of
greyscale-shaded square pixels, which also represent all time units in the chron-
ological order, like the bars in the lower part. The pixels representing currently
selected time units are connected by lines to the corresponding bars in the upper,
main part of the display. The degree of darkness of the pixel shading represents the
amount of changes in each time unit with respect to the previous time unit, that is,
is proportional to the number of movers that changed their positions.
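The quantity encoded by the pixel shading can be computed as in the following Python sketch. It assumes that the positions of the movers are given per time unit as mappings from mover identifiers to places and that a mover without a recorded position in a time unit is treated as being in an unknown place, so that appearing or disappearing counts as a change. Scaling the darkness by the maximal amount of change is also an assumption of this example.

```python
def change_amounts(positions_by_time):
    """Number of movers whose place differs from the previous time unit."""
    amounts = [0]  # the first time unit has no predecessor
    for previous, current in zip(positions_by_time, positions_by_time[1:]):
        movers = set(previous) | set(current)
        # A missing mover yields None, which stands for an unknown position.
        amounts.append(sum(previous.get(m) != current.get(m) for m in movers))
    return amounts

def grey_levels(amounts):
    """Darkness in [0, 1], proportional to the amount of change."""
    peak = max(amounts) or 1
    return [a / peak for a in amounts]

# Invented positions in three consecutive time units.
positions = [
    {"m1": "parking", "m2": "shop"},
    {"m1": "tribune", "m2": "shop"},
    {"m1": "tribune", "m2": "tribune", "m3": "parking"},
]
amounts = change_amounts(positions)
```

In the toy data, one mover changes place in the second time unit and two movers change in the third (one of them appears from an unknown position), so the third pixel would be the darkest.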
The time units to explore in the main part of the DCDV display can be selected
not only purely interactively by the user but also with the help of algorithmic
selection methods. As described by von Landesberger et al. (2012), the user can be
supported in selecting globally and focally representative time units. A selection of
globally representative time units enables an overview of the mainstream developments.
A choice of focally representative time units is suitable for revealing unusual
developments and for focusing on specific aspects of the data.
Globally representative time units are selected in such a way that the number
of changes (i.e. object transitions between different places or categories) from a
time unit to the next time unit in the selection is close to constant. Figure 8.14
demonstrates a selection of 24 globally representative time units out of the avail-
able 96 time units. The display presents an overview of the collective movements
of the visitors of the car races during the two days. The colours are explained in
Fig. 8.15a; the grey colour represents unknown positions of movers, specifically,
before the times of their first recorded positions and after the times of the last
recorded positions. From the overview in Fig. 8.14a, we see that people mostly
came to the race area from unknown places in the interval 9:00–12:20 on the first
day. There were many transitions between different categories of places in
this interval. By 12:30 and 13:00, many people had come to the food and
shopping places (yellow) and then moved to the tribunes (red). Recall that
the qualifying race took place from 14:00 to 15:00. By 15:30, there were many
transitions from the tribunes to the parking places (blue) and the food and shopping
places (yellow). Many movements between the different types of attractions
also occurred later. No time units have been selected from the period between
19:00 on day 1 and 7:30 on day 2, which means that the changes during this period
were very small. On the second day, we see many flows to the tribunes before the
main race (recall that it took place from 13:30 to 15:30) as well as flows to
and from the food and shopping places. After the race, the visitors moved quite rapidly
from the tribunes to the other places and then out of the race area.
The algorithm for global time selection, which is described by von
Landesberger et al. (2012), generates a set of possible selections with different
numbers of selected time units. The selections are visually represented as shown
in Fig. 8.15b, allowing the user to compare them and choose one or another. The
time selection display stays on the screen, so that the user can easily switch from
one selection to another. In the selection display, each row represents one of the
possible selections found by the algorithm. On the left of a row, the number of
selected time units is shown. The remaining part of the row represents the time
line with all time units. The grey rectangular pixels represent the selected time
units. In the example in Fig. 8.15b, the row corresponding to a selection of 24 time
units is marked by a red frame, which means that it was chosen by the user (i.e. by
us). This selection is represented in Fig. 8.14.
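
To illustrate the idea of globally representative time units, the following sketch uses a simple greedy heuristic. It is not the algorithm published by von Landesberger et al. (2012); it is only our illustration of the selection criterion. The function names, the threshold parameter, and the toy data are assumptions. A time unit is added to the selection as soon as the number of movers whose place differs from the last selected unit reaches the threshold, so that the amounts of change between consecutive selected units stay roughly equal. Different thresholds yield selections of different sizes, similar to the rows in the time selection display.

```python
from typing import Dict, List


def n_changes(positions: Dict[str, List[str]], i: int, j: int) -> int:
    """Number of movers whose place differs between time units i and j."""
    return sum(1 for places in positions.values() if places[i] != places[j])


def select_evenly_changing(positions: Dict[str, List[str]], threshold: int) -> List[int]:
    """Greedy heuristic: select a unit once the change since the last selected unit reaches the threshold."""
    n_units = len(next(iter(positions.values())))
    selected = [0]  # always start from the first time unit
    for t in range(1, n_units):
        if n_changes(positions, selected[-1], t) >= threshold:
            selected.append(t)
    return selected


# Hypothetical example: four movers, eight time units, place categories A, B, C.
positions = {
    "m1": list("AAABBBBC"),
    "m2": list("AABBBBCC"),
    "m3": list("ABBBBBBC"),
    "m4": list("AAAABBBB"),
}
coarse = select_evenly_changing(positions, threshold=2)
fine = select_evenly_changing(positions, threshold=1)
```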
The algorithm for selecting globally representative time units has two variants.
The variant that has been used in the example discussed above takes into account
all transitions between places and categories. The other variant finds a set of rep-
resentative time units for a selected place or category. It takes into account only
the transitions to and from this place or category. As in the first variant, the algo-
rithm tries to find such a subset of time units that the number of changes between
the consecutive selected units is close to constant. As an example, Fig. 8.16 shows
a DCDV display with a choice of 20 representative time moments for the place
category tribune. It allows us to compare the time units regarding the moves of the
people to and from the tribunes. The corresponding time selection display can be
seen in Fig. 8.18b.
Focally representative time units are selected by an algorithm that tries to max-
imize the number of changes between the consecutive selected time units. Hence,
the algorithm supports uncovering intensive changes. Like the algorithm for finding
globally representative times, this algorithm makes several possible selections with
different numbers of representative time units and shows these selections to the user
for comparing and choosing (Fig. 8.18c). An example corresponding to a choice of
26 focally representative time units for the car races data is shown in Fig. 8.17. Both
the DCDV display in Fig. 8.17 and the time selection display in Fig. 8.18c show that
the most intensive changes happened in the times before and after the races on the
first and second day. We see very clearly the movements to and from the food and
shopping places at the lunch times, especially on day 1, the movements to the tribunes
before the races, to the parking and shopping places after the qualifying race
on day 1, and out of the race area (i.e. to unknown places) after the main race on day
2. Besides, the display highlights the changes that occurred between day 1 and day
2. Between 17:00 on day 1 and 09:30 on day 2, we see many flows from different
coloured bar segments to the grey area, which represents unknown positions. These
flows correspond to the visitors who attended the race area on day 1 and left the area
after 17:00. There are also many flows from the grey area to different coloured bar
segments. These flows correspond to the visitors who did not attend the race area
on the first day but arrived by 9:30 on the second day. Hence, quite a few people
attended the races on only one of the two days.
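
The selection criterion for focally representative time units can be illustrated by a small dynamic programming sketch. Again, this is our own illustration and not the published algorithm: we assume that the goal is to choose k time units so that the sum of the changes between consecutive selected units is as large as possible, and all function names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def n_changes(positions: Dict[str, List[str]], i: int, j: int) -> int:
    """Number of movers whose place differs between time units i and j."""
    return sum(1 for places in positions.values() if places[i] != places[j])


def total_change(positions: Dict[str, List[str]], selection: Sequence[int]) -> int:
    """Sum of the changes between consecutive selected time units."""
    return sum(n_changes(positions, a, b) for a, b in zip(selection, selection[1:]))


def select_focal(positions: Dict[str, List[str]], k: int) -> List[int]:
    """Choose k time units (1 <= k <= number of units) maximizing the total change."""
    n = len(next(iter(positions.values())))
    neg = float("-inf")
    # best[m][j]: maximal total change of a selection of m + 1 units ending at unit j
    best = [[neg] * n for _ in range(k)]
    prev = [[-1] * n for _ in range(k)]
    for j in range(n):
        best[0][j] = 0
    for m in range(1, k):
        for j in range(m, n):
            for i in range(m - 1, j):
                cand = best[m - 1][i] + n_changes(positions, i, j)
                if cand > best[m][j]:
                    best[m][j], prev[m][j] = cand, i
    # Backtrack from the best final unit.
    j = max(range(k - 1, n), key=lambda t: best[k - 1][t])
    selection = [j]
    for m in range(k - 1, 0, -1):
        j = prev[m][j]
        selection.append(j)
    return selection[::-1]


positions = {
    "m1": list("AAABBBBC"),
    "m2": list("AABBBBCC"),
    "m3": list("ABBBBBBC"),
    "m4": list("AAAABBBB"),
}
focal = select_focal(positions, 3)
# Sanity check against exhaustive search, feasible only for tiny examples.
brute_force_best = max(total_change(positions, c) for c in combinations(range(8), 3))
```

Running the function for several values of k produces a family of candidate selections from which the user can choose.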
A DCDV display allows the user to compare all time units in terms of the pres-
ence of movers in different places or categories of places. For selected time units
represented by neighbouring bars in the main part of the display, it is possible to
see what changes of movers’ positions took place between these time units. It is
also possible to select a group of movers, for example, by clicking on a bar seg-
ment or a connecting band, and observe how their positions changed over all time
units selected in the main view. In this way, relations between time units in terms
of the spatial positions of different movers can be revealed and examined.
The automated selection of representative time units is also helpful for under-
standing the relations between time units. In particular, depending on the selec-
tion algorithm applied, the user knows how the selected time units are related
among themselves and to the remaining time units. Thus, globally representative
time units are related, so that the amounts of change between consecutive units are
nearly equal. The temporal distances between the selected time units are indicative
of the intensity of the changes. Focally representative time units are related, so that
the amounts of change between consecutive units are the largest. It can be con-
cluded that the changes between the remaining time units are smaller.
The DCDV display technique can deal only with a rather small number of
places or place categories. Currently, we are not aware of any visualization technique
that would be effective for studying relations between multiple spatial situations
with many places of interest.

Fig. 8.16  A DCDV display presents an overview of the movements to and from the tribunes

Fig. 8.17  A DCDV display exhibits the most intensive changes

Fig. 8.18  The algorithms for time selection can suggest different selections of time units
depending on the user’s interests. (a) Globally representative time units for all place categories.
(b) Globally representative time units for the tribunes. (c) Focally representative time units for
intensive changes

8.3 Recap

Spatial distributions are aggregated and simplified representations of spatial situations,
which characterize time units in terms of the spatial positions and movement
characteristics of the existing moving objects. Spatial presence distributions
consist of counts of object presence in different places. Spatial flow distributions
consist of flow magnitudes, that is, counts of moves between places, or different
objects that moved, or aggregated movement characteristics such as average speed.
Spatial situations in different time units can be represented by map sequences
in an animated display or in small multiples; however, these techniques are ineffective
for large numbers of time units. Clustering of time units by similarity of the spatial
situations allows the analyst to deal with a smaller number of situations representing
groups of similar situations. In clustering, the Euclidean or Manhattan
distance between the values referring to the same places or links in two distribu-
tions can be taken as the measure of dissimilarity between the situations. A repre-
sentative spatial situation for a group can be constructed by averaging the values
from the situations included in the group or by selecting the situation with the
smallest average dissimilarity to all others. Representative spatial situations can
be compared by visualizing them in small multiple maps and by computing differ-
ences between them.
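
The dissimilarity measures and the two ways of constructing a representative situation can be sketched as follows. This is a minimal illustration under our own assumptions: each spatial situation is given as a list of values referring to the same places in the same order, and the function names and numbers are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Situation = Sequence[float]


def manhattan(a: Situation, b: Situation) -> float:
    """Sum of absolute differences of the values referring to the same places."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Situation, b: Situation) -> float:
    """Square root of the sum of squared differences of the place values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_situation(group: List[Situation]) -> List[float]:
    """Representative obtained by averaging the values place by place."""
    return [sum(column) / len(group) for column in zip(*group)]


def medoid(group: List[Situation], dist: Callable[[Situation, Situation], float]) -> Situation:
    """Representative with the smallest total (hence average) dissimilarity to the others."""
    return min(group, key=lambda s: sum(dist(s, other) for other in group))


# Hypothetical presence counts in three places for three time units of one cluster.
group = [[10, 0, 5], [12, 1, 5], [30, 9, 2]]
representative = medoid(group, manhattan)
average = mean_situation(group)
```

The averaged representative may not coincide with any real situation, whereas the medoid is always one of the situations of the group.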
A complementary method for analysing characteristics of spatial situations is
extraction of local features, such as local maxima or minima, where the value in
a place is significantly higher or lower than in its neighbourhood. Extracted local
features are represented by spatial events, which may be visualized and analysed
as an independent set of spatio-temporal objects.
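
As an illustration of local feature extraction, the sketch below finds local maxima in a spatial situation stored as a regular grid. The grid representation, the 8-cell neighbourhood, and the `margin` parameter that expresses "significantly higher" are our assumptions; other neighbourhood definitions and significance criteria are possible.

```python
from typing import List, Tuple


def local_maxima(grid: List[List[float]], margin: float) -> List[Tuple[int, int, float]]:
    """Return (row, column, value) of cells exceeding all 8-neighbours by at least margin."""
    rows, cols = len(grid), len(grid[0])
    events = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if neighbours and grid[r][c] - max(neighbours) >= margin:
                events.append((r, c, grid[r][c]))
    return events


# Hypothetical presence counts in a 4 x 4 grid for one time unit.
grid = [
    [1, 2, 1, 0],
    [2, 9, 2, 1],
    [1, 2, 1, 6],
    [0, 1, 2, 1],
]
peaks = local_maxima(grid, margin=3)
strong_peaks = local_maxima(grid, margin=5)
```

To obtain spatial events, each extracted feature would additionally be given the time unit of the situation in which it was found.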
Important relations between spatial situations are quantitative changes, that
is, changes in the presence counts, flow magnitudes, and values of other numeric
attributes of the places and links, and displacements (changes of the positions) of
the moving objects. Quantitative changes can be analysed with the help of change
maps. To create a change map for two spatial situations, the place- and link-related
values of the earlier situation are subtracted from the respective values of the later
situation.
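
The computation behind a change map is a place-wise subtraction. In the minimal sketch below, situations are dictionaries from place identifiers to values; treating a place that is missing in one of the situations as having the value zero is our assumption.

```python
from typing import Dict


def change_map(earlier: Dict[str, float], later: Dict[str, float]) -> Dict[str, float]:
    """Subtract the values of the earlier situation from those of the later one."""
    places = set(earlier) | set(later)
    # Assumption: a place absent from a situation has the value 0.
    return {p: later.get(p, 0) - earlier.get(p, 0) for p in places}


# Hypothetical presence counts in two time units.
earlier = {"A": 5, "B": 3}
later = {"A": 2, "B": 3, "C": 4}
changes = change_map(earlier, later)
```

Positive values denote increases and negative values decreases, which suggests a diverging colour scale for the map.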
Changes of object positions can be visualized on flow maps or by origin–des-
tination matrices. However, flow maps may be heavily cluttered. Besides, either
a flow map or a matrix represents changes for one pair of selected situations.
For several pairs, small multiple maps or matrices can be used; however, with
this representation, it may be difficult to trace changes over a sequence of more
than two time moments. A DCDV display can be used to explore changes of object
positions over multiple selected time units when the number of different places
is small or the places can be grouped into a small number of place categories.
Computationally supported selection of globally or focally representative time
moments enables either an overview of the changes over the whole time period of
the data or detection of the most intensive changes.

References

Andrienko, G., Andrienko, N., Bremm, S., Schreck, T., von Landesberger, T., Bak, P., et al.
(2010). Space-in-Time and Time-in-Space self-organizing maps for exploring spatiotemporal
patterns. Computer Graphics Forum, 29(3), 913–922.
Andrienko, G., Andrienko, N., Stange, H., Liebig, T., & Hecker, D. (2012). Visual analytics for
understanding spatial situations from episodic movement data. Künstliche Intelligenz, 26(3),
241–251.
Bremm, S., von Landesberger, T., Andrienko, G., Andrienko, N., & Schreck, T. (2011). Interactive
analysis of object group changes over time. In Proceedings of the International Workshop on
Visual Analytics EuroVA 2011, EuroGraphics (pp. 41–44).
Köthur, P., Sips, M., Unger, A., Kuhlmann, J., & Dransch, D. (2013). Interactive visual summaries
for detection and assessment of spatiotemporal patterns in geospatial time series. Information
Visualization, 2013.
Lampe, O. D., Kehrer, J., & Hauser, H. (2010). Visual analysis of multivariate movement data
using interactive difference views. In Proceedings of the International Conference on Vision,
Modelling, And Visualization VMV (pp. 315–322).
von Landesberger, T., Bremm, S., Andrienko, N., Andrienko, G., & Tekusova, M. (2012). Visual
analytics methods for categoric spatio-temporal data. In Proceedings of the IEEE Conference
on Visual Analytics Science and Technology (VAST 2012) (pp. 183–192). NY: IEEE Computer
Society Press.
Scheepens, R., Willems, N., van de Wetering, H., Andrienko, G., Andrienko, N., & van Wijk, J. J.
(2011). Composite density maps for multivariate trajectories. IEEE Transactions on
Visualization and Computer Graphics (TVCG), 17(12), 2518–2527.
Slocum, T. A., McMaster, R. B., Kessler, F. C., & Howard, H. H. (2009). Thematic cartography
and geovisualization (3rd ed.). Upper Saddle River: Pearson Prentice Hall.
Chapter 9
Discussion and Outlook

Abstract We revisit the key parts of the conceptual framework from Chap. 2 and
link them to the transformational and analytical methods from Chaps. 3–8. We put the
methods in correspondence with the types of analysis tasks. We show how the prop-
erties of available movement data can be investigated and explain their implications
for the analysis. We suggest general analytical procedures composed of different types
of tasks for gaining comprehensive knowledge from movement data. We discuss the
methods and procedures allowing detection and analysis of various kinds of rela-
tions between movement and its spatio-temporal context. We reason about specific
and general movement behaviours of individuals and collectives and argue that only
visual analytics approaches can currently support reconstruction of general movement
behaviours from movement data. Regarding the necessity to protect personal privacy
of people whose positions are contained in movement data, we outline the approaches
to privacy protection depending on the types of analysis tasks. We conclude the chapter
with a discussion of future perspectives and suggest several exercises for the readers.

9.1 Multi-Perspective View of Movement and Task Typology

In Chap. 2, we have introduced a typology of movement analysis tasks, which is
based on a multi-perspective view of movement. The analytical methods and procedures
presented in Chaps. 5–8 correspond to the four possible analysis foci: movers,
spatial events, space, and time. According to the foci, these four groups of analytical
methods deal with trajectories of movers, spatio-temporal positions of spatial events,
dynamics of objects presence in spatial locations, and spatial situations in time units,
respectively; see Fig. 2.4. These characteristics can be represented by different types
of spatio-temporal data (Sect. 2.4 and Table 2.1): trajectory data, spatial event data,
local time series, and spatial distributions. Hence, each analytical method requires a
certain type of data representation. Chapter 3 describes transformational methods for
converting movement data from one representation to another. In this way, different
perspectives of movement and different task foci are supported and the application
of suitable methods is enabled. There are also analytical methods in which spatial
events are extracted from different types of spatio-temporal data (trajectories,
spatio-temporal positions of spatial events, local time series, and spatial distributions)
as a part of the analysis. In fact, spatial events are a universal form of representing
features of interest in different spatio-temporal data for convenient investigation.

G. Andrienko et al., Visual Analytics of Movement,
DOI: 10.1007/978-3-642-37583-5_9, © Springer-Verlag Berlin Heidelberg 2013

Table 9.1  Correspondence between task foci and methods for data transformation, analysis and visualization

Task focus: Movers
  Transformation methods:
    Construction of tracks from position records or spatial events, Sect. 2.9.1
    Interpolation and re-sampling, Sect. 3.1
    Division of tracks into trajectories, Sect. 3.2
    Space and time transformations, Sect. 3.3
    Derivation of thematic attributes, Sect. 3.4
    Generalization, Sect. 3.6
    Simplification, Sect. 3.7
  Analytical methods and procedures (a), characteristics:
    Spatial summarization, Sect. 5.1.1
    Clustering of trajectories, Sect. 5.1.2
    Clustering of trajectory segments, Sect. 5.1.4
    Query-based extraction of movement events, Sect. 3.5.1
  Analytical methods and procedures (a), relations:
    Extraction of relation occurrences, Sect. 5.2.1, Sect. 5.2.3
    Group movement analysis, Sect. 5.2.2
    Analysis of event impact, Sect. 5.2.3
  Visualization methods:
    Individual trajectories: map, perspective view for 3D trajectories, Sect. 4.1, space–time cube, temporal bar chart, Sect. 4.1, trajectory wall, Sect. 5.1.3
    Summarized trajectories: flow map, Sect. 5.1.1

Task focus: Spatial events
  Transformation methods:
    Extraction of spatial events from trajectories, Sect. 3.5.1, local time series, Sect. 7.2.5, spatial situations, Sect. 8.1.2
    Extraction of relation occurrences, Sects. 5.2.1, 5.2.3
    Extraction of composite events (event clusters), Sect. 6.1
    Query-based extraction of relation occurrences, Sect. 6.3.1
  Analytical methods and procedures (a), characteristics:
    Clustering of spatial events, Sect. 6.1
    Summarization of event groups, Sect. 6.2.3
  Analytical methods and procedures (a), relations:
    Query-based extraction of relation occurrences, Sect. 6.3.1
    Analysis of event impact, Sects. 5.2.3, 6.3.2
    Discovery of related events by analytical reasoning, Sect. 6.3.2
  Visualization methods:
    Individual events: map, space–time cube
    Groups of co-located events: growth ring map, Sect. 6.2.1, flower diagrams, Sect. 6.2.2
    Event clusters: spatio-temporal convex hulls or buffers, Sects. 4.2.4, 6.14
    Aggregate characteristics of composite events: diagram map, Sect. 5.2.3, text cloud, Sect. 6.2.3

Task focus: Space
  Transformation methods:
    Space transformations, Sect. 3.3
    Spatial generalization, Sect. 3.6
    Spatio-temporal aggregation, Sect. 3.8
    Extraction of places of interest from movement data, Sect. 7.1
  Analytical methods and procedures (a), characteristics:
    Time series transformations, Sect. 7.2.2
    Time series clustering, Sect. 7.2.3
    Time series modelling, Sect. 7.2.4
    Event extraction from time series, Sect. 7.2.5
  Analytical methods and procedures (a), relations:
    Analysis of binary links and flows, Sect. 7.3.1
    Analysis of dependencies between link characteristics, Sect. 7.3.2
    Visual analysis of sequences of place visits, Sect. 7.3.3
    Discovery of frequent sequences, Sect. 7.3.4
  Visualization methods:
    Places with attributes: diagram map
    Time series: time graph, Sect. 7.2.1
    Summarized time series: quantile graph and temporal histogram, Sect. 7.2.1
    Flows between places: flow map; droplet map and DCDV, Sect. 7.3.3
    Place sequences: trajectory map, space–time cube, text cloud, Sect. 7.3.4

Task focus: Time
  Transformation methods:
    Time transformations, Sect. 3.3
    Spatio-temporal aggregation, Sect. 3.8
  Analytical methods and procedures (a), characteristics:
    Clustering of spatial situations, Sect. 8.1.1
    Event extraction from spatial situations, Sect. 8.1.2
  Analytical methods and procedures (a), relations:
    Numeric change analysis, Sect. 8.2
    Position change analysis, Sect. 8.2
  Visualization methods:
    Situation sequence: animated map, small multiple maps
    Representative situations for clusters: small multiple maps, Sect. 8.1.1
    Numeric changes: change maps, Sect. 8.2
    Position changes: DCDV, Sect. 8.2

(a) The analytical methods and procedures are subdivided by task targets: characteristics or relations

As stated in Sect. 2.12, visual analytics is mainly concerned with synoptic anal-
ysis tasks, that is, tasks that deal with sets of entities (movers, trajectories, events,
locations, or times) rather than individual entities and require generalization and
abstraction over these sets. All methods in Chaps. 5–8 are intended for synop-
tic tasks. The use of these methods relies on the basic infrastructure presented in
Chap. 4, which includes techniques for visual representation of different data types
(trajectories, events, local time series, and presence situations) and techniques for
user interaction with displays and represented data and for data search and selec-
tion. These basic techniques enable both general views of sets and access to the
elements of these sets thereby supporting also elementary tasks.
Table  9.1 connects the types of synoptic tasks to the supporting methods,
including transformational methods, analytical methods and procedures relying on
algorithms or computations, and visualization methods. All methods are grouped
by the task foci; the analytical methods and procedures are further subdivided by
task targets, that is, characteristics or relations.
It can be noticed that methods extracting spatial events from various data are
included both in the column with the transformational methods and in the column
with the analytical methods. This is because these methods, on the one hand, trans-
form data from one form to another and, on the other hand, serve as means for ana-
lysing the data from which the events are extracted, as has been explained earlier.
Table 9.1 allows finding suitable methods for particular synoptic tasks after
determining their foci and targets. However, movement analysis may not be limited
to one or a few particular tasks. Analysts may need to investigate movement
data in a comprehensive way, from different perspectives, to gain as full an
understanding of the phenomenon as possible. In such cases, different types of analysis tasks
need to be performed. In one of the following sections, we suggest possible work-
flows composed of tasks of different types.
Irrespective of whether the intended analysis is comprehensive or limited to one
or a few particular tasks, it is necessary, before starting the analysis, to examine the
properties of the available data and assess their fitness for the purpose.

9.2 Properties of Movement Data

In Sect. 2.9.2, we have listed a number of properties of movement data that need
to be taken into account in analysing movement. Real data do not always come
together with an exhaustive description of their properties. Therefore, it is usu-
ally necessary to examine data properties before beginning any analysis. Here,
we recommend some methods that can be used for this purpose and indicate the
implications of particular properties for the analysis.

9.2.1 Temporal Properties

Temporal resolution and regularity of trajectory data can be learned from the statis-
tical distribution of the lengths of the time intervals between the position records in
trajectories. The distribution can be visualized in a frequency histogram, which can
be built using standard database operations. For temporally regular data, the histo-
gram will have one very high bar corresponding to the length of the constant time
step between the records. The histogram will also contain several small bars, since
even temporally regular data may have between-record time intervals of non-stand-
ard lengths, which may correspond to missed positions or to absence of changes.
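
The frequency histogram of the between-record time intervals can be sketched as follows. This is only an illustration: we assume that each trajectory is given as a chronologically sorted list of timestamps in seconds, and the function names, the bin width, and the toy data are hypothetical. In practice, the same counts can be obtained by standard database operations.

```python
from collections import Counter
from typing import Dict, List, Tuple


def interval_histogram(trajectories: Dict[str, List[float]], bin_width: float) -> Counter:
    """Count between-record time intervals per bin; keys are the lower bin bounds."""
    hist: Counter = Counter()
    for times in trajectories.values():
        for t0, t1 in zip(times, times[1:]):
            hist[int((t1 - t0) // bin_width) * bin_width] += 1
    return hist


def dominant_step(hist: Counter) -> Tuple[float, float]:
    """Return the most frequent bin and the share of intervals falling into it."""
    bin_start, count = hist.most_common(1)[0]
    return bin_start, count / sum(hist.values())


# Hypothetical timestamps (seconds) of two cars; car1 has one long gap.
trajectories = {
    "car1": [0, 30, 60, 90, 300, 330],
    "car2": [10, 40, 70, 100],
}
hist = interval_histogram(trajectories, bin_width=10)
step, share = dominant_step(hist)
```

A share close to 1 for a single bin suggests temporally regular data; the remaining bins show the non-standard interval lengths.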
Statistical distributions can also be explored by means of cumulative frequency
curves; the use of this kind of graph is described in detail by Andrienko and
Andrienko (2006). A vertical segment of a frequency curve will correspond to the
standard value of the time step between the records. Thus, the curve on the top
of Fig. 9.1 shows the distribution of the lengths of the time intervals between the
records in the Milan cars data (Sect. 2.10.2). The nearly vertical segment corresponds
to the values from 30 to 35 s. The statistics below the curve indicate that these
values occur in 67.3 % of the records of the dataset. Hence, the data can be considered
as mostly temporally regular; however, longer time gaps of various lengths
occur in 28.4 % of the records. This needs to be taken into account in computing
derived attributes, such as speeds. A possible approach is to ignore the records
where the time interval to the next record is much longer than 35 s.

Fig. 9.1  Cumulative frequency curves show the statistical distributions of the lengths of the
between-record time intervals in trajectory data. Top: Milan cars data; bottom: North Sea vessels data

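
The approach of ignoring records with long gaps when deriving speeds can be sketched as follows. The sketch assumes planar coordinates in metres and timestamps in seconds; the function name, the `max_gap` parameter, and the data are hypothetical, and the exact threshold has to be chosen by the analyst.

```python
import math
from typing import List, Optional, Tuple

Record = Tuple[float, float, float]  # (time in s, x in m, y in m)


def speeds(track: List[Record], max_gap: float) -> List[Optional[float]]:
    """Speed (m/s) from each record to the next; None where the time gap is too long."""
    result: List[Optional[float]] = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0 or dt > max_gap:
            result.append(None)  # unreliable: long gap or non-increasing time
        else:
            result.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return result


# Hypothetical track with a 540 s gap between the third and the fourth record.
track = [(0, 0, 0), (30, 300, 400), (60, 600, 800), (600, 600, 800), (630, 900, 1200)]
track_speeds = speeds(track, max_gap=35)
```

Without the threshold, the long gap would produce a speed of zero, which could be misinterpreted as a stop.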
The cumulative curve at the bottom of Fig. 9.1 has three near-vertical
segments, that is, there are three most frequent lengths of time steps between
records. This curve corresponds to the North Sea vessels data (Sect. 2.10.3). The
most frequent time steps are around 9 min (549–544 s), around 10 min (592–
608 s), and around 12 min (718–723 s). These values occur in 22.2, 47.0, and
10.6 % of the records, respectively, which makes 79.8 % in total. The presence of
three frequent time steps may be explained by the use of positioning devices with
different settings.
The typical lengths of time steps between position records need to be taken
into account in computing and using derived movement attributes, in setting tem-
poral distance thresholds (e.g. in spatio-temporal clustering of movement events
extracted from trajectories; see Sect. 6.1), in data aggregation by time intervals, in
detecting stop events, and in other operations dealing with the temporal component
of the data in explicit or implicit ways.
For data with irregular time intervals between records, the statistical distribution
of the interval lengths may be, in principle, arbitrary. As a rule, such distributions
have “long tails”, that is, most of the values are relatively low, but there are
still many occurrences of very high values.
The range of most frequent time step lengths for temporally regular data or the
range of lengths including the bulk of the data for temporally irregular data can
be treated as the temporal resolution of the data. Depending on the upper bound
of the length range and the characteristic velocities of the movers, the data can be
considered as either quasi-continuous or episodic. Data should be treated as epi-
sodic if the movers can potentially travel long distances in time intervals of the
lengths that frequently occur in the data. Which distances should be considered
long depends on the properties of the application domain and the intended spatial scale of
the analysis. Thus, the distances that can be travelled by vessels in time intervals
of 9–12 min can be viewed as relatively short for analysing the maritime traffic at
a large spatial scale but they may be too long for analysing manoeuvres of ships
inside a harbour.
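
The reasoning above can be turned into a rough rule of thumb, sketched below. The function name and all numeric values are our illustrative assumptions: a typical vessel speed of 7 m/s, and distance thresholds of 20 km for a large-scale analysis and 200 m for an analysis of manoeuvres inside a harbour.

```python
def classify_data(interval_upper_bound_s: float,
                  typical_speed_mps: float,
                  long_distance_m: float) -> str:
    """Treat data as episodic if a mover can travel a 'long' distance between two records."""
    reachable_m = interval_upper_bound_s * typical_speed_mps
    return "episodic" if reachable_m > long_distance_m else "quasi-continuous"


# Upper bound of the frequent time steps in the vessel example: 12 min = 720 s.
# With the assumed speed of 7 m/s, a vessel can travel about 5 km in this time.
large_scale = classify_data(720, 7.0, long_distance_m=20_000)
harbour_scale = classify_data(720, 7.0, long_distance_m=200)
```

The same dataset is thus treated differently depending on the intended spatial scale of the analysis.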
As stated in Sect. 2.9.2, quasi-continuous trajectory data allow interpolation
between known positions and episodic movement data do not allow interpolation.
Episodic movement data are usually unsuitable for computation of derived movement
attributes such as speed, direction, acceleration, and path length. Section 3.8
discusses the aggregation of episodic movement data, which may result in flows
connecting distant places. Episodic movement data do not adequately represent the
paths of the movers; therefore, clustering of trajectories by route similarity (Sect.
5.1.2.2) usually does not work for such data. For detecting encounters between
movers, the method involving interpolation (Sect. 5.2.1) cannot be used. The
method suggested in Sect. 5.2.3 can be used, but the analyst should understand
that many real encounters may remain undetected due to the temporal sparseness
of the data. Despite all issues with episodic movement data, there are still many possibilities
for analysis, as demonstrated in this book by examples of the Flickr data,
animal data, and car races attendance data. The specifics of episodic movement
data and approaches to their analysis are discussed in a paper by Andrienko et al.
(2012b).
The overall temporal coverage of a dataset can be explored by means of a his-
togram or cumulative frequency curve representing the distribution of the time
references in the data. This will show the time period of the data and how the
measurements are distributed within it. The coverage of time cycles can be exam-
ined using histograms or cumulative curves of the time cycle positions of the
records, that is, months or days of a year, days or hours of a week, or times of
a day. Temporal gaps in the measurements, both for the linear time and for time
cycles, will be manifested by absence of bars or very low bars in a histogram and
by horizontal segments in a cumulative frequency curve. Temporal gaps should be
distinguished from natural decreases in measurement frequencies reflecting the
variation of movement depending on time cycles.
Besides global time gaps, data may contain local time gaps occurring in some
places or parts of the territory. Such time gaps may be undetectable from the over-
all statistical distribution. Therefore, it is reasonable to divide the territory into
parts (e.g. by means of space tessellation, as described in Sect. 7.1.1) and visual-
ize the temporal distributions of the records in the parts. In the case of positions
recorded in a finite number of locations (e.g. locations of Bluetooth or RFID sen-
sors or antennas of a mobile phone network), the temporal distributions of the
records in these locations can be visualized and explored. An example of data
with local gaps in temporal coverage is demonstrated in Fig. 9.2. These are data
about mobile phone use. The mosaic diagrams represent the counts of recorded
mobile phone activations at different antennas by days. Each day is represented
by a coloured pixel. The pixels within a diagram are arranged in rows so that the
columns correspond to days of the week, from Monday to Sunday. The time span
of the data is 20 weeks; accordingly, there are 20 rows in each diagram. The dark
blue pixel colour corresponds to zero counts, that is, absence of phone activation
records. It can be seen that in the south the data are available only for the last
two weeks. In the west of the shown territory, there are several consecutive weeks
with missing data. Time gaps are also noticeable for some antennas in the north-east.
In the centre of the territory, there is an antenna (highlighted) with no records
on the weekend.
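
Local gaps of this kind can also be found computationally. The following sketch lists, for each antenna, the days of the data time span for which no records exist. The record structure (antenna identifier and date), the function name, and the data are hypothetical; antennas that never appear in the records are not reported by this sketch.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def missing_days(records: Iterable[Tuple[str, date]],
                 first_day: date, last_day: date) -> Dict[str, List[date]]:
    """For each antenna, list the days in [first_day, last_day] without any record."""
    seen: Dict[str, Set[date]] = {}
    for antenna, day in records:
        seen.setdefault(antenna, set()).add(day)
    n_days = (last_day - first_day).days + 1
    all_days = [first_day + timedelta(days=i) for i in range(n_days)]
    return {antenna: [d for d in all_days if d not in days] for antenna, days in seen.items()}


# Hypothetical activation records of two antennas over three days.
records = [
    ("a1", date(2012, 5, 1)), ("a1", date(2012, 5, 2)), ("a1", date(2012, 5, 3)),
    ("a2", date(2012, 5, 1)), ("a2", date(2012, 5, 3)),
]
gaps = missing_days(records, date(2012, 5, 1), date(2012, 5, 3))
```

Whether a day without records is a true gap or a natural absence of activity still has to be judged by the analyst, for example with the help of mosaic diagrams.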
The presence of global or local gaps in temporal coverage can greatly affect
the suitability of movement data for analysis. In the case of time gaps, reconstruc-
tion of movers’ trajectories and computation of local presence dynamics, flows
between places, and spatial distributions do not produce valid results. In fact, it is
reasonable to analyse only the data subsets for the parts of the territory and time
intervals that do not contain time gaps.

Fig. 9.2  In mosaic diagrams, dark blue pixels represent days with no measurements

An important temporal property of movement data is how the time references
are specified: whether the times in the data records are local times or universal
(global) times and, in the case of local times, whether the time references include
seasonal adjustments to the daylight saving time. A complicated case is when
data have been collected during a time period that contains one or more moments
of seasonal change of clock times. To recognize whether time references in data
reflect seasonal adjustments, it can be recommended to build histograms of the
number of records by hourly intervals around the dates of time change. Ideally,
advancing the clock by one hour in spring will be manifested by the absence
of a bar for the conventional hour of the clock shift (e.g. at 3 AM) and the backward
adjustment in autumn by an unusually high bar for this hour. However, in
many cases, the frequencies of data records in the night times significantly
decrease, and the variations caused by time shifts may be unnoticeable. When data
reflect variations of movement according to daily cycles, time adjustments will be
manifested by shifts of the usual times of daily peaks and troughs of presence and flow
magnitudes. If time adjustments are detected, the time references in the
data need to be standardized before starting the analysis.
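
The check by hourly histograms can be sketched as a comparison of the day of the clock change with an ordinary reference day. All names, dates, and the doubling criterion are our assumptions; the timestamps are taken as they are recorded in the data, without any time zone information.

```python
from collections import Counter
from datetime import date, datetime
from typing import List, Tuple


def hourly_counts(timestamps: List[datetime], day: date) -> List[int]:
    """Number of records in each of the 24 hourly intervals of the given day."""
    counts = Counter(ts.hour for ts in timestamps if ts.date() == day)
    return [counts.get(h, 0) for h in range(24)]


def suspicious_hours(timestamps: List[datetime], change_day: date,
                     reference_day: date) -> Tuple[List[int], List[int]]:
    """Hours that are empty or at least doubled on the change day relative to the reference day."""
    change = hourly_counts(timestamps, change_day)
    reference = hourly_counts(timestamps, reference_day)
    missing = [h for h in range(24) if change[h] == 0 and reference[h] > 0]
    doubled = [h for h in range(24) if reference[h] > 0 and change[h] >= 2 * reference[h]]
    return missing, doubled


# Hypothetical records: on the change day, nothing is recorded between 2:00 and 3:00.
reference_day, change_day = date(2012, 3, 18), date(2012, 3, 25)
timestamps = (
    [datetime(2012, 3, 18, h, m) for h in (1, 2, 3) for m in (10, 40)]
    + [datetime(2012, 3, 25, h, m) for h in (1, 3) for m in (10, 40)]
)
missing, doubled = suspicious_hours(timestamps, change_day, reference_day)
```

As noted above, such a check is conclusive only when enough records exist in the night hours.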
In Flickr data, there is a specific problem that the time references of the data,
that is, the times at which the photos were taken, are based on the time settings of
the photo cameras. Apart from the cases when the times in cameras are not set at
all (these cases are easy to recognize since the times of the photos lie far beyond
the expected time range), people may set their cameras to local times of their
home places and not always shift the settings when they travel to other time zones.
Therefore, photos taken in places distant from home may have wrong time ref-
erences. Unfortunately, there is no reliable way to recognize such cases in data.
Therefore, it may be reasonable to avoid analysing Flickr data at fine temporal scales.

9.2.2 Spatial Properties

The spatial resolution of trajectory data can be examined analogously to the tem-
poral resolution, that is, by looking at the statistical distribution of the spatial dis-
tances between the position records. The spatial resolution needs to be taken into
account in setting spatial thresholds for event detection and clustering and choos-
ing a suitable cell size for territory tessellation. In the case of low spatial resolution
(i.e. high frequency of relatively long distances between position records), values
of derived movement attributes such as speed and direction may be unreliable.
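As an illustration, the distribution of distances between consecutive position records could be summarized as in the sketch below. It assumes that positions are given as (latitude, longitude) pairs in degrees and that each track is already ordered by time; the haversine great-circle formula, the function names, and the choice of summary statistics are assumptions made for this example.

```python
import math
from statistics import median, quantiles

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def step_distances(track):
    """Distances between consecutive (lat, lon) positions of one track."""
    return [haversine_m(*p, *q) for p, q in zip(track, track[1:])]


def distance_summary(tracks):
    """Median, 90th percentile and maximum of all step distances (needs >= 2 steps)."""
    steps = [d for track in tracks for d in step_distances(track)]
    deciles = quantiles(steps, n=10, method="inclusive")
    return {"median": median(steps), "p90": deciles[-1], "max": max(steps)}
```

A 90th percentile that is much larger than the median points to a high frequency of relatively long steps, in which case derived speeds and directions should be treated with caution.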
The spatial precision of data is sometimes clear from the data themselves. Thus,
positions of mobile phone activations may be specified in an implicit way by the
identifiers of the antennas that registered the activations while the coordinates of
the antennas may be provided in a separate dataset. Data collected by static sen-
sors, such as Bluetooth or RFID, may also have this structure. The analyst should
be aware that the spatial precision of mobile phone data and data collected by
static sensors is determined by the ranges of the antennas and sensors.
In a case when positions are specified by coordinates, it is advisable to repre-
sent the recorded positions on a map by point symbols. If the data volume is very
large, a manageable random sample can be taken. If all points on the map fall in
a relatively small number of distinct locations, the recorded positions most
probably stand for areas around these locations rather than exact points. The
sizes of the areas can be estimated by finding out the typical ranges of the
devices used for the measurements.
For mobile phone data, the sizes of the areas can be judged from the distances
between the antennas. The sizes of Voronoi cells built around the antenna positions
give an approximation of the spatial precision in different parts of a territory.
Data where the positions are areas rather than points are not suitable for com-
puting derived movement attributes. The maximal possible size of an area needs to
be taken into account in choosing spatial thresholds and cell sizes for spatial
tessellations. If (some of) the areas are large, encounter detection will not give
trustworthy results. The spatial precision needs to be taken into account in analysing movers’
routes and in extracting movement events, such as stops, and personal places.
Spatial coverage of movement data is usually clear from a map of recorded
positions or trajectories. However, as demonstrated in Fig. 9.2, in some parts of
the underlying territory, there may be time intervals for which there are no meas-
urements. The way to detect such cases and the implications for the analysis are
described in the previous section.

9.2.3 Mover Set and Mover Identity Properties

The number of movers that are represented in a dataset can be learned from the
count of distinct mover identifiers. However, the identifiers given to the movers are
not necessarily preserved throughout the whole time of data collection. For pre-
serving personal privacy or other reasons (e.g. hiding sensitive repeated patterns in
business-related movement data), movers may be assigned new identifiers at cer-
tain time intervals, for example, every day or every two weeks. The previous iden-
tifiers may be reused for different movers. Moreover, in different time intervals,
the data may correspond to different samples of the population.
The policy regarding the mover identifiers is typically known from data
providers. If it happens to be unknown, it can be discovered by exploring the
trajectories constructed from the records with coinciding mover identifiers. Assignments
of previously used identifiers to new movers will be manifested by unusually long
jumps in space occurring in many trajectories at regular time intervals. An exam-
ple is shown in a space–time cube in Fig. 9.3, where a sample of trajectories is
drawn with 1 % opacity. Bunches of long near-horizontal lines occur every two
weeks. It is very unlikely that they reflect real movements; most probably, they
reflect reassignments of mover identifiers. A case when old identifiers are not
reused but new unique identifiers are assigned to movers can be recognized from
the statistics of trajectory durations. The maximal duration will not exceed the
length of the time interval between the changes of the identifiers.
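The duration statistics mentioned above can be computed with a few lines of code. The sketch below assumes records given as (identifier, timestamp) pairs; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def id_lifetimes(records):
    """Time span between the first and last record of each identifier."""
    first, last = {}, {}
    for mover_id, t in records:
        if mover_id not in first or t < first[mover_id]:
            first[mover_id] = t
        if mover_id not in last or t > last[mover_id]:
            last[mover_id] = t
    return {mover_id: last[mover_id] - first[mover_id] for mover_id in first}


def longest_lifetime(records):
    """Maximal lifetime over all identifiers (zero for an empty dataset)."""
    return max(id_lifetimes(records).values(), default=timedelta(0))
```

If the longest lifetime is clearly shorter than the whole period of data collection, for example just under two weeks in a dataset spanning several months, this suggests that new identifiers are assigned at that interval.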

Fig. 9.3  Long “jumps” in multiple trajectories indicate reassignments of mover identifiers



When identifiers of the movers are not constant throughout the whole time, the
number of different movers can be estimated from the maximal count of distinct
identifiers among all time intervals of identifier constancy; however, this may not
be a good estimate if a different population sample was used in each interval.
The limited times of preserving the associations between movers and identifiers
have important implications for the analysis. First of all, trajectories of the movers
need to be constructed only within the time intervals of identity preservation. If
old identifiers are reused, it is reasonable to transform them to unique identifiers,
to avoid misleading results that can be caused by accidental grouping of positions
and movements of several movers. Individual movement behaviours can be
analysed only within the time intervals of identity preservation. In the case of
different samples of movers represented in different time intervals, differences between
the time intervals can be expected in aggregated data: local presence dynamics,
flow dynamics, and spatial distributions of the presence and flows. If the differ-
ences are small and the major patterns preserved, the sampling can be judged as
very good.
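One possible way to transform reused identifiers into unique ones is to combine each identifier with the index of the interval in which the record falls. The sketch below assumes that identifiers are reassigned at regular intervals of known length, starting at a known moment; the record layout (identifier, timestamp, x, y) and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def interval_index(t, start, period):
    """Index of the identity-preservation interval that contains time t."""
    return (t - start) // period


def split_by_identity_interval(records, start, period):
    """Group records into trajectories keyed by (identifier, interval index).

    Each trajectory is a time-ordered list of (timestamp, x, y) and never
    crosses a moment at which identifiers are reassigned.
    """
    tracks = {}
    for mover_id, t, x, y in records:
        key = (mover_id, interval_index(t, start, period))
        tracks.setdefault(key, []).append((t, x, y))
    for points in tracks.values():
        points.sort()
    return tracks
```

The composite key serves as the new unique identifier, so that positions recorded under the same original identifier in different intervals are never joined into one trajectory.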
It is a very common case that available movement data do not cover all mov-
ers of interest from a given territory. Thus, the Milan cars dataset does not include
data about all cars that moved in Milan during the week for which we have the
data, and the dataset about the wild animals does not cover all roe deer and
lynxes in the forest. The population coverage of a dataset needs to be assessed
with regard to the goals and scope of the intended analysis. For example, the cars
dataset may contain data about all customers of a car insurance company. If the
company wishes to analyse the individual driving behaviours of its customers, the
mover population of interest consists only of the customers and is fully covered by
the data. If the data are to be used for analysing the overall traffic in the city, the
mover population of interest includes all vehicles moving in the city; hence, the
dataset covers only a part of the population. The same refers to mobile phone use
data collected by telecommunication companies. Even when a dataset covers all
customers of a company rather than a sample, it does not cover the whole
population of mobile phone users, let alone all people who appear and move on
a given territory.
When available data cover a subset of a mover population of interest, the ques-
tion arises about the representativeness of this subset for the whole population,
that is, whether the subset has the same distribution of movement-relevant proper-
ties as in the entire population. This is a difficult question that cannot be answered
by data exploration but only by reasoning based on knowing the data collection
method and the general properties of the movement phenomenon that needs to be
analysed. Thus, let us suppose that we need to analyse car traffic in Milan and
have only trajectories of customers of a particular car insurance company. We
know that the movement of cars is constrained by the street network and traffic
regulations; therefore, any particular car moves similarly to other cars around it.
Therefore, if the available dataset is big enough to represent movements in differ-
ent parts of the city and on all relevant streets, we can conclude that the sample of
movers in this dataset is representative for our purposes.
Let us now take the examples of the Flickr and Twitter datasets. The Flickr
dataset represents movements of the Flickr users. These are, evidently, people
who have photo cameras, like to make photos, and like to share their photos with
others. These properties can have an impact on the movement behaviours of the
Flickr users, which may therefore differ from the movement behaviours of people
in general. In particular, Flickr users probably like to go to places where they
can take photos of interesting objects or events or beautiful scenery. Therefore,
the set of Flickr users cannot be considered as representative of all people that
are present on a given territory. However, we know that tourists that come some-
where for sightseeing often have photo cameras and are also interested in going
to places with interesting objects, events, or beautiful scenery. Therefore, if we
manage to separate travelling Flickr users from those who only take photos
around one place, we can consider this subset as representative for the population
of tourists.
For the Twitter dataset, we know that young people tend to use Twitter more
actively than older people. The subset of movers in this case is not representative
of the whole population of people. Perhaps, it can be representative for the young
population; however, active use of Twitter may be related to certain psychologi-
cal traits, which would make the dataset biased towards individuals with particular
properties.
In analysing a dataset where the set of movers is not representative of the popu-
lation of interest, it is important to avoid over-generalization in interpreting dis-
covered patterns. It would be good, if possible, to do comparative analysis of two
or more datasets covering different subsets of the population. For example, move-
ments of Twitter users could be compared with movements of mobile phone users
on the same territory.
When the set of movers in a dataset is judged as representative of the popula-
tion of interest but not covering the whole population, it is important to know what
proportion of the population is covered. If this is not stated by the data provider
and is not clear from the data collection method, it is necessary to find information
about the size of the entire population or a method to estimate the population size.
For example, we wanted to estimate the relative size of the set of cars covered by
the Milan data with respect to the whole set of cars in the city. We got an idea that
this could be done if we knew the capacities of the most heavily used streets in
Milan. By dividing the counts of cars from our sample that moved on these streets
by the street capacities, we would estimate the proportion of the sample in the
population. We found relevant information in traffic management textbooks and
even managed to find two approaches to the estimation.
One approach uses the standard capacity of a city motorway for the average
speed of 90 km/h: it is 4,000 cars per hour in two directions and, hence, 2,000 cars
per hour in each direction. The maximal flow on a motorway segment at the aver-
age speed of about 90 km/h derived from our dataset is 61 cars per hour. However,
this may be an outlier, and we take therefore the 90th percentile, which is 40, that
is, 2 % of the 2,000 cars. Hence, the estimated relative size of our sample of cars is
2 % of the total population of cars in Milan.
The other approach is based on the formula describing the relationship between
the traffic intensity (q), density (k), and mean speed (u) when the traffic flow is in a
stationary and homogeneous state: q = k · u. The traffic intensity q is the number of
vehicles passing a cross section of a road in a unit of time. In our aggregated Milan
data, it is represented by the hourly count of moves per link. The traffic density k is
the number of vehicles present on a unit of road length at a given moment. The max-
imal possible traffic density depends on the distances between the vehicles, which,
in turn, depend on the current speed. The minimal safety distance between vehicles
following one another is specified in time units: it is the time in which the following
vehicle with its current speed can travel the distance separating it from the preceding
vehicle. Different sources recommend the minimal safety distance from 2 to 3 s. For
the speed 90 km/h (i.e. 25 m/s), the minimal safety distance in metres is from 50 to
75. The average length of a car, which could be found on the web, is 4.12 m. For the
average speed of 90 km/h, one car occupies at least 54.12 or 79.12 m of a road lane.
Hence, the maximal traffic density can be about 18.5 cars per one km (1,000 m) with
2-s safety distance or about 12.6 cars per one km with 3-s safety distance. According
to the formula q = k · u, the maximal traffic intensity for the average speed 90 km/h
is from 1,137.5 to 1,665 cars per hour in one lane, that is, from 2,275 to 3,330 cars
per hour in two lanes of a motorway. However, these figures need to be treated as
the upper bounds of the traffic intensity, since they involve the assumption that the
traffic flow is stationary and homogeneous, that is, all vehicles are moving with the
same steady speed, which rarely occurs in real traffic. Another assumption involved
is that two lanes have equal traffic intensities, which is also rarely observed in real-
ity. If we take the smaller value 2,275 cars per hour as the maximal traffic intensity
in one direction that can be reached on two lanes of a motorway for the average
speed 90 km/h, our 40 cars per hour (90th percentile) would be about 1.76 % of it,
which is close to 2 % obtained with the first approach.
Hence, to get estimates of the real traffic in Milan, we need to multiply the
hourly presence counts and flow magnitudes obtained from our data by the scale
factor 56.9 (according to the second approach). It would be appropriate to use
the scaled values in the reconstruction of the dependencies between the average
speeds and flow magnitudes in Sect. 7.3.2.
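The arithmetic of the second approach can be retraced with a short script. It uses only the figures quoted above (speed 90 km/h, safety distance of 2 or 3 s, car length 4.12 m, observed flow of 40 cars per hour); the function names are hypothetical.

```python
def max_density_per_km(speed_kmh, headway_s, car_length_m=4.12):
    """Maximal number of cars per km of one lane at the given speed."""
    speed_ms = speed_kmh / 3.6                      # 90 km/h -> 25 m/s
    space_per_car_m = speed_ms * headway_s + car_length_m
    return 1000.0 / space_per_car_m


def max_intensity_per_hour(speed_kmh, headway_s, lanes=1):
    """Upper bound of the traffic intensity from q = k * u, in cars per hour."""
    return max_density_per_km(speed_kmh, headway_s) * speed_kmh * lanes


def scale_factor(sample_flow, speed_kmh, headway_s, lanes=1):
    """Factor by which sample counts are multiplied to estimate real traffic."""
    return max_intensity_per_hour(speed_kmh, headway_s, lanes) / sample_flow


capacity = max_intensity_per_hour(90, 3, lanes=2)   # about 2,275 cars per hour
share = 40 / capacity                               # about 0.0176, i.e. 1.76 %
factor = scale_factor(40, 90, 3, lanes=2)           # about 56.9
```

With the 2-s safety distance, the unrounded density of about 18.48 cars per km gives roughly 1,663 cars per hour in one lane; the figure of 1,665 in the text results from rounding the density to 18.5 before multiplying by the speed.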

9.2.4 Data Collection Properties

In Sect. 2.9.2, the following data collection properties have been listed: position
exactness, positioning accuracy, missing positions, and the meaning of the position
absence. Position exactness is strongly related to spatial precision that has been
discussed in Sect. 9.2.2. Indeed, when object positions are measured by movement
sensors that can only detect objects in their ranges but not determine the exact
coordinates of the objects, the positions of the objects should be treated as areas
around the sensors rather than as points in space. Hence, the spatial precision of
the data is low; the implications for the analysis are listed in Sect. 9.2.2.
It is commonly known that any real dataset contains errors. In trajectory data,
positioning errors can be expected. In some cases, positioning errors are easy to
detect visually when trajectories are drawn on a map. This particularly refers to
trajectories constrained by roads: the positions that are not on roads are
immediately noticeable. Positions lying far apart from other positions can also be noticed.
However, it is not enough to detect the presence of errors; it is necessary to clean
the data, that is, remove the errors. The paper by Parent et al. (2013) discusses the
types of positioning errors that can occur, in particular, in GPS tracks, and gives an
overview of the methods to remove or reduce the errors. For network-constrained
movement, an additional possibility for error reduction is provided by map-match-
ing methods (e.g. Quddus et al. 2007), which replace positions falling out of the
network by suitable positions in the network. The current map-matching methods
are briefly discussed by Parent et al. (2013).
Due to the inevitable presence of position errors in movement data, it may not
be reasonable to rely in the analysis on values of instant movement attributes that are
derived from two consecutive trajectory points, such as instant speed and direction. It
may be more appropriate to use average values derived from longer point sequences.
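A minimal sketch of such averaging is given below. It assumes points given as (time in seconds, x, y) with planar coordinates in metres, ordered by time, and computes the speed over a window of several consecutive segments as the path length divided by the elapsed time. The window length and the function name are arbitrary choices for illustration; other kinds of averaging are equally possible.

```python
import math


def windowed_speeds(points, window=5):
    """Average speeds in m/s over runs of `window` consecutive segments.

    With window=1 the result is the list of instant speeds derived from
    pairs of consecutive points.
    """
    segments = [math.dist(p[1:], q[1:]) for p, q in zip(points, points[1:])]
    speeds = []
    for i in range(len(segments) - window + 1):
        elapsed = points[i + window][0] - points[i][0]
        if elapsed > 0:
            speeds.append(sum(segments[i:i + window]) / elapsed)
    return speeds
```

For a mover travelling at 10 m/s with one position displaced by 15 m along the road, the instant speeds around the erroneous point jump to 25 and 5 m/s, whereas the speed over a window of five segments is 12 m/s.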
One of the typical errors in GPS data is false movement. When a mover stays
still while the GPS device continues position recording, the coordinates of the
recorded positions are not the same but distributed in space. This needs to be taken
into account in detecting and extracting stops and separating stops from move-
ment. The methods for stop detection have been discussed in Sect. 3.5.2. False
movement needs to be filtered out from the data before computing speeds, direc-
tions, and other derived attributes.

Fig. 9.4  A trajectory of a mobile phone user contains segments of false movement caused by
switching the phone connection between neighbouring antennas

Specific cases of false movement occur in mobile phone use data. When a
phone is in the range of two or more neighbouring antennas, it can switch from
one antenna to another without any movement of the phone carrier. When
trajectories of phone users are constructed based on the positions of the antennas that
served the phone use, the trajectories may contain jumps between positions of
different antennas that do not represent real movement of the user; in Fig. 9.4, an
example of such a trajectory is shown on a map (left) and in a space–time cube
(right). Some of the jumps are characterized by unrealistic speeds and are
therefore easy to detect. However, the speeds may appear normal when the
neighbouring antennas are close in space and/or the jumps do not occur very often. In such
cases, discovery of frequent sequences (as in Sect. 7.3.4) can allow finding groups
of antennas suspicious for producing false movements. The analyst would need to
inspect these groups and the corresponding trajectories for making a final judge-
ment. If the analyst concludes that a group of antennas, indeed, generates false
movement, the trajectories can be “repaired” by substituting the sequences con-
sisting of the antennas from this group by a representative position, which may be
the central position in the group or the position of the most frequently occurring
antenna.
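The "repair" step can be sketched as follows, assuming that a trajectory is given as a time-ordered list of (time, antenna identifier) pairs and that the analyst has already confirmed a group of antennas producing false movement and chosen its representative. The function name and the output layout are hypothetical.

```python
from itertools import groupby


def repair_visits(visits, group, representative):
    """Replace runs of antennas from `group` by one visit to `representative`.

    Returns a list of (start_time, end_time, antenna) for each maximal run of
    consecutive records that refer to the same antenna after the substitution.
    """
    relabelled = ((t, representative if antenna in group else antenna)
                  for t, antenna in visits)
    repaired = []
    for antenna, run in groupby(relabelled, key=lambda visit: visit[1]):
        times = [t for t, _ in run]
        repaired.append((times[0], times[-1], antenna))
    return repaired
```

In this sketch, a sequence such as A, B, C, B, C, D with the group {B, C} and the representative B becomes A, B, D, where the visit to B spans the time of the whole run.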
We have also encountered errors in trajectory data caused by accidentally
duplicated identifiers of movers. Trajectories of movers are constructed by
uniting consecutive positions of each mover. The movers are supposed to have unique
identifiers. The positions of different movers are distinguished based on the
identifiers contained in the position records. If two distinct movers have the same
identifier, their positions will be mixed in one trajectory. This is especially
noticeable when the movers move simultaneously. An example is demonstrated
in Fig. 9.5. The cases of accidentally duplicated mover identifiers can be
recognized from unrealistic derived speeds and/or unusually long spatial gaps between
consecutive positions. However, when positions of different movers with the same
identifier are separated in time, the duplication of the identifier may not be easy to
detect.
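A simple check for unrealistic derived speeds might look as follows. The sketch assumes points given as (time in seconds, x, y) with planar coordinates in metres, ordered by time; the speed limit of 70 m/s and the function name are arbitrary assumptions that would have to be adapted to the kind of movers.

```python
import math


def implausible_segments(points, max_speed_ms=70.0):
    """Indices i of segments (i, i+1) whose derived speed is unrealistic.

    Two different positions recorded at the same time are also flagged.
    """
    flagged = []
    for i, (p, q) in enumerate(zip(points, points[1:])):
        elapsed = q[0] - p[0]
        distance = math.dist(p[1:], q[1:])
        if elapsed <= 0:
            if distance > 0:
                flagged.append(i)
        elif distance / elapsed > max_speed_ms:
            flagged.append(i)
    return flagged
```

A trajectory in which a large share of the segments is flagged, as in the zigzag shape caused by two movers recorded simultaneously, is a candidate for a duplicated identifier.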

Fig. 9.5  Positions of two distinct movers with the same identifier have been mixed in one trajec-
tory, which therefore has a zigzag shape

In Sect. 9.2.1, we have said that there may be gaps in the spatio-temporal
coverage of a dataset, that is, cases of total absence of data for parts of a terri-
tory and/or time intervals. Such cases are relatively easy to detect. There may
be also cases when some positions are missing in different trajectories, differ-
ent times, and different places. Such cases can be manifested by unusually long
spatial and temporal gaps inside trajectories. These gaps can be caused by cir-
cumstances in which positions could not be measured, for example, in tunnels
or inside buildings for GPS-based tracking devices. Gaps may also be caused
by the way in which a subset of data covering a territory that needs to be
analysed is extracted from a dataset covering a larger territory. For example, it can
be noticed in Fig. 1.21 (Sect. 1.3) that the trajectories of the Milan cars cover a
rectangular area. To extract this dataset from a larger dataset, the data provider
just removed all positions that lay beyond the rectangle. This led to specific
cases of data absence when cars temporarily moved out of the area enclosed
by the rectangle and returned after a few minutes. When trajectories are
represented on a map by lines, the trajectory lines of these cars have straight
segments connecting the last position before leaving the rectangular area and
the first position after returning. These cases are especially striking in the
north of Milan (Fig. 9.6). Evidently, many trajectories followed the road A52,
which turned out to be outside of the selected rectangular region. The parts of
the trajectories lying on this road are missing due to the way in which the data
have been cut. In Fig. 9.6, only the affected trajectories are shown; the segments
of the trajectories are coloured according to the spatial distance between con-
secutive points. The spatial gaps caused by removing the trajectory parts on A52
range from 6 to 9.25 km.
Spatio-temporal gaps in trajectories need to be accounted for in computing
derived attributes and in data aggregation, in particular, in computing flow magni-
tudes and average speeds. A suitable strategy is to ignore atypically long trajectory
segments. This can be done based on a user-chosen distance threshold.
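One way to ignore atypically long segments is to split each trajectory wherever the distance between consecutive points exceeds the chosen threshold, so that derived attributes and aggregates are computed only within the resulting parts. The sketch below assumes points given as (time, x, y) with planar coordinates in metres; the function name is hypothetical.

```python
import math


def split_at_gaps(points, max_gap_m):
    """Split a time-ordered trajectory at segments longer than max_gap_m."""
    if not points:
        return []
    parts, current = [], [points[0]]
    for p, q in zip(points, points[1:]):
        if math.dist(p[1:], q[1:]) > max_gap_m:
            parts.append(current)   # close the part before the gap
            current = [q]           # start a new part after the gap
        else:
            current.append(q)
    parts.append(current)
    return parts
```

For the gaps of 6 to 9.25 km mentioned above, a threshold below 6 km, chosen with regard to the typical distances between consecutive points, would separate the affected trajectories into the parts before and after the missing stretch.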

Fig. 9.6  In the north of Milan, there are many cases of missing positions in trajectories due to
cutting of the data by a bounding rectangle

Absence of positions at the beginning or end of a trajectory may be hard or
impossible to detect. It may be known from a data provider when and where some
positions are or may be missing. Thus, for the personal driving dataset, we know
that the initial positions of many trips are missing because the measuring device
could not determine the positions immediately when the car started moving but
took some time (that varied from case to case) to warm up and connect to satel-
lites. This is demonstrated in Fig. 9.7, where two groups of trips starting from the
work place of the person are represented by trajectory lines in blue and red. The
yellow dot symbols represent the recorded positions of the trip starts. Although all
trips actually started in about the same place, many trajectories begin in different
places. We also encountered datasets with data about people where all positions
close to the homes of the people were removed for preserving personal privacy.
When trajectories of each person are visualized on a map separately from others,
the “white spots” resulting from the position removal can be noticed.
Some of the methods that we have developed are able to deal with incomplete
trajectories. In particular, the “route similarity” distance function is quite tolerant
to missing parts of trajectories. In Fig. 9.7, the blue and red colours correspond to
two clusters of trajectories obtained with the “route similarity” function; hence,
the function could group the trajectories with missing starting parts together with
more complete trajectories following the same routes.
In this section, we have shared our experience on detecting and handling vari-
ous problems that may occur in movement data. Although we cannot guarantee
exhaustiveness of this material, we believe that it can be helpful to people that
need to analyse real movement data.

Fig. 9.7  In the personal driving dataset, the starting parts of many trajectories are missing. The
yellow dot symbols show the recorded positions of the trip starts

9.3 General Procedures of Movement Analysis

For the most comprehensive analysis of movement data, the analyst would look at
the data from all perspectives: mover-oriented, event-oriented, space-oriented, and
time-oriented. Such an analysis would include the following groups of tasks:

• Mover-oriented tasks dealing with trajectories of movers:
– Characterize trajectories as units in terms of their positions in space and time,
shapes, and other overall characteristics.
– Analyse the variation of the positional attributes in space and time.
– Discover and investigate occurrences of various types of relations between
the movers and the spatio-temporal context, including other movers.
• Event-oriented tasks dealing with relevant spatial events, in particular, events
that have been extracted from trajectories, local presence dynamics, or spatial
situations in the process of the analysis:
– Characterize the relevant events in terms of their spatio-temporal positions
and thematic attributes.
– Discover and investigate occurrences of various types of relations between
the events and the spatio-temporal context, including other events.
• Space-oriented tasks dealing with a set of places of interest (POI) and local
dynamics (temporal variations) of presence and flows:
– Define a set of relevant POI.
– Characterize the POI in terms of the local presence dynamics.
– Characterize binary links between the POI in terms of the flow dynamics.
– Discover and investigate temporal and ordering relations between the POI.
• Time-oriented tasks dealing with a set of time units and respective spatial
situations:
– Characterize the time units in terms of the spatial situations.
– Discover and characterize the relations between the time units imposed by
movers and/or events, in particular, similarity and change relations.

This list of tasks is not meant to specify any order in which the tasks should
be performed. During the process of analysis, tasks of different types intermix;
however, they do not intermix fully arbitrarily but follow one another in certain
logical sequences. For example, in analysing the spatio-temporal variation of
positional attributes in trajectories, the analyst may need to focus on particular
attribute values or value intervals. This can be done by extracting the trajectory
segments with these values as movement events. A logical next step is to analyse
the spatio-temporal positions of the extracted events, which means switching from
the mover-oriented to event-oriented perspective. After analysing the events, the
analyst may return to the trajectories and, possibly, extract other events. Then, it
may be reasonable to analyse the relations between the first and second sets of
events. Another possible step after extracting and analysing movement events is to
find and outline places where events frequently occur and then analyse event and/
or presence dynamics in these places. These are space-oriented tasks.
It can be noticed in this example that the possible next step in analysis depends
on the outcomes of the preceding step(s). Thus, after extracting movement events
from trajectories, analysis of the events may take place. However, there are also
possibilities to choose what to do next. Thus, after extracting and investigating
movement events, the analyst may either continue analysing the trajectories or
focus on finding and analysing the places where the events frequently occur.
It is not necessary that all types of tasks are included in an analysis. Only a sub-
set of tasks may be relevant to the analysis goals.
Based on our experience and the existing dependencies between the analytical
methods in terms of their inputs and outputs, we can suggest a number of pos-
sible rational sequences of tasks in movement analysis. These task sequences are
presented in Fig. 9.8 in the form of a flow chart. The tasks are represented by brief
descriptions preceded by characters M, E, S, or T, which denote the possible task
foci: Movers, Events, Space, and Time.
Although the graph specifying the possible task sequences has a single root
node, it does not mean that any analysis must begin with the task “Analyse trajec-
tories as units” represented by this node. For a particular application, the charac-
teristics of trajectories as units may be of no interest but analysts may be interested
first of all in the positional attributes or in relations of movers to the context or in
aggregated movement characteristics over a given territory. Furthermore, the anal-
ysis may initially focus on spatial events, in particular, when the movement data
are originally available in the form of spatial events rather than trajectories, as, for
example, data from Flickr or Twitter or data about mobile phone use. In the flow
chart, the nodes where the analysis can start are marked by a grey background.
It is also not necessary that the analysis ends only when one of the terminal
nodes is reached and the respective task fulfilled. The analysis may end in any
intermediate node when the application-relevant analysis goals are achieved. The
analysis may also continue by switching to another branch. In particular, there are
two terminal nodes labelled “M: Analyse trajectories responsible for the discov-
ered relations” (where relations between POI or time units are meant). Here, it
is assumed that a subset of trajectories is selected for which the analysis is done
starting from the root node of the flowchart and following the left branch.
Hence, there is no unique analysis procedure that needs to be followed in all
cases but there are many possible procedures, where the steps are chosen depend-
ing on the application-specific analysis goals and ordered according to the depend-
encies between the inputs and outputs of the analysis methods. Nevertheless, the
possible paths through the flow chart in Fig. 9.8 specify a set of generic analytical
procedures that can be useful in multiple applications. One of the possible proce-
dures is described in more detail by Andrienko et al. (2011b): movement events
are extracted from trajectories; on their basis, a set of POI is defined; the events
and/or trajectories are aggregated by these POI and time units; then, the dynamic
characteristics of the places and links between them are investigated. The
procedure is demonstrated with two application examples.
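The aggregation step of this procedure can be illustrated by a small sketch. It assumes, purely for illustration, that the POI are cells of a regular square grid and that the time units are hours; the procedure described above defines the POI on the basis of the extracted events, which this sketch does not attempt to reproduce. Events are given as (time in seconds, x, y) with planar coordinates in metres, and the function name is hypothetical.

```python
from collections import Counter


def aggregate_events(events, cell_size_m, time_unit_s=3600):
    """Count events per (grid cell, time unit).

    The key is ((column, row), unit index); the value is the number of
    events that fall into this cell during this time unit.
    """
    counts = Counter()
    for t, x, y in events:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        counts[(cell, int(t // time_unit_s))] += 1
    return counts
```

The resulting counts, arranged by cell, form time series that describe the local dynamics of the events and can be explored further with the space-oriented methods.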

[Flow chart with the following task nodes:
M: Analyse trajectories as units
M: Analyse positional attributes
M: Extract movement events
E: Analyse movement events
M: Analyse relations to context
M: Extract relation events
E: Analyse relation events
S: Define POI based on trajectories (space tessellation)
S: Define POI based on events
M: Analyse event impact (two nodes)
E,M: Aggregate events or trajectories by POI and time units
M: Aggregate trajectories by POI and time units
S: Analyse dynamics of events or mover presence in POI
S: Analyse relations between POI
S: Analyse dynamics of presence and flows
S: Analyse temporal and ordering relations between POI
S: Extract peak/pit events from presence or flow dynamics
T: Analyse spatial situations
T: Extract peak/pit events from spatial situations
T: Analyse change relations between time units
E: Analyse peak/pit events
M: Analyse involved trajectories
M: Analyse trajectories responsible for the discovered relations (two nodes)]

Fig. 9.8  The flow chart represents possible sequences of tasks in movement analysis

9.4 Movement in Context

As explained in Sect. 2.7, movement takes place in spatio-temporal context, which
includes specific locations and times with their specific properties and various
existing objects, also with their properties. In the course of movement, various
relations occur between the movers and their spatio-temporal context (Sect. 2.8.1).
Events, locations, and time units also exist in the context and have relations to the
context. In our task typology (Sect. 2.12), for each possible task focus (movers,
events, space, and time), there are two groups of task targets: characteristics and
relations. The latter mean relations to the context, which includes other entities
of the same kind (i.e. other movers, other events, other locations, or other time
units) and entities of other kinds. Each of the chapters describing analytical meth-
ods includes methods or procedures for analysing relations to the context.
There are two possible approaches to detecting and analysing relations: observa-
tion and automated extraction. In the research field of data mining and knowledge
discovery, only the latter approach is used. Occurrences of specific types of rela-
tions (in data mining, the term “patterns” is used) are detected and extracted by ded-
icated algorithms (e.g. Laube et al. 2005; Kalnis et al. 2005) or by means of special
query languages (e.g. Sakr and Güting 2011). Visual representation of the extracted
relations is out of the research scope, although the researchers have to use some
primitive visualization tools to be able to see the results of computations or queries.

9.4.1 Visual Tools for Observation of Relations

Visual analytics, where visual representations play a principal role, allows detect-
ing and analysing relations by observation. Relevant information is visually repre-
sented in such a way that analysts can see the existing relations. A very important
visualization method for different forms of movement data is the cartographic map,
where the data are represented in the spatial context. The analyst can relate the
spatial positions of movers or events and/or their positional attributes to the spatial
positions of other objects and to properties of surrounding objects and locations.
Also presence counts and flow characteristics can be related to the spatial context.
Static maps, however, do not show emergence and evolution of spatial relations
over time. Unfortunately, the space–time cube does not provide adequate support for
observation of either spatial or temporal relations due to projection-induced
distortions of positions, distances, and directions in space and time. An animated
map by itself is also not a good tool for observing the dynamics of spatial rela-
tions since changes may be difficult to notice and keep track of. However, it can
be enhanced with special interactive techniques, for example, the “staining” tech-
nique (Bouvier and Oates 2008) mentioned in Sect. 5.2.3.
Temporal displays provide potential opportunities for observing temporal rela-
tions. However, temporal displays typically show entities of one kind, for exam-
ple, only trajectories (as in a temporal bar chart; see Sect. 4.1). It is possible to
see temporal relations among these entities or their components and relations to a
selected time moment (i.e. a particular position on the time axis) but not relations
to other kinds of entities. For the latter purposes, dynamic links between two or
more displays can be used. An example of a dynamic link between a temporal bar
chart and a map is shown in Fig. 4.6 (Sect. 4.1). Lundblad et al. (2009) use a time
graph to visualize weather parameters along the routes of selected ships. Besides,
multiple weather parameters for all ships at a selected time moment are shown on
a parallel coordinate plot. The links between the displays are established through
interactive selection of ships or time moments for the additional displays.
Observation of temporal relations of various entities to temporal cycles can be
supported in multiple ways. One way is arrangement of display elements repre-
senting individual or aggregated entities according to temporal cycles. A matrix
arrangement, where columns correspond to positions of some temporal cycle and
rows to consecutive cycles, is used in the periodicity chart in Sect. 7.2.5, time
mosaic displays in Sect. 8.1.1, and mosaic diagrams on maps, which can be seen
in Figs. 1.24 and 9.2. The arrangements are used for aggregated events (event
counts by time units), spatial situations characterizing time units, and local time
series characterizing places, respectively. A radial arrangement, where circle sec-
tors correspond to positions of a temporal cycle, is used in flower diagrams (Sect.
6.2.2) representing spatial events.
Another way, which is applicable to spatial events, including trajectories and
their components, is aggregation by positions in temporal cycles. The aggregates
can be represented, for instance, by one- or two-dimensional histograms. Thus,
two-dimensional histograms in Sect. 1.2 (Figs. 1.12 and 1.14) show the distribu-
tion of events over days of a week and hours of a day, in Sect. 5.2.3 (Fig. 5.43)
by months of a year and hours of a day, and in Sect. 6.2.3 by years and months of
a year. A trajectory wall display (Sect. 5.1.3) includes a special display element,
called time lens, which shows values of a positional attribute from a selected area
aggregated by units of one selected temporal cycle: months of a year, days of a
week, or hours of a day. Events aggregated by units of a time cycle and locations
in space can be represented by rose diagrams on a map (Fig. 5.44). Zhao et al.
(2008) use circular temporal histograms to explore the dependency of movement
behaviours on temporal cycles. They also suggest a visualization technique called
ring map, a variant of a circular histogram where aggregate values are shown by
colouring and shading of ring segments rather than by bar lengths. Multiple
concentric rings can represent aggregation according to an additional attribute, for
example, activity performed by the moving objects.
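The aggregation by positions in temporal cycles can be illustrated with a minimal Python sketch. It counts events per day of the week and hour of the day, which is the kind of table a two-dimensional histogram displays. The function name `cycle_histogram` and the sample timestamps are illustrative assumptions, not part of any tool described in this book.

```python
from collections import Counter
from datetime import datetime


def cycle_histogram(timestamps):
    """Count events per (day of week, hour of day) cell.

    Returns 7 rows (Monday=0 .. Sunday=6) by 24 columns (hours 0..23).
    """
    counts = Counter((t.weekday(), t.hour) for t in timestamps)
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


# Hypothetical event times, used only for illustration.
events = [
    datetime(2013, 3, 4, 8, 15),   # a Monday
    datetime(2013, 3, 11, 8, 40),  # the following Monday
    datetime(2013, 3, 9, 17, 5),   # a Saturday
]
hist = cycle_histogram(events)
```

Each cell of `hist` can then be mapped to a colour or a bar length. Other pairs of cycles (for example, months of a year by hours of a day) would only require a different key in the counter.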
The third way is transformation of time references of trajectories and events
according to temporal cycles (Sect. 3.3). The transformed time references are then
used instead of the original references when the trajectories or events are visual-
ized on temporal or spatio-temporal displays. Section 1.2 gives examples of trans-
forming time references in trajectories and showing them in a space–time cube.
In Sect. 7.3.3, droplet maps and DCDV displays are applied to trajectories trans-
formed to the daily cycle.
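As a hedged illustration of transforming time references to the daily cycle, the sketch below replaces the date of every position record by one common date while keeping the time of day, so that trajectories from different days are superimposed. The reference date, the record layout `(time, x, y)`, and the function name are assumptions made for this example only.

```python
from datetime import date, datetime

# Arbitrary common date onto which all days are superimposed (an assumption).
REFERENCE_DAY = date(2000, 1, 1)


def to_daily_cycle(t):
    """Keep the time of day of t and replace its date by REFERENCE_DAY."""
    return datetime.combine(REFERENCE_DAY, t.time())


# Hypothetical trajectory records: (time, x, y).
trajectory = [
    (datetime(2013, 3, 4, 8, 15), 9.19, 45.46),
    (datetime(2013, 3, 5, 8, 15), 9.21, 45.47),
    (datetime(2013, 3, 5, 18, 30), 9.19, 45.46),
]
transformed = [(to_daily_cycle(t), x, y) for t, x, y in trajectory]
```

The transformed records can be passed to a temporal or spatio-temporal display in place of the original ones. The first two records, taken on different days, now share the same time reference.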
The droplet map display demonstrates yet another possible approach to rep-
resenting relative temporal positions with respect to a temporal cycle: by col-
our-coding. In this particular case, the colouring of the lines representing flows
between places differentiates the flows that occurred on work days from those that
occurred in the weekend. An example of using the same idea in a growth ring map
display can be found in the paper by Andrienko et al. (2011a), where pixels in
growth rings are coloured in four different hues according to the seasons of a year.
The example is discussed in Sect. 6.2.1.
Relations of connectedness and temporal order (flow) between places that
emerge due to objects moving between them (Sect. 2.8.2) are visualized on
displays where elements representing the places are connected by special elements
representing the relations. The most widely used technique is the flow map, where
flow symbols represent flow relations and their strengths, or magnitudes. Another
representation is the node-link diagram, where nodes represent places and links show
relations between them. This technique is used in the droplet map (Sect. 7.3.3).
The DCDV display technique presented in the same section can also be seen as a
kind of node-link diagram where bar segments are nodes. Node-link diagrams are
applicable when the number of distinct POI is small. Node-link diagrams can also
be used for observing relations between time units in terms of changes of movers’
positions, as demonstrated in Sect. 8.2.
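The flow magnitudes shown by flow symbols or links can be derived from sequences of visited places. The following sketch is one simple way to do so, assuming that each trajectory has already been reduced to an ordered list of place identifiers. The function name and the sample data are hypothetical.

```python
from collections import Counter


def flow_counts(visit_sequences):
    """Count moves between consecutive distinct places in each sequence."""
    flows = Counter()
    for sequence in visit_sequences:
        for origin, destination in zip(sequence, sequence[1:]):
            if origin != destination:  # staying in a place is not a flow
                flows[(origin, destination)] += 1
    return flows


# Hypothetical sequences of visited places, one per trajectory.
visits = [
    ["home", "work", "shop", "home"],
    ["home", "work", "home"],
]
flows = flow_counts(visits)
```

Each key of `flows` is a directed link between two places and each value is its magnitude, which can be encoded by the width of a flow symbol or of a link in a node-link diagram.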
Changes of movers’ positions from one time unit to another can also be shown
on a map by straight lines, arrows, or fragments of trajectory lines connecting
the earlier and later positions of the movers (Sect. 8.2). This technique is espe-
cially suitable for revealing spatio-temporal trends, as defined in Sect. 2.8.2, that
is, coherent changes of positions of many movers, such as movement in the same
direction, convergence, divergence, etc.

9.4.2 Computational Enhancement to Observation of Relations

In this book, we have presented multiple ways in which visual observation of rela-
tions between movement and the context can be supported computationally. Two
approaches have already been mentioned in the previous section: aggregation
by positions in temporal cycles and time transformations. The other approaches
include space transformations (Sect. 3.3), derivation of new thematic attributes
expressing relations of movers to selected context elements (Sects. 3.4 and 5.2.2),
computation of event-related statistical summaries from trajectories (Sect. 5.2.3),
density-based spatio-temporal clustering of spatial events (Sect. 6.1), and compu-
tational summarization of context data attached as thematic attributes to records of
movement data (Sect. 6.2.3).
Section 3.3 gives several examples of space transformations that can be use-
ful for observing relations of movement to the context. The basic idea is to trans-
form the absolute spatial positions of movers to relative positions with respect
to selected elements of the spatial context. This idea can be realized in differ-
ent ways, depending on the nature of the selected context elements. Thus, for a
selected path (such as the railway between Paris and Lyon), absolute coordinates
of movers can be transformed to relative positions on the path. For a specific
location, positions of movers can be transformed to distances from this location.
Furthermore, locations from trajectories of different movers that play the same
role for these movers (for example, home locations) can be treated as a single
location and the positions of the movers transformed to relative positions with
respect to this location.
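Two of these space transformations can be sketched as follows, assuming planar (projected) coordinates and a path given as a polyline. The function names are illustrative. `position_on_path` returns the distance travelled along the path up to the path point nearest to the mover, together with the mover's offset from the path.

```python
import math


def distance_to(point, location):
    """Distance from a mover position to a selected location."""
    return math.hypot(point[0] - location[0], point[1] - location[1])


def position_on_path(point, path):
    """Relative position of a point on a polyline path.

    Returns (distance along the path, offset from the path).
    """
    best_offset, best_along, walked = math.inf, 0.0, 0.0
    for (ax, ay), (bx, by) in zip(path, path[1:]):
        seg_len = math.hypot(bx - ax, by - ay)
        if seg_len == 0:
            continue
        # Parameter of the orthogonal projection, clamped to the segment.
        u = ((point[0] - ax) * (bx - ax) + (point[1] - ay) * (by - ay)) / seg_len ** 2
        u = max(0.0, min(1.0, u))
        px, py = ax + u * (bx - ax), ay + u * (by - ay)
        offset = math.hypot(point[0] - px, point[1] - py)
        if offset < best_offset:
            best_offset, best_along = offset, walked + u * seg_len
        walked += seg_len
    return best_along, best_offset


# Hypothetical path with two straight segments.
path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
along, offset = position_on_path((12.0, 5.0), path)
```

With geographic coordinates, the Euclidean distances would have to be replaced by an appropriate geodesic distance.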
It is also possible to compute relative spatial positions of movers with regard to
moving elements of the spatial context. This just means that each spatio-temporal
position (s,t) is transformed based on the position and, possibly, the movement
direction of the reference context element at the same time t. Section 3.3 gives an
example of transforming coordinates of objects moving together in a group into
relative positions within the “group space”. The position of the group centre and
the direction of the group movement vector in each time moment provide the ref-
erence for the coordinate transformation. The results of the transformation allow
observing the movement of each group member in the context of the group, as
described in Sect. 5.2.2.
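A minimal sketch of the transformation into the "group space" is given below. It assumes planar coordinates, takes the group centre as the mean of the member positions, and estimates the movement direction from the displacement of the centre between two consecutive time moments. These choices and all names are assumptions made for illustration.

```python
import math


def group_centre(positions):
    """Mean position of the group members."""
    xs, ys = zip(*positions)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def to_group_space(position, centre, direction):
    """Express a position relative to the group centre and heading.

    Returns (forward, left): the offset along the movement direction and
    the offset perpendicular to it (positive to the left).
    """
    norm = math.hypot(direction[0], direction[1])
    if norm == 0:
        raise ValueError("the group movement direction is undefined")
    ux, uy = direction[0] / norm, direction[1] / norm
    dx, dy = position[0] - centre[0], position[1] - centre[1]
    forward = dx * ux + dy * uy
    left = dy * ux - dx * uy
    return forward, left


# Hypothetical positions of two group members at two consecutive moments.
members_t0 = {"a": (0.0, 0.0), "b": (2.0, 0.0)}
members_t1 = {"a": (0.0, 1.0), "b": (2.0, 1.0)}
c0 = group_centre(list(members_t0.values()))
c1 = group_centre(list(members_t1.values()))
direction = (c1[0] - c0[0], c1[1] - c0[1])
relative_t1 = {m: to_group_space(p, c1, direction) for m, p in members_t1.items()}
```

In this example the group moves north, member `a` stays one unit to the left of the centre, and member `b` one unit to the right.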
In a similar way, it would be possible to transform the absolute spatial posi-
tions of various spatio-temporal context elements to relative positions with respect
to the positions of a selected mover. This could support observing the relations
between the selected mover and the context, including other movers and spatial
events. For example, positions of social animals, such as apes or wolves, can be
transformed relative to the positions of the alpha (i.e. highest in rank) individual
of their community. This could be used for observing the relations between the
alpha and the other community members.
As described in Sect. 3.4, elementary binary relations of spatial and temporal
distance, spatial direction, and temporal order between positions of movers and
elements of the spatio-temporal context can be represented by thematic attributes
derived from movement data and context data. Neighbourhood relations, which
are based on distance relations, can be represented by counts of context elements
within given spatial and/or temporal distances. In addition, Sect. 5.2.2 describes
derivation of thematic attributes expressing specific relations in a group of coher-
ently moving objects. The computed values of attributes expressing relations to
the context (these attributes may be called “relational attributes”) are attached to
the records of the movement data. After that, the values can be visualized in all
ways suitable for positional attributes, for example, on a map, in a space–time
cube, in a temporal bar chart (Sect. 4.1), or on a trajectory wall (Sect. 5.1.3).
Relational attributes can be analysed together with other positional attributes with
the use of multi-attribute clustering followed by visualization of cluster member-
ship (Sect. 5.1.4).
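The derivation of a relational attribute expressing a neighbourhood relation can be sketched as follows. For every position record, the code counts the context elements lying within given spatial and temporal distances and attaches the count as a new attribute. The record layout `(x, y, t)`, the attribute name `n_near`, and the thresholds are assumptions of this example.

```python
import math


def count_neighbours(record, context, max_dist, max_dt):
    """Number of context elements within the given spatial and temporal distances."""
    x, y, t = record
    return sum(
        1
        for cx, cy, ct in context
        if math.hypot(cx - x, cy - y) <= max_dist and abs(ct - t) <= max_dt
    )


def add_relational_attribute(records, context, max_dist, max_dt):
    """Attach the neighbour count to every position record."""
    return [
        {"x": x, "y": y, "t": t,
         "n_near": count_neighbours((x, y, t), context, max_dist, max_dt)}
        for x, y, t in records
    ]


# Hypothetical position records and context elements: (x, y, t).
records = [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
context = [(1.0, 0.0, 1.0), (0.0, 1.0, 50.0), (9.0, 10.0, 0.0), (10.0, 9.0, 1.0)]
enriched = add_relational_attribute(records, context, max_dist=2.0, max_dt=5.0)
```

The resulting attribute can be visualized or filtered like any other positional attribute. For large datasets, the exhaustive comparison used here would normally be replaced by a spatial index.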
Using interactive tools for filtering (Sects. 4.2.1 and 4.2.3), it is possible to
select spatial events or trajectory segments where particular values of relational
thematic attributes are attained. A filter can include several constraints on values
of different relational attributes. This allows looking for quite complex spatio-
temporal relations by setting constraints in terms of elementary spatial and tem-
poral relations. Then, the spatio-temporal distribution of filter-selected events or
segments can be observed with the use of spatial, temporal, and spatio-temporal
displays. Filter-selected trajectory segments can be extracted from the trajectories
and then treated as spatial events (Sect. 4.2.3). This gives additional opportunities
for visualization and analysis, as described in Sect. 5.2.3 and Chap. 6.
Computation of event-related statistical summaries from trajectories supports
investigation of relations between movement characteristics and spatial events, in
particular, impacts of spatial events on movement. The idea is to summarize values
of various positional attributes from trajectory points located in spatio-temporal
neighbourhoods of spatial events. The resulting summaries (minimum, maximum,
average, median, quartiles, etc.) are attached as new thematic attributes to the spa-
tial events. For example, from trajectories of cars and spatio-temporal positions
of sharp deceleration events, one can compute the average, minimal, and maxi-
mal distances from a car to the nearest car in the spatio-temporal vicinity of each
deceleration event. These statistics can then be compared with the general statis-
tics obtained from all position records.
Event-related statistics can be computed separately for temporal intervals
positioned differently in time with respect to the lifetimes of the events: before
the events, during the events, and after the events. By comparing the statistics
from the different intervals, it is possible to detect and investigate changes in the
movement related to the events. Section 5.2.3 contains an example of investigating
changes in the movement behaviours of roe deer caused by events of proximity
to lynxes. Statistics of distances of the roe deer to open spaces have been
computed for time intervals before and after the events. Comparison of these
statistics revealed a tendency of roe deer to move to open spaces when being
approached by a lynx. This is a clever strategy since lynxes tend to avoid open
spaces.
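The computation of event-related statistics for intervals before, during, and after an event can be sketched as below. The sketch assumes that trajectory points are given as `(time, value)` pairs, where the value is some positional attribute (for instance, the distance to the nearest open space), and that the mean is the statistic of interest. The function name, the window length, and the data are hypothetical.

```python
from statistics import mean


def event_window_stats(points, event_start, event_end, window):
    """Mean attribute value before, during, and after an event.

    points: iterable of (time, value) pairs.
    window: length of the intervals considered before and after the event.
    """
    groups = {"before": [], "during": [], "after": []}
    for t, value in points:
        if event_start - window <= t < event_start:
            groups["before"].append(value)
        elif event_start <= t <= event_end:
            groups["during"].append(value)
        elif event_end < t <= event_end + window:
            groups["after"].append(value)
    # None marks an interval without any trajectory points.
    return {name: (mean(vals) if vals else None) for name, vals in groups.items()}


# Hypothetical (time in hours, distance to open space in metres) pairs.
points = [(0, 100), (1, 80), (2, 60), (3, 40), (4, 20), (5, 10), (9, 0)]
stats = event_window_stats(points, event_start=2, event_end=3, window=2)
```

Comparing `stats["before"]` with `stats["after"]` gives a first indication of a change related to the event. Other summaries, such as the median or quartiles, can be computed from the same groups.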
Density-based spatio-temporal clustering of spatial events (possibly, extracted
from trajectories) is a tool for detecting occurrences of the relation of spatio-tem-
poral concentration, that is, closeness in space and time, between spatial events
and for finding composite spatial events, such as traffic congestions, consist-
ing of multiple elementary events, such as low speed events of individual cars.
Visualization of clustering results on a map, in a space–time cube, in temporal his-
tograms and other temporal displays allows the analyst to observe not only when
and where the concentration relations occurred but also when and where the events
were sparsely scattered. Hence, clustering supports the observation of spatio-tem-
poral relations among the events and between the events and the enclosing space
and time.
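The idea of density-based spatio-temporal clustering can be illustrated with a simplified DBSCAN-style sketch, in which two events are neighbours when they are close both in space and in time. This is not necessarily the algorithm used in Sect. 6.1. The parameter names, the event layout `(x, y, t)`, and the exhaustive neighbour search are assumptions chosen for brevity.

```python
import math

NOISE = -1


def st_neighbours(events, i, eps_space, eps_time):
    """Indices of events close to event i in both space and time (i included)."""
    xi, yi, ti = events[i]
    return [
        j
        for j, (x, y, t) in enumerate(events)
        if math.hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time
    ]


def st_dbscan(events, eps_space, eps_time, min_events):
    """Return a cluster label per event; NOISE marks sparsely scattered events."""
    labels = [None] * len(events)
    cluster = 0
    for i in range(len(events)):
        if labels[i] is not None:
            continue
        neighbours = st_neighbours(events, i, eps_space, eps_time)
        if len(neighbours) < min_events:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border event: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours_j = st_neighbours(events, j, eps_space, eps_time)
            if len(neighbours_j) >= min_events:
                queue.extend(neighbours_j)  # core event: expand the cluster
        cluster += 1
    return labels


# Hypothetical low-speed events (x, y, t).
events = [(0, 0, 0), (1, 0, 1), (0, 1, 2), (1, 1, 1), (0, 0, 100), (50, 50, 1)]
labels = st_dbscan(events, eps_space=2, eps_time=3, min_events=3)
```

The first four events form one cluster, which could be interpreted as a composite event. The fifth event is in the same place but much later, and the sixth is far away in space, so both are labelled as noise.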
As discussed above, various relations to the context can be represented by
values of derived thematic attributes (relational attributes) attached to position
records in trajectories or to spatial events. Besides, thematic attributes characteriz-
ing the context may originally exist in movement data or can be attached to trajec-
tory positions based on thematic values of context elements in their vicinity. Thus,
in Sect. 3.4, we mentioned a service that enriches position records of trajectories
with attributes about the weather conditions in the respective places and times
(http://www.movebank.org/). Any attributes describing the context can be statisti-
cally summarized for selected parts of trajectories or extracted spatial events. This
may also involve spatio-temporal aggregation. Numeric attributes can be summa-
rized by means of usual statistical operators: minimum, maximum, mean, standard
deviation, median, quartiles, percentiles, etc. Attributes with textual values, such
as titles of georeferenced photos or texts of georeferenced microblog posts, can
be summarized by finding frequent words and combinations, as described in Sect.
6.2.3. Results of numeric summarization can be visualized in standard statistical
graphics or in diagram maps. Results of text summarization can be examined with
the help of a text cloud display (Sect. 6.2.3).
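Both kinds of summarization can be sketched with the Python standard library. The example assumes that a numeric context attribute and a textual one have already been collected for a selected group of events. The stop-word list is a tiny illustrative placeholder, and the data are invented.

```python
import re
from collections import Counter
from statistics import median

# Tiny illustrative stop-word list (an assumption, not a complete one).
STOP_WORDS = {"the", "a", "in", "of", "and", "at", "on", "to", "is"}


def numeric_summary(values):
    """Basic statistical summary of a numeric attribute."""
    return {"min": min(values), "max": max(values), "median": median(values)}


def frequent_words(texts, top):
    """Most frequent words in a collection of texts, ignoring stop words."""
    words = (
        word
        for text in texts
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOP_WORDS
    )
    return Counter(words).most_common(top)


# Hypothetical attributes attached to a group of spatial events.
temperatures = [12.0, 15.5, 9.0]
texts = ["Flood in the city centre", "Flood at the river", "River flood warning"]
summary = numeric_summary(temperatures)
top_words = frequent_words(texts, top=2)
```

The word frequencies in `top_words` correspond to the font sizes in a text cloud display.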

9.4.3 Extraction of Relation Occurrences

We have mentioned earlier that occurrences of many types of relations that are
based on elementary spatial and temporal relations of distance, direction, and
ordering can be detected by expressing the elementary relations through values of
derived thematic attributes followed by interactive filtering based on values of the
derived attributes. However, this approach is limited to binary spatial and tempo-
ral relations and to co-occurrences of two or more binary relations for the same
trajectory point. For example, an interactive filter can select trajectory points of
roe deer that occurred at most 2 h after events of encountering lynxes and are
located in open spaces. This filter combines constraints on the temporal distance
relation to encounter events and spatial distance relation to open spaces. The pos-
sibility to combine several constraints gives much flexibility in specifying rela-
tions to look for. However, the filter selects only those points where all constraints
are satisfied simultaneously. It is impossible to select a sequence of two or more
trajectory points or segments where particular relations or movement characteris-
tics occur one after another in a particular order. Thus, it is impossible to select a
sequence of positions of roe deer where the first position is close to a lynx and not
in open space and the following one or more positions within two hours from the
first one are in open space. Another limitation is that complex relations involving
more than two entities cannot be represented merely by attribute values attached to
individual trajectory points.
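The difference between a conjunctive filter and a sequence query can be made concrete with a small sketch. The first function selects points where all constraints hold simultaneously, as an interactive filter does. The second searches for the ordered sequence from the roe deer example, which requires looking at several consecutive points. The attribute names and the data are hypothetical.

```python
def conjunctive_filter(points, max_hours=2.0):
    """Points at most max_hours after an encounter AND located in open space."""
    return [
        p["t"]
        for p in points
        if p["hours_since_encounter"] <= max_hours and p["in_open_space"]
    ]


def escape_sequences(points, max_hours=2.0):
    """Sequences: a point near a lynx and not in open space, followed by one
    or more consecutive open-space points within max_hours of the first."""
    found = []
    for i, first in enumerate(points):
        if not first["near_lynx"] or first["in_open_space"]:
            continue
        followers = []
        for p in points[i + 1:]:
            if p["t"] - first["t"] > max_hours or not p["in_open_space"]:
                break
            followers.append(p["t"])
        if followers:
            found.append((first["t"], followers))
    return found


def point(t, near_lynx, in_open_space, hours_since_encounter):
    return {"t": t, "near_lynx": near_lynx, "in_open_space": in_open_space,
            "hours_since_encounter": hours_since_encounter}


# Hypothetical trajectory of one roe deer; times are in hours.
deer = [
    point(0.0, True, False, 0.0),
    point(1.0, False, True, 1.0),
    point(1.5, False, True, 1.5),
    point(4.0, False, True, 4.0),
    point(5.0, False, False, 5.0),
]
selected = conjunctive_filter(deer)
sequences = escape_sequences(deer)
```

The filter returns the two open-space points but cannot tell that they follow a position close to a lynx outside open space. The sequence search returns the starting position together with its followers.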
As noted in the beginning of Sect. 9.4, methods for automated extraction of
relation (“pattern”) occurrences are developed in the field of data mining and
knowledge discovery. These methods include query languages, where a variety
of constraints and relations between them can be expressed, and dedicated
algorithms that search for occurrences of particular relations, such as flock or
leadership. Visualization of results of these methods is beyond the research scope of data
mining and knowledge discovery. It is a good task for visual analytics to combine
computational techniques for pattern extraction with interactive visualizations. A
potential problem is that each type of relation, or pattern, may require a specific
way of visualization. This book contains examples of computational extraction of
occurrences of two types of relations: encounters between moving objects (Sects.
5.2.1 and 5.2.3) and frequent sequences of visited places (Sect. 7.3.4). For these
relations, we suggest suitable visualizations (see Figs. 5.29 and 6.6, Sects. 7.32–
7.35), which, possibly, cannot be used for other types of relations.
Another problem is the tendency of computational methods to extract patterns,
or relation occurrences, in very large amounts. Visualization methods that can
show details of individual relation occurrences are not suitable for representing
numerous occurrences. Relation occurrences need to be represented in a
compact and, possibly, even aggregated form. To avoid complete loss of information
about the internal structure of the relation occurrence, it is necessary to reflect
it somehow in thematic attributes. For example, it is possible to reflect structural
characteristics of encounter relations (Fig. 5.27) in a thematic qualitative attrib-
ute “type of encounter” with values “parallel”, “cross”, “head-front”, “parking”,
etc. The values of such thematic attributes can then be visualized and/or used for
aggregation.

9.4.4 Support of Analytical Reasoning

Despite all possibilities of computational extraction of relation occurrences, the
main role in studying movement in context belongs to a human analyst, who
observes relations and on this basis draws conclusions, generates and checks
hypotheses, and makes logical inferences. All previously described techniques and
tools are intended to support human analytical reasoning about movement in con-
text. An example of analytical reasoning with the use of visual, interactive, and
computational aids for detecting and observing various relations is given in Sect.
6.3.2. It includes extraction of relevant positions of movers based on context infor-
mation provided in the form of texts, density-based spatio-temporal clustering of
the extracted positions for determining the relations among them and between
them and the spatio-temporal context, summarization of context information for
the clusters, extraction of parts of trajectories having particular temporal relation
to relevant spatial events, selection of trajectories ending in particular places, and
computation of various statistics. By computationally enhanced observation of the
relations on visual displays and analytical reasoning, we managed to detect two
different diseases, find the areas affected by each of them, determine the way in
which each disease spread, and discover the event that caused both diseases.
Analytical reasoning is a creative process for which no algorithm can exist.
Therefore, the task of visual analytics as a technology is to create environments
that can support human analytical reasoning by properly integrated tools and ena-
ble a variety of possible analytical workflows.

9.5 Movement Behaviours

The general goal of movement analysis is to study and understand movement
behaviour(s). As explained in Sect. 2.11, we use the term “behaviour” in a more
synoptic sense than can be found in some research literature on movement analy-
sis (e.g. Parent et al. 2013), where this term refers to low-level semantic interpreta-
tions of individual trajectories (e.g. a trajectory shows Tourist behaviour) or local
interactions between movers (e.g. two or more movers show Meet behaviour). For
us, the term “movement behaviour” refers chiefly to a general way in which an
object, a set of objects, or a class of objects moves in space over time and inter-
acts with the spatio-temporal context. However, we do not exclude more specific
usages of the term. We have suggested several dimensions for classification of
movement behaviours, among them, specificity or generality in terms of mov-
ers, space, and time. According to these dimensions, Parent et al. (2013) focus on
mover-, space-, and time-specific behaviours, which are also called “patterns” by
other researchers (Dodge et al. 2008; Laube 2009).
Irrespective of the level of generality, movement behaviours essentially include
relations between the movers and the spatio-temporal context. Therefore, eve-
rything that has been discussed in the previous section is relevant to the topic of
movement behaviours. In particular, all techniques that support observation of
movement-context relations also support observation of movement behaviours
consisting of these relations. Depending on the level of generality of the behaviour
under study, relations to the context need to be analysed on either elementary or
synoptic level.
In accord with our conceptual framework, mover-, space-, and time-specific
movement behaviours can be described in terms of spatial events, that is, a behav-
iour is a composition of one or more spatial events linked by spatial and temporal
relations. The spatial events making a specific movement behaviour can be move-
ment events, that is, occurrences of particular movement characteristics (e.g. stop
events, low or high speed events, turn events, etc.), or relation events, that is, occur-
rences of particular relations between the mover(s) and the context (e.g. encoun-
ter events, meeting events, place visit events, etc.). Most often, specific movement
behaviours are described as spatial events linked by temporal sequence relations, as
in the example of tourist behaviour cited in Sect. 2.11, but, in general, all temporal
relations described by Allen (1983) can be used (Sakr and Güting 2011).
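For reference, the thirteen interval relations of Allen (1983) can be determined for two spatial events from their start and end times, as in the following sketch. It assumes that every event has a proper lifetime (start earlier than end). The relation names follow common usage, and the function name is an assumption.

```python
def allen_relation(a, b):
    """Allen's relation of interval a to interval b; intervals are (start, end)."""
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must have start < end")
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"


# Hypothetical lifetimes of a stop event and a place visit event.
stop_event = (2, 3)
visit_event = (1, 5)
relation = allen_relation(stop_event, visit_event)
```

A description of a specific behaviour can then be expressed as a set of required relations between its component events, with the temporal sequence ("before" or "meets") as the most common case.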
Hence, methods for extraction of individual spatial events and compositions
of spatial events from movement data can support analyses of specific movement
behaviours. Parent et al. (2013) survey the methods developed in the
field of data mining and knowledge discovery. Some of the methods extract occur-
rences of particular relations; these methods have been mentioned in the previous
section. There are also database techniques and corresponding query languages
enabling search for composite behaviours consisting of two or more spatial events.
To be practically usable, the methods for automated extraction of specific move-
ment behaviours need to be combined with visual and interactive techniques ena-
bling human judgement and control over method settings. For example, Sakr et al.
(2011) combine interactive visual techniques with queries to a moving object data-
base (MOD). The authors demonstrate finding of composite behaviours in aircraft
landings such as missed approach (interrupted landing attempt), when an aircraft
approaches the airport and descends in order to land but then goes up again. Visual
analytics tools are used for an initial exploration of the data, selection of a suitable
subset for further analysis by interactive filtering, finding of suitable parameter
settings for the MOD queries, and then for viewing and interactive analysis of the
query results.
Unlike specific movement behaviours, general movement behaviours cannot be
viewed as mere compositions of specific spatial events that occurred in specific
locations at specific times. A general movement behaviour is a common basis for
multiple specific movement behaviours occurring in multiple places and/or mul-
tiple times. In other words, specific movement behaviours are instantiations of a
general movement behaviour. As an abstraction from specific behaviours, a gen-
eral behaviour can be described in terms of types of spatial events, which, in turn,
are described in terms of relations to types of context elements rather than spe-
cific context elements. Thus, the example definition of tourist behaviour given by
Parent et al. (2013) describes, in fact, a general movement behaviour: a daily tra-
jectory starting in an accommodation place, visiting a museum or a tourist attrac-
tion, having a stop in an eating place, and ending in the same place where it has
started. This description refers to types of places: accommodation place, museum,
tourist attraction, and eating place. It also refers to types of events: start of move-
ment, visit of a place, stop, and end of movement. It also implicitly refers to a type
of time interval with respect to the daily temporal cycle, namely day. A descrip-
tion of a general movement behaviour may also include quantifiers referring to
movers or objects from the spatio-temporal context (e.g. all, most of, many, some,
none), space (everywhere, in many places, in some places, nowhere), and/or time
(always, most of the time, frequently, sometimes, never).
The description of tourist behaviour discussed above does not result from anal-
ysis of movement data. It has been defined based on preexisting knowledge. After
representing the definition by a formal expression, an algorithmic method can use
this expression to find in movement data specific movement behaviours instantiat-
ing this general movement behaviour. However, there are no algorithmic methods
that could extract descriptions of general movement behaviours from movement
data.
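As a hedged illustration of how a formalized definition can be matched against data, the sketch below checks whether a daily sequence of place visits instantiates the tourist behaviour described above. It assumes that each visit has been annotated with a place identifier and a place type, and it does not enforce any order between the attraction visit and the eating stop. The type labels and the function name are hypothetical.

```python
ATTRACTION_TYPES = {"museum", "attraction"}


def is_tourist_day(visits):
    """Check a daily trajectory given as an ordered list of (place_id, place_type).

    The day must start in an accommodation place, end in the same place,
    and include a museum or tourist attraction and an eating place in between.
    """
    if len(visits) < 4:
        return False
    (first_id, first_type), (last_id, _) = visits[0], visits[-1]
    if first_type != "accommodation" or last_id != first_id:
        return False
    middle_types = {place_type for _, place_type in visits[1:-1]}
    return bool(middle_types & ATTRACTION_TYPES) and "eating" in middle_types


# Hypothetical daily trajectories reduced to annotated place visits.
tourist = [("hotel_1", "accommodation"), ("louvre", "museum"),
           ("bistro_7", "eating"), ("hotel_1", "accommodation")]
commuter = [("flat_3", "accommodation"), ("office_2", "work"),
            ("canteen", "eating"), ("flat_3", "accommodation")]
results = [is_tourist_day(tourist), is_tourist_day(commuter)]
```

Such a check finds specific behaviours that instantiate a predefined general behaviour. It does not derive the definition itself from the data.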
Reconstructing a general movement behaviour from movement data, which
only describe specific movements and events, requires analytical abstraction and
generalization. Currently, this can only be done by human analysts, who have
the capacity to see the forest for the trees. It is the mission of visual analytics to
support this capacity as much as possible by visual aids. The perceptual psychologist
Rudolf Arnheim argues in his brilliant book “Visual Thinking” that abstraction
is inherently involved in human perception: “There is no way of getting around
the fact that an abstractive grasp of structural features is the very basis of percep-
tion and the beginning of all cognition” (Arnheim 1997, p. 161). Hence, appropri-
ate visual representations of movement data are necessary to enable an abstractive
grasp of structural features characterizing a general movement behaviour.
For creating appropriate visual representations supporting abstraction and gen-
eralization, various data transformations can be useful:

• transformations of time and/or space, which align multiple specific movement behaviours (Sect. 3.3);
• spatial and temporal generalization (Sect. 3.6);
• spatio-temporal aggregation (Sect. 3.8);
• spatial summarization of trajectories (Sect. 5.1.1);
• statistical summarization of events and fragments of trajectories (Sect. 5.2.3).
Besides, a helpful tool is clustering, which groups entities by similarity or prox-
imity and thereby allows considering groups with their general features instead of
individual entities with their specific features. Clustering can be applied to trajec-
tories (Sect. 5.1.2), spatial events (Sect. 6.1), local time series (Sect. 7.2.3), and
spatial situations (Sect. 8.1.1).
Section 9.4, which discusses the tools supporting visual observation of relations
of movement to the context, deals with observation of not only specific relation
occurrences but also more general relations relevant to general movement behav-
iours. These more general relations include relations to temporal cycles, relations
to types of places (such as home place), and impacts of a type of spatial events
on movement behaviours. Reconstruction of general movement behaviours can
also be done through extracting specific movement behaviours (e.g. by automated
methods) followed by spatial, temporal, and statistical summarization and/or clas-
sification according to spatial, temporal, and thematic characteristics.
Comprehensive reconstruction of general movement behaviours requires look-
ing at the data from different perspectives: mover-oriented, event-oriented, space-
oriented, and time-oriented. A study may follow one of the workflows represented
in the flow chart in Fig. 9.8. In this book, readers can find two examples of analys-
ing the same data from several (but not all) perspectives and thereby learning dif-
ferent aspects of the respective behaviour.
One example is the personal driving dataset (Sects. 1.2, 2.10.1, and 7.3.3), from
which we have learned a lot about the general movement behaviour of the person.
We have learned the distribution of the person’s movements in space, the roads used
and the frequencies of their use. We have found the significant places of this person,
understood their meanings, and learned the usual days of the week and times of the
day when the person comes to these places and leaves them, as well as the usual
duration of staying in each place. We have learned what routes the person uses to get
from place to place and how much time they usually take. By means of a trajectory
wall display, we could also learn what the usual movement speeds on these routes
are, where delays due to traffic conditions can occur, and how often such delays
occur. We have understood the temporal sequence relations between the visits of the
places and the usual times of the trips between the places. For different days of the
week and times of the day, we have learned the possible and the most typical where-
abouts of the person and, for some places, the respective activities (working, shop-
ping, or sports). We have reconstructed the routine work day schedule of the person.
The other example is the Milan cars dataset (Sects. 1.3, 2.10.2, 4.2.4, 5.1.2,
5.1.3, 6.1.4, 7.3.1, 7.3.2, and 8.1.1), which represents the collective movement
behaviour of independently moving cars. By analysing this dataset from multiple
perspectives, we have learned the following aspects of the behaviour:
• the most frequently taken routes, their absolute frequencies and relative
frequencies with respect to the total number of trips;
• the frequencies of using different streets;
• the dynamics of the movement speeds along the most heavily used roads;
• the places and times of reduced speeds and traffic congestions and the extents of
the congestions in space and time;
• the characteristics of the flows in different parts of the street network and their
daily and weekly variation;
• the dependencies between the traffic intensity and the possible speed of movement
in different parts of the street network;
• the typical traffic situations in different days of the week and times of the day.

In both cases, the behaviours have been learned through visually and computationally
supported observation and analytical reasoning based on a multi-perspective view
of the datasets. Examples of systematic consideration of a single dataset from multi-
ple perspectives can be found in the papers by Andrienko and Andrienko (2013a, b)
and Andrienko et al. (2012a). In the first two papers, various visual analytics methods
are applied to the North Sea vessels data and Milan cars data, respectively. For the
Milan cars data, the second paper contains examples of analysis that have not been
included in this book. The third paper presents a visual analytics methodology for
analysis of eye movement data. Most of the examples use the same eye movement
dataset. Although the different perspectives are not explicitly mentioned, they can be
easily recognized.
This book also contains an example of a fairly deep investigation of the behaviour
of a group of people moving together (Sects. 2.10.5, 3.3, 5.1.4, and 5.2.2).
On the one hand, this is a specific behaviour in terms of the movers, space, and
time: we can only learn how those particular people behaved during that particular
walk and cannot generalize this to other people or to walks in other places and/or
at other times. On the other hand, in analysing this behaviour, we are not interested
merely in learning where each individual was at each time moment. We are interested
in (1) the behaviour of the group as a whole, in particular, how compactly
and coherently the members move, (2) the relations between the members within
the group, in particular, whether there is a stable arrangement and/or permanent
leader, and (3) the behaviour of each individual with respect to the group, in
particular, whether the individual tends to go in front or in the rear, keep close to
others or at some distance, conform to the group movement direction or deviate,
etc. These aspects of the collective behaviour involve abstractions and generalizations
over the set of movers, space, and time, although the scales are different from
those in the previous examples: there are far fewer movers than in the case of the Milan
cars, and the territory is much smaller and the time interval shorter than in both
previous examples.
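The group-level aspects listed above can be made concrete with a small sketch. It is only an illustration, not the method used in Sects. 5.1.4 and 5.2.2: it assumes planar coordinates, takes the displacement of the group centroid between two consecutive time moments as the group movement direction, and uses the hypothetical names `centroid` and `group_measures`.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def group_measures(prev_positions, positions):
    """Describe a group at one time step.

    prev_positions, positions: dicts mover_id -> (x, y) at two consecutive
    time moments (assumed to contain the same movers).
    Returns (compactness, along), where compactness is the mean distance of
    the members to the group centroid and along[m] is the signed offset of
    member m from the centroid along the group's movement direction
    (positive = in front, negative = in the rear).
    """
    c_prev = centroid(list(prev_positions.values()))
    c_now = centroid(list(positions.values()))
    dx, dy = c_now[0] - c_prev[0], c_now[1] - c_prev[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("group centroid did not move; direction is undefined")
    ux, uy = dx / norm, dy / norm  # unit vector of the group movement
    compactness = sum(
        math.hypot(x - c_now[0], y - c_now[1]) for x, y in positions.values()
    ) / len(positions)
    along = {
        m: (x - c_now[0]) * ux + (y - c_now[1]) * uy
        for m, (x, y) in positions.items()
    }
    return compactness, along

# Toy data: three walkers moving "north" in single file.
prev = {"a": (0.0, 2.0), "b": (0.0, 0.0), "c": (0.0, -2.0)}
now = {"a": (0.0, 3.0), "b": (0.0, 1.0), "c": (0.0, -1.0)}
compactness, along = group_measures(prev, now)
print(compactness, along)
```

Computed for a single time step, these values describe a specific situation. Summarizing them over the whole walk (for example, the share of time steps in which a member's offset is positive) is what turns them into statements such as "tends to go in front".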
Hence, it is possible to consider the behaviour of the group of walking people
as intermediate between a highly specific set of events (e.g. walk southwards for
5 min, then stop for 3 min, then walk to the southeast for 4 min) and a highly
general behaviour, such as the routine daily behaviour of an individual or the weekly
and daily behaviour of the traffic in Milan. Therefore, in classifying a move-
ment behaviour as specific or general with respect to movers, space, and time, we
should take into account that specific-generic is not a dichotomy but rather a con-
tinuum. The position of a behaviour on this axis depends on how much abstraction
and generalization it involves.
Wood and Galton (2010) argue that collective motion can be analysed and
described at different levels of spatial granularity. At the coarsest level, the move-
ment (i.e. change of position) of the collective as a unit is considered, at the inter-
mediate level, the evolution of the spatial footprint of the collective, and at the finest
level, the movements of the individual members. The coarsest and intermediate lev-
els definitely involve spatial abstraction; however, the movement of the individuals
is described in terms of specific episodes, or events, in our terms. In our examples,
we try to characterize individual movement behaviours on a more general level.
Thus, in Sect. 5.2.2, we analyse the typical or preferred positions of each group
member within the group space. In describing the behaviours, we use such words as
“tends to” and “mostly”, which signify temporal and spatial generalization.
We believe that currently only visual analytics can support reconstruction of
general movement behaviours from movement data while automatic methods can
only extract mover-specific, local, short-term behaviours or recognize instances
of predefined general behaviours represented by formal expressions (Parent et al.
2013). The reconstruction process involves abstraction and generalization, semantic
interpretation based on the analyst’s background knowledge, analytical reasoning, and
synthesis of new knowledge from pieces of obtained information. All these operations
take place in the analyst’s mind, but they are enabled and stimulated by appropriate
visual representations and the analyst’s interactions with these representations.
Visual analytics makes use of computational processing of movement data, includ-
ing extraction of specific behaviours, but puts emphasis on presenting the outcomes
of computational techniques to the analyst’s eyes in a way enabling abstraction,
generalization, interpretation, analytical reasoning, and knowledge synthesis.
Where visual analytics is currently weak is in capturing and representing the
process and results of human-conducted analysis and thinking. Quite typically, vis-
ual analytics tools let analysts gain knowledge but do not support expressing this
knowledge in tangible artefacts that could be stored, communicated, revised, and
used in further analyses. This refers, in particular, to knowledge about movement
behaviours. This book contains only three examples of creating analytical artefacts
that explicitly represent some aspects of general movement behaviours. In Sect.
5.1.2.5, we describe the procedure of building a classifier that represents the most
frequent routes taken by cars in Milan. The classifier is an abstraction from the
specific trajectories of the cars as it represents each route by one or a few pro-
totypical trajectories. The classifier can be used for recognizing instances of the
frequent routes among new trajectories. The second example is creation of a model
representing a set of spatial time series (Sects. 7.2.4 and 7.3.1). The model pro-
vides an explicit parsimonious description of overall spatio-temporal variation. It
assigns places or links to groups (clusters) with similar temporal variations of local
characteristics and describes the general character of the temporal variation in each
cluster by a time series model. In the third example (Sect. 7.3.2), an analogous
approach is used to represent dependencies between two attributes of traffic flows.
In all three cases, the models are not built automatically but by a human analyst
who expresses in this way his/her understanding of the observed behaviour. This
is done by interacting with visual displays and computational tools for modelling.
While these examples show possible ways to externalize the analyst’s knowledge
gained in the process of analysis, they do not cover all types and all aspects of
movement behaviours. This is one of the open problems requiring not just tech-
nical development of new tools but also designing appropriate formal represen-
tations for different types and aspects of behaviours and finding approaches to
building these representations.

9.6 Personal Privacy

Our book contains examples showing that analysis of personal movement data
allows finding the places where a person lives and works and identifying other
places regularly visited by the person. On this basis, it is possible to disclose
the person’s identity. The individual movement behaviour of this person can be
reconstructed, and this information can be used in ways harmful for the person.
For example, knowing the usual times when the person is not at home may allow
planning a burglary, or the person’s employer may not like some aspects of the person’s
lifestyle, which may have consequences for the person’s career. The possible dangers
from disclosing sensitive personal information are relevant not only to detailed
GPS traces but also to temporally sparse episodic movement data, such as records
of mobile phone use or Twitter posts. Thus, in Sect. 7.2.6, we showed that it is
possible to find a person’s home and work places from Twitter data.
Hence, there is a need to develop visual analytics methods that can
support various analysis tasks while protecting personal privacy. The principal
approach is data transformation, which can be done by a trusted party.
Transformed data, from which personal information can no longer be extracted, can
then be provided for analysis purposes. Depending on the analysis task to be
performed, movement data need to be transformed in different ways.
The most evident way to prevent disclosure of personal information is spatial
and spatio-temporal aggregation of movement data from many movers (Sect. 3.8).
This way is suitable for space- and time-oriented tasks. Certain precautions need
to be taken, however, before aggregated data are given to someone for analysis.
Suppose that the data are aggregated by compartments of a space tessellation. For
each compartment, the total number of different movers and the number of differ-
ent movers visiting the compartment by hours of the day during a week are com-
puted. If only one mover appears in one of the compartments, it may be possible
to discover the mover’s identity and learn the usual times when the mover is in the
compartment and when the mover is not there. This may endanger the personal
privacy of the mover. To prevent such cases, the space tessellation must be coarse
enough so that data from multiple movers are aggregated for each compartment.
Here, the privacy protection model known as k-anonymity (Sweeney 2002) can
be applied: a release of information provides k-anonymity protection if the infor-
mation for each person contained in the release cannot be distinguished from at
least k−1 individuals whose information also appears in the release. The parameter
k determines the level of privacy protection. An appropriate value can be preselected
by the trusted data transformer.
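As a minimal sketch of the check described above, the following code counts the distinct movers in each compartment of a regular square grid and reports the compartments that would violate k-anonymity. The grid, the record layout `(mover_id, x, y)`, and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

def movers_per_cell(visits, cell_size):
    """Map each square grid cell to the set of distinct movers seen in it.

    visits: iterable of (mover_id, x, y) records (hypothetical layout).
    """
    cells = defaultdict(set)
    for mover, x, y in visits:
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell].add(mover)
    return cells

def unsafe_cells(visits, cell_size, k):
    """Cells visited by fewer than k distinct movers, in sorted order."""
    cells = movers_per_cell(visits, cell_size)
    return sorted(cell for cell, movers in cells.items() if len(movers) < k)

visits = [
    ("m1", 1, 1), ("m2", 2, 3), ("m3", 4, 4),  # a place shared by three movers
    ("m1", 12, 1), ("m1", 13, 2),              # a place visited only by m1
    ("m2", 8, 8),                              # a place visited only by m2
]
fine = unsafe_cells(visits, cell_size=5, k=2)
coarse = unsafe_cells(visits, cell_size=20, k=2)
print(fine)    # cells that would expose a single mover
print(coarse)  # empty when the tessellation is coarse enough
```

With the fine grid, two compartments contain a single mover and would have to be merged or suppressed before release. With the coarse grid, every non-empty compartment aggregates at least k movers, at the price of lower spatial resolution.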
Hence, in the case of space tessellation for data aggregation, the space needs to be
divided into compartments in such a way that each compartment contains visits of at
least k different movers. The approach that divides the space into compartments of
variable sizes depending on the data density (Sect. 7.1.1) can be particularly helpful
here. In the parts of the space where many movers appear, the compartments may
be smaller, and in the parts that are visited by few movers the compartments may be
larger. Hence, where the movement intensity permits, the analyst may have a fairly fine
degree of spatial granularity of the aggregated data. Monreale et al. (2013) describe
an approach to privacy-preserving distributed aggregation of movement data, where
territory tessellation into cells of varying sizes is used in the method evaluation
experiments.
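The idea of density-dependent compartments can be sketched with a quadtree, which is used here only as a simple stand-in and is not the tessellation method of Sect. 7.1.1. The sketch assumes that a square cell is split into four quadrants only if every non-empty quadrant still contains at least k distinct movers; empty quadrants are tolerated on the assumption that they release nothing about any mover. All names and the record layout `(mover_id, x, y)` are hypothetical.

```python
def distinct_movers(visits):
    """Number of different movers among (mover_id, x, y) records."""
    return len({m for m, _, _ in visits})

def tessellate(visits, x0, y0, size, k, min_size):
    """Recursively split the square [x0, x0+size) x [y0, y0+size).

    A cell is split only if its quadrants are at least min_size wide and
    every non-empty quadrant keeps at least k distinct movers.
    Returns a list of (x0, y0, size, n_movers) tuples for the resulting cells
    (empty quadrants are omitted).
    """
    half = size / 2
    if half >= min_size:
        quads = {}
        for m, x, y in visits:
            key = (x >= x0 + half, y >= y0 + half)  # (right half?, upper half?)
            quads.setdefault(key, []).append((m, x, y))
        if quads and all(distinct_movers(v) >= k for v in quads.values()):
            cells = []
            for (right, top), v in quads.items():
                cells += tessellate(
                    v,
                    x0 + (half if right else 0),
                    y0 + (half if top else 0),
                    half, k, min_size,
                )
            return cells
    return [(x0, y0, size, distinct_movers(visits))]

visits = [
    # densely visited area in the lower-left corner
    ("m1", 0.5, 0.5), ("m2", 0.6, 0.7), ("m3", 1.5, 1.5), ("m4", 1.6, 1.4),
    ("m1", 1.5, 0.5), ("m2", 1.4, 0.6), ("m3", 0.5, 1.5), ("m4", 0.4, 1.6),
    # sparsely visited area on the right
    ("m5", 5.0, 1.0), ("m6", 7.0, 3.0),
]
cells = tessellate(visits, x0=0, y0=0, size=8, k=2, min_size=1)
for cell in cells:
    print(cell)
```

In this toy run the dense area ends up divided into small cells, while the two sparse visits on the right stay together in one large cell, and every returned cell satisfies the k-anonymity condition.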
For event-oriented tasks, it is appropriate to use spatial events extracted from
trajectories of many movers; the data describing the events must not contain the
identifiers of the movers. Furthermore, the trusted data transformer must ensure
that any existing spatial cluster of these events includes events from multiple mov-
ers, because a cluster made of events (e.g. stops) of a single mover may disclose
a personal place of this mover. According to the k-anonymity model, each spatial
cluster must contain events from at least k different movers. Events from spatial
clusters that do not satisfy this condition must be removed from the dataset before
providing it for event-oriented analysis.
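The filtering rule for event data can be sketched as follows. The sketch assumes that every event already carries a label of the spatial cluster it belongs to (the clustering itself and the treatment of unclustered events are outside its scope) and that events are plain dictionaries with hypothetical keys `mover`, `cluster`, `x`, `y`, and `t`.

```python
from collections import defaultdict

def anonymize_events(events, k):
    """Keep events from clusters with at least k distinct movers, without mover ids.

    events: list of dicts with keys 'mover', 'cluster', 'x', 'y', 't'
    (hypothetical layout). Returns new dicts; the input is not modified.
    """
    movers = defaultdict(set)
    for e in events:
        movers[e["cluster"]].add(e["mover"])
    return [
        {key: value for key, value in e.items() if key != "mover"}
        for e in events
        if len(movers[e["cluster"]]) >= k
    ]

events = [
    {"mover": "m1", "cluster": "A", "x": 1.0, "y": 1.0, "t": 10},
    {"mover": "m2", "cluster": "A", "x": 1.1, "y": 0.9, "t": 12},
    {"mover": "m3", "cluster": "A", "x": 0.9, "y": 1.2, "t": 15},
    {"mover": "m1", "cluster": "B", "x": 7.0, "y": 7.0, "t": 20},  # m1 only:
    {"mover": "m1", "cluster": "B", "x": 7.1, "y": 7.0, "t": 44},  # a personal place
]
released = anonymize_events(events, k=2)
print(released)
```

Cluster B consists of stops of a single mover and is dropped entirely, while the events of cluster A are released without the mover identifiers.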
In mover-oriented tasks, the analyst needs to deal with trajectories of movers.
The approach to preventing the disclosure of personal places is in this case transfor-
mation of the spatial references in the trajectories. In Sect. 3.6, we mentioned that
spatial generalization can be used as a tool for protecting personal privacy in trajec-
tory data. The basic idea is presented by Andrienko et al. (2009) while Monreale
et al. (2010) describe a data transformation algorithm that ensures k-anonymity of the
transformed trajectories. As in the earlier discussed case of spatio-temporal aggrega-
tion, the generalization can be based on a space tessellation with varying cell sizes so
that the generalization level is lower in the parts of space attended by many movers
and higher in the parts of space where fewer movers appear. Still, for ensuring k-ano-
nymity, it may be necessary to remove from some of the generalized trajectories those
parts that occur in trajectories of fewer than k movers. Trajectories transformed in this
way and deprived of mover identifiers can be suitable for analysing common routes
taken by the movers and movement characteristics along the routes.
The generalization-based transformation may be unsuitable for mover-oriented
tasks on reconstruction of individual movement behaviours. For example, the task
may be to find behavioural archetypes and their frequency distribution across the
population represented by the data. A behavioural archetype consists of types
of visited places and typical visit times and durations for these types of places.
Generalized and k-anonymized trajectories are unsuitable because the trajectories
do not have mover identifiers; besides, the data may lack parts of trajectories that
are suitable for identification of personal places. For this kind of task, it is neces-
sary to learn when each person is at home, goes to work, visits shops, goes to a
cinema or theatre, does sports, etc.; hence, it is important to have trajectory parts
referring to significant personal places and movements to and from these places.
However, the exact geographical positions of the person’s home, work, regularly
visited shops, and other frequently visited places are not relevant for this kind of
analysis. Therefore, a suitable approach is transformation of the spatial references
from the geographic space to an abstract space. After such a transformation,
personal identity can no longer be derived from the spatial positions. Section 3.3 refers to
examples of transformations where absolute geographical positions are transformed
into positions relative to each person’s home or home and work
(Kwan 2000). In this transformation, the home places of all people represented by
the dataset are moved to a single point in space, which becomes the origin of the
coordinate system. After that, it is no longer possible to find out where in the
geographical space each person lives and appears. An apparent problem here is that the
home place of each person may not be specified in the data. However, the place to
become the origin of the coordinate system for each person does not need to be the
ascertained home place. It can also be the most frequently visited place of each per-
son, which is easy to retrieve from the data by simple computations. Another prob-
lem is that public POI (cinemas, theatres, restaurants, etc.) visited by the people will
become unidentifiable, while information about visiting these types of places may be
important for reconstructing the general behaviours. A solution is to attach a posi-
tional attribute to the transformed trajectories with the values referring to the types of
public objects and places situated in the spatial neighbourhood of the trajectory posi-
tions. This can be done using a database of public POI with their coordinates.
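A minimal sketch of such a transformation is given below. It rests on several assumptions that are not taken from the book: "places" are approximated by square grid cells of side `place_size`, the number of recorded positions in a cell stands in for the frequency of visits, the centre of the most frequent cell becomes the personal origin, and the POI database is a plain list of `(x, y, type)` tuples. The function names are hypothetical.

```python
import math
from collections import Counter

def nearby_poi_type(x, y, pois, radius):
    """Type of the nearest public POI within `radius`, or None if there is none."""
    best, best_d = None, radius
    for px, py, poi_type in pois:
        d = math.hypot(x - px, y - py)
        if d <= best_d:
            best, best_d = poi_type, d
    return best

def to_abstract_space(trajectory, pois, place_size, radius):
    """Re-express a trajectory relative to the mover's most frequent place.

    trajectory: list of (t, x, y) in geographic (planar) coordinates.
    Returns a list of (t, dx, dy, poi_type); the POI type is looked up
    before the coordinates are shifted, so it survives the transformation.
    """
    def cell(x, y):
        return (int(x // place_size), int(y // place_size))

    counts = Counter(cell(x, y) for _, x, y in trajectory)
    (cx, cy), _ = counts.most_common(1)[0]
    ox, oy = (cx + 0.5) * place_size, (cy + 0.5) * place_size  # personal origin
    return [
        (t, x - ox, y - oy, nearby_poi_type(x, y, pois, radius))
        for t, x, y in trajectory
    ]

pois = [(136, 205, "cinema"), (534, 906, "cinema")]
person_a = [(0, 105, 205), (1, 105, 205), (2, 105, 205), (3, 135, 205)]
person_b = [(0, 505, 905), (1, 505, 905), (2, 505, 905), (3, 535, 905)]
abstract_a = to_abstract_space(person_a, pois, place_size=10, radius=5)
abstract_b = to_abstract_space(person_b, pois, place_size=10, radius=5)
print(abstract_a)
print(abstract_b)
```

The two toy persons live in different parts of the territory but show the same behaviour (staying at the most frequent place, then going to a cinema). After the transformation their trajectories coincide, and the geographic positions of their homes can no longer be read from the data.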
To date, no visual analytics methods have been suggested for
reconstructing routine movement behaviours of people from their trajectories trans-
formed into an abstract space. The transformation and analysis methods address-
ing group movement (Sects. 3.3 and 5.2.2) are not suitable because they focus on
the relative positions of movers with respect to other movers but not on repeatedly
visited personal places and trips between these places. Hence, further research is
needed in this direction. More generally, there are currently no tools in visual ana-
lytics fully implementing the principle “privacy by design” (Monreale 2011) to
preclude the disclosure of private information. It is necessary to develop not only
appropriate methods but also an underlying fundamental framework defining what
kinds of privacy-protecting transformations to apply depending on the properties of
the available data and the types of information that needs to be extracted.
The next section gives a summary of the main open problems and directions for
further research in the visual analytics of movement.

9.7 Future Perspectives

Types of moving entities. So far, the research on analysing movement, not only in
visual analytics but also in data mining, machine learning, and database research,
mostly deals with discrete moving objects treated as moving points. The spatial
extents and shapes of real objects are ignored and only the positions taken into
account (it is assumed that for each object and each time moment there is a point
that adequately represents the spatial position of this object at this time moment).
This representation may be sufficient for many applications. However, there are
applications and types of problems requiring other representations of movement
and, as a consequence, other visual and analytical techniques. These applications
and problems can be grouped into the following classes according to the types of
moving entities they deal with:
• Discrete moving objects with constant spatial extents and shapes that are impor-
tant and cannot be ignored. For example, in maritime traffic security applica-
tions, accounting for sizes and shapes of vessels is important for understanding
whether specific behaviours of vessels in a given situation are safe or can lead to
a collision.
• Discrete moving objects with changing spatial extents and shapes, such as icebergs,
clouds, and spots resulting from oil spills or releases of other substances
in the environment. This category of moving objects also includes moving
groups considered as units, for example, shoals of fish, flocks of birds, or swarms of
flying insects.
• Objects that can be conceptualized as moving lines, for example, atmospheric
fronts or seafront that moves depending on tides.
• Continuous flows, such as ocean currents, wind streams, cyclones, tornadoes, or
even flows of traffic that is not divided into individual vehicles.
Hence, there is a need for research on representation and analysis methods suitable
for these types of moving entities and the respective applications.
Movement in three-dimensional space. Even point objects that move in three-
dimensional space have been insufficiently addressed in visual analytics and other
sciences. There are some visualization methods suitable for showing individual tra-
jectories (examples are given in Figs. 4.1–4.3) but these displays do not represent
time and are not scalable to showing massive movements. Many of the existing com-
putational methods explicitly or implicitly assume that the space is two-dimensional;
therefore, these methods cannot be used for 3D data. Taking into account what was
said in the previous section regarding movement behaviours, an open research prob-
lem specifically relevant to visual analytics is how to support abstractive perception
and understanding of movement behaviours in 3D space.
Coverage of the task typology by existing methods. The task typology suggested
in Sect. 2.12 appears to be quite well covered at the level of task foci but there is
no full coverage at the level of task targets. Recall that task targets are subdivided
into characteristics and relations. Characteristics are covered sufficiently
well: for each type of entities (movers, events, locations, and times), there are
methods to visualize and analyse their spatial, temporal, and thematic character-
istics. The coverage of relations is much lower. Each of the chapters of our book
dedicated to movers, events, locations, and times includes some methods, proce-
dures, and examples of analysing relations pertinent to the respective type of enti-
ties. However, it cannot be claimed that all possible relations can be analysed in
these ways. For example, it is not quite clear how to analyse relations occurring in
competitive collective movements with multiple participants, such as sport games.
Certainly, it is possible to observe such relations at a low abstraction level by
means of display animation, but this does not sufficiently support reconstruction
of general movement behaviours discussed in the previous section. Here is a clear
space for further research. Another gap in coverage relates to collective move-
ments where some participants exert forces on other participants and thereby make
the latter move or change their movements. An example is a ball game where the
players exert forces on the ball making it move or change its course or other movement
characteristics. Currently existing analysis methods may not be well suited to
analysing relations between the players and the ball and the impacts of these rela-
tions on the further movements of the ball and the players.
The problem of supporting relation analysis is quite complex due to potentially
unlimited variety of relations. Although there are not so many basic types of spa-
tial and temporal relations, it is usually not the basic relations that are of interest
and importance but more complex relations composed from these basic relations.
The number of possible combinations is potentially limitless (although, probably,
not all of them occur in reality or have practical value). It would be discouraging if
each composite relation required its unique methods for visualization and analysis.
However, we believe that an appropriate taxonomy of composite relations can be
developed such that visual analytics methods could cover broad classes of relation
types rather than individual types. Although some efforts for building relation (pat-
tern) taxonomies have been undertaken (e.g. Dodge et al. 2008), the results are not
yet practicable. Therefore, not only methods need to be developed but also appro-
priate conceptual models of classes of relations.
Spatial and temporal granularities. It is well known in geographic analysis
that observed spatial patterns depend on how the space is divided into space units.
In particular, for movement data, results of analysis depend on how movement
tracks are divided into trajectories and/or events (episodes) and how the space is
divided into compartments. For example, Fig. 5.3 demonstrates how spatial pat-
terns can change depending on the granularity of the space tessellation. Observed
temporal and spatio-temporal patterns depend also on how the time is divided into
time units. For example, weekly patterns will be invisible when time is divided
into months but seasonal patterns may be lost when time is divided into weeks
or days. Sometimes, it may be quite obvious at what spatial and temporal scales
and with what granularities (i.e. units) the data need to be analysed; however, in a
general case, it may be a hard problem for an analyst to choose appropriate scales
and divisions. Thus, in our explorations with the Milan cars data, we experimented
with multiple tessellations of the territory for finding a variant that conveys in a
good way (i.e. with sufficient abstraction but low distortion) the spatial flow pat-
terns and allows obtaining local time series conveying in a good way (i.e. with low
level of noise) the periodic variation of the traffic. This trial and error approach
is quite typical. It would be good if visual analytics could support the process of
finding the right granularities and of testing the sensitivity of the analysis results
to the chosen granularities.
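The dependence of observed temporal patterns on the chosen time units can be illustrated with a small, fully synthetic example. The data below are invented for the purpose of the illustration (100 trips on every working day and 40 on every weekend day of 2013); the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def mean_by(daily_counts, key):
    """Average the daily counts within the groups defined by key(day)."""
    sums, nums = defaultdict(float), defaultdict(int)
    for day, count in daily_counts.items():
        sums[key(day)] += count
        nums[key(day)] += 1
    return {group: sums[group] / nums[group] for group in sums}

def spread(means):
    """Difference between the largest and the smallest group mean."""
    return max(means.values()) - min(means.values())

# Synthetic daily trip counts with a purely weekly cycle (invented data).
daily = {}
day = date(2013, 1, 1)
while day.year == 2013:
    daily[day] = 40 if day.weekday() >= 5 else 100
    day += timedelta(days=1)

by_weekday = mean_by(daily, lambda d: d.weekday())
by_month = mean_by(daily, lambda d: d.month)
print("spread across days of the week:", spread(by_weekday))
print("spread across months:", round(spread(by_month), 2))
```

Grouping by day of the week exposes the full difference between working days and weekends, whereas the monthly averages differ only by the small amount caused by the varying number of weekend days per month. Repeating such a comparison for several candidate divisions is one simple way to test how sensitive an observed pattern is to the chosen granularity.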
A more general problem is to help the analyst in choosing the right settings for
the parameters of transformational and analytical methods and testing the sensi-
tivity of the observed patterns and analysis results to the parameter settings. This
refers, besides the methods for dividing space, time, and trajectories into units, to
methods whose work is based on various thresholds (e.g. for spatial and tempo-
ral distances, speed, number of neighbours, etc.), clustering methods that require
choosing the number of clusters, and many others.
Real-time processing of streaming movement data. Currently existing visual
analytics methods for movement data are designed for historical data that have
been previously collected and do not change during the analysis process. There
are many applications where movement data come or may come continually while
objects move and where there is a need to understand the current situation, how it
develops, and what may happen in the near future. In the areas of data mining and
database research, scientists are already actively developing methods for real-time
processing of streaming movement data. A specific research direction is distributed
processing, which is required when there is not enough time, storage capacity,
communication channels, energy, and/or other resources for sending all data
to a central server and analysing them there.
In information visualization and visual analytics, some research has been
recently conducted on visualizing and analysing streams of events (Fisher et al.
2012) and event-related texts, such as messages in Twitter or news articles (Cao
et al. 2012; Chae et al. 2012; Dou et al. 2012); however, streams of movement
data have not been addressed yet. Here, there is a conceptual problem to under-
stand what kinds of analysis tasks are generally relevant to real-time analysis of
movement data and which of these tasks really require or could really benefit
from visual analytics approaches. Typically, monitoring of real-time processes
requires quick reaction to changes and therefore necessitates high degree of auto-
mation in data processing and analysis. Visual analytics may be needed for tasks
that cannot be automated. As we discussed in the previous section, visual analyt-
ics is needed for reconstruction of general behaviours. In application to stream-
ing movement data, this could be the general behaviour of the observed process.
In particular, it may be necessary to understand whether the process develops in
accord with the current model that is used for the automated monitoring. While
occasional deviations signify abnormal situations requiring the attention of the
operator, there may be significant trends in the deviations of the real behaviour
from the expected one indicating that the reality is changing and the model needs
to be updated. Visual analytics may be needed for understanding that the behav-
iour is changing.
At this time, it appears that the research problems related to distributed process-
ing of movement data are not highly relevant to visual analytics. However, it is
possible to imagine a distributed community of visual analysts, each having access
to a part of the data and performing analysis on this part, with analytical artefacts
exchanged between the analysts so that eventually a complete picture can be
synthesized from distributed fragments. We do not know how realistic this scenario
is; this may become clearer in the near future.
Knowledge externalization. As discussed in Sect. 9.5, visual analytics needs to
develop methods and tools supporting externalization of knowledge gained in the
process of analysis of movement, as well as methods for internal (formal) and vis-
ual representation of different types of knowledge. Internal methods of knowledge
representation should allow the use of the knowledge by algorithmic methods for
data transformation and analysis. Visual methods should allow knowledge com-
munication and recall. Besides, interactive methods should enable knowledge revi-
sion, elaboration, and further extension.
Protecting personal privacy. In Sect. 9.6, we have stressed the need for consist-
ent application of the principle “privacy by design” in developing visual analytics
methods and tools for movement analysis. The approaches to privacy protection
must be appropriate to the types of analysis tasks to be performed. Hence, a gen-
eral framework defining the possible types of privacy-protecting data transforma-
tions and putting them in correspondence with the typology of analysis tasks needs
to be developed, apart from specific methods supporting privacy-preserving analy-
sis of movement.
Hence, there are still quite big unsolved or only partly solved problems requir-
ing further research in visual analytics, but there is also a big potential for solving
these problems.

9.8 Suggested Exercises

At the end of the book, we suggest a few exercises to help readers recall the
contents of the book and better understand the main ideas. The formulations
of the exercises refer to the dataset about the North Sea vessel movement (Sect.
2.10.3), but similar exercises can be done using any other movement data, either
real or imagined. Readers are especially encouraged to apply these
exercises to their own movement data or data they have access to.
• Event-based view of movement. Think how the trajectories of the vessels in
the North Sea can be decomposed into spatial events, or episodes, of different
types. Try to define the kinds of spatial events that can be of interest for a mari-
time safety analyst.
• Multi-perspective view of movement. Describe the North Sea vessels dataset
from the four possible perspectives.
• Task typology. For the North Sea vessels dataset, formulate tasks of different
types in specific terms for this dataset.
• Methodology of movement analysis. Develop a plan for comprehensive analy-
sis of the North Sea vessels dataset from multiple perspectives: what tasks need
to be performed, in what sequence (in terms of output–input relationships), and
what methods need to be used.
• Movement behaviours. Think what kinds of individual and collective movement
behaviours may exist in the movement of vessels. In what terms can general
behaviours of different types of vessels be described? How do they translate
into movement characteristics and relations to the context?
Answers to some of these exercises and questions can be found in the paper
by Andrienko and Andrienko (2013a); however, there is no strict correspondence
between the exercises and the contents of the paper.
Of course, the best reward for the authors’ effort would be the readers practi-
cally using the material of this book in their analyses of movement data or in their
research on visual analytics of movement or other spatio-temporal phenomena.

9.9 Conclusion

The key idea of this book is the multi-perspective view of movement. The
phenomenon of movement can be considered as a set of space–time paths (trajectories)
of moving objects, as a combination of various spatial events distributed in space
and time, as chains of local changes occurring in a multitude of spatial locations,
or as a succession of spatial situations occurring in different time units. Each per-
spective of the movement allows gaining certain types of knowledge through data
analysis. These diverse types of knowledge together make a comprehensive under-
standing of the phenomenon. For each perspective, there is an appropriate repre-
sentation form of movement data and a collection of analysis methods that support
gaining perspective-specific types of knowledge.
The idea of the multi-perspective view is not limited to movement but can be
applied to other spatio-temporal phenomena. Any spatio-temporal phenomenon
can be viewed from the space-oriented and time-oriented perspectives, that is, as
chains of local changes occurring in a multitude of spatial locations or as a suc-
cession of spatial situations occurring in different time units. Furthermore, any
spatio-temporal phenomenon can be viewed as a combination of various spatial
events of two basic types (Wood and Galton 2010): “chunks” of homogeneous
processes, when no changes occur, and transitions, or significant changes of vari-
ous kinds. Besides, any spatio-temporal phenomenon exists and develops within
its spatio-temporal context and inevitably interacts with this context, which results
in occurrences of various relations. Hence, the ideas, approaches, and even spe-
cific methods presented in this book are in a large part applicable to various spa-
tio-temporal phenomena. Some of them may be even applicable to other types of
complex phenomena and things, not necessarily spatial or temporal but composed
of two or more interrelated components differing by nature. Therefore, we believe
that the book can be useful to a wide range of researchers and practitioners, not
only those who deal with movement.
We hope that the book can be useful also in another way. It shows an example
of a systematic approach to a particular type of data reflecting a particular type of
phenomena. It defines the main concepts necessary for thinking about these phe-
nomena and data, looks at the phenomena and data from different perspectives,
develops a system of possible analysis tasks, and determines what methods and
procedures can appropriately support these tasks. We believe that this kind of
approach can be productive in many areas of visual analytics research.

References

Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the
ACM, 26(11), 832–843.
Andrienko, N., & Andrienko, G. (2006). Exploratory analysis of spatial and temporal data: A
systematic approach. Berlin: Springer.
Andrienko, N., & Andrienko, G. (2013a). Visual analytics of movement: An overview of meth-
ods, tools and procedures. Information Visualization, 12(1), 3–24.
Andrienko, N., & Andrienko, G. (2013b). Visual analytics of movement: A rich palette of tech-
niques to enable understanding. In C. Renso, S. Spaccapietra, & E. Zimányi (Eds.), Mobility
data: Modeling, management and understanding. Cambridge: Cambridge University Press.
Andrienko, G., Andrienko, N., Giannotti, F., Monreale, A., & Pedreschi, D. (2009). Movement
data anonymity through generalization. In Proceedings of 2nd SIGSPATIAL ACM GIS
2009 International Workshop on Security and Privacy in GIS and LBS (SPRINGL 2009),
November 3, 2009, Seattle, WA, USA. http://doi.acm.org/10.1145/1667502.1667510.
Andrienko, G., Andrienko, N., Bak, P., Keim, D., Kisilevich, S., & Wrobel, S. (2011a). A con-
ceptual framework and taxonomy of techniques for analyzing movement. Journal of Visual
Languages and Computing, 22(3), 213–232.
Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., & Wrobel, S. (2011b). From movement
tracks through events to places: Extracting and characterizing significant places from mobil-
ity data. In Proceedings of IEEE Visual Analytics Science and Technology (VAST 2011) (pp.
161–170).
Andrienko, G., Andrienko, N., Burch, M., & Weiskopf, D. (2012a). Visual analytics methodol-
ogy for eye movement studies. IEEE Transactions on Visualization and Computer Graphics
(Proceedings of the IEEE VAST 2012), 18(12), 2889–2898.
Andrienko, G., Andrienko, N., Stange, H., Liebig, T., & Hecker, D. (2012b). Visual analytics for
understanding spatial situations from episodic movement data. Künstliche Intelligenz, 26(3),
241–251.
Arnheim, R. (1997). Visual thinking. Berkeley: University of California Press (1969, renewed
1997).
Bouvier, D. J., & Oates, B. (2008). Evacuation traces mini challenge award: Innovative trace
visualization staining for information discovery. In Proceedings of the IEEE Symposium
on Visual Analytics Science and Technology (VAST 2008) (pp. 219–220). New York: IEEE
Computer Society Press.
Cao, N., Lin, Y.-R., Sun, X., Lazer, D., Liu, S., & Qu, H. (2012). Whisper: Tracing the spatiotem-
poral process of information diffusion in real time. IEEE Transactions on Visualization and
Computer Graphics, 18(12), 2649–2658.
Chae, J., Thom, D., Bosch, H., Jang, Y., Maciejewski, R., Ebert, D. S., et al. (2012).
Spatiotemporal social media analytics for abnormal event detection and examination using
seasonal-trend decomposition. In Proceedings of the IEEE Visual Analytics Science and
Technology (VAST 2012) (pp. 143–152). New York: IEEE Computer Society Press.
Dodge, S., Weibel, R., & Lautenschütz, A.-K. (2008). Towards a taxonomy of movement pat-
terns. Information Visualization, 7(3–4), 240–252.
Dou, W., Wang, X., Skau, D., Ribarsky, W., & Zhou, Z. (2012). LeadLine: Interactive visual
analysis of text data through event identification and exploration. In Proceedings of the
IEEE Visual Analytics Science and Technology (VAST 2012) (pp. 93–102). New York: IEEE
Computer Society Press.
Fisher, F., Mansmann, F., & Keim, D. A. (2012). Real-time visual analytics for event data
streams. In Proceedings of the 27th Annual ACM Symposium on Applied Computing (SAC
12) (pp. 801–806). ACM, New York, NY, USA.
Kalnis, P., Mamoulis, N., & Bakiras, S. (2005). On discovering moving clusters in spatio-temporal
data. In Proceedings of the 9th International Symposium on Spatial and Temporal Databases
SSTD 2005 (pp. 364–381). Berlin Heidelberg: Springer.
Kwan, M. P. (2000). Interactive geovisualization of activity-travel patterns using three-dimen-
sional geographical information systems: A methodological exploration with a large data set.
Transportation Research Part C, 8, 185–203.
Laube, P. (2009). Progress in movement pattern analysis. In B. Gottfried & H. Aghajan (Eds.),
Behaviour monitoring and interpretation: Ambient assisted living (pp. 43–71). Amsterdam:
IOS Press.
Laube, P., Imfeld, S., & Weibel, R. (2005). Discovering relative motion patterns in groups of
moving point objects. International Journal of Geographical Information Science, 19(6),
639–668.
Lundblad, P., Eurenius, O., & Heldring, T. (2009). Interactive visualization of weather and ship
data. In Proceedings of the 13th International Conference on Information Visualization
IV2009 (pp. 379–386). New York: IEEE Computer Society Press.
Monreale, A. (2011). Privacy by design in data mining. PhD thesis, Pisa: University of Pisa.
Monreale, A., Andrienko, G., Andrienko, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., et al.
(2010). Movement data anonymity through generalization. Transactions on Data Privacy,
3(3), 91–121.
Monreale, A., Wang, W. H., Pratesi, F., Rinzivillo, S., Pedreschi, D., Andrienko, G., & Andrienko,
N. (2013). Privacy-preserving distributed movement data aggregation. In D. Vandenbroucke,
B. Bucher, & J. Crompvoets (Eds.), Geographic Information Science at the Heart of Europe
(pp. 225–245). Springer International Publishing.
Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., et al.
(2013). Semantic trajectories modelling and analysis. ACM Computing Surveys, 45(4).
doi:10.1177/1473871613487087
Quddus, M. A., Ochieng, W. Y., & Noland, R. B. (2007). Current map-matching algorithms
for transport applications: State-of-the art and future research directions. Transportation
Research Part C: Emerging Technologies, 15(5), 312–328.
Sakr, M. A., & Güting, R. H. (2011). Spatiotemporal pattern queries. Geoinformatica, 15(3),
497–540.
Sakr, M. A., Behr, T., Güting, R. H., Andrienko, G., Andrienko, N., & Hurter, C. (2011).
Exploring spatiotemporal patterns by integrating visual analytics with a moving objects
database system. In Proceedings of 19th ACM SIGSPATIAL International Conference
on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2011) (pp.
505–508).
Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of
Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557–570.
Wood, Z., & Galton, A. (2010). Zooming in on collective motion. In M. Bhatt, H. Guesgen, &
S. Hazarika (Eds.), Spatio-temporal dynamics, Proceedings of Workshop 21, 19th European
Conference on Artificial Intelligence (pp. 25–30), August 16–20, 2010, Lisbon, Portugal.
Zhao, J., Forer, P., & Harvey, A. S. (2008). Activities, ringmaps and geovisualization of large
human movement fields. Information Visualization, 7(3), 198–209.
Glossary

Behaviour  The way and manner in which organisms, systems, or artificial entities
act in conjunction with their environment, which includes other systems,
organisms, and entities as well as the physical environment. This book specifically
addresses movement behaviour.
Clustering  A process that groups a set of entities into homogeneous groups
(clusters) such that objects in the same cluster share a common property, that is,
they are similar with respect to some similarity measure and are dissimilar with
respect to the same measure from objects in the remaining clusters.
Collective movement behaviour  Movement behaviour of multiple objects
moving simultaneously within the same space.
Collective movement event  A movement event occurring to two or more mov-
ers. Collective movement events are defined in terms of relations between the
movers: encounter event, spatial concentration (cluster) event, parallel movement
event, opposite movement event, etc. A collective movement event includes at
least one point from the trajectory of each mover.
Context  The conditions (circumstances) in which an event, action, movement,
etc., takes place. In particular, the spatio-temporal context of movement consists of
properties of locations and time units and of objects existing in space and time:
static spatial objects, moving objects, and events, in particular, spatial events.
Cumulative movement characteristics (attributes)  Positional attributes com-
puted for the interval from the start of the trajectory to a given time moment or
for the remaining interval to the end of the trajectory. Cumulative characteristics
include all interval characteristics and the temporal distances to the starts and ends
of the trajectories.
Data mining  The process of analysing large amounts of data to identify unex-
pected or unknown patterns that might be of value to an application. Data mining
is usually used as a synonym of knowledge discovery. However, data mining refers
to a particular step in the knowledge discovery process.
Data pre-processing  A step of the knowledge discovery process where data
are prepared before analysis methods can be applied. This step may include data
cleaning (removing noise and/or outliers, handling missing values, resolving
inconsistencies), integration, formatting, reduction, etc.

G. Andrienko et al., Visual Analytics of Movement,
DOI: 10.1007/978-3-642-37583-5, © Springer-Verlag Berlin Heidelberg 2013

Data transformation  A step of the knowledge discovery process where data
are converted to a form matching a particular task or suitable for application of a
particular analytical tool.
Database  A shared collection of logically related data, and a description of
this data, designed to meet the information needs of an organization and to support
its activities.
Density map  A map that shows the presence of a phenomenon over space, for
example, the presence of moving objects, represented as the number per area unit
(i.e. density). Densities are often represented by colour coding, where brighter
colours correspond to higher densities.
Distance function  An algorithm assessing the dissimilarity between objects.
Distance functions are used in clustering. The most common distance function
assesses the dissimilarity as the Euclidean or, more generally, Minkowski distance
between vectors or points in a multi-dimensional space of object features, that is,
values of multiple attributes.
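
As an illustration of this entry, a Minkowski distance between two feature vectors can be sketched in Python as follows. The function name, the example vectors, and the choice of features are assumptions made only for this sketch; they are not taken from the book. In practice, the attributes would usually be normalized first so that no single feature dominates the result.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length feature vectors.

    p = 2 gives the Euclidean distance, p = 1 the Manhattan distance.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical feature vectors of two trajectories
# (path length, duration, mean speed); values are made up.
trip_a = (3.0, 0.0, 10.0)
trip_b = (0.0, 4.0, 10.0)

euclidean = minkowski_distance(trip_a, trip_b, p=2)
manhattan = minkowski_distance(trip_a, trip_b, p=1)
```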
Dynamic attribute  An attribute whose values change over time.
Episodic movement data  Data about spatial positions of moving objects
where the positions between the measurements cannot be reliably reconstructed by
means of interpolation, map matching, or other methods due to large time intervals
between the measurements. Examples are positions of mobile phone calls or credit
card transactions.
Event or temporal object  A physical or abstract entity having a specific posi-
tion in time, that is, instant or interval of its existence. An event can be considered
as instantaneous if its duration is negligibly small or not relevant to an application.
Examples of events are a snowfall, a football game, or a traffic jam.
Event data  Data describing existential changes, that is, appearance and disap-
pearance of temporal objects (events). For each event, the data specify the time
interval of its existence.
Flow  An aggregate of multiple movements from one location to another.
Examples include count of commuting people or amount of transported goods. A
flow can be seen as a vector connecting two locations and associated with one or
more aggregate attributes derived from the individual movements that have been
summarized.
Flow map  A cartographic representation of flows shown in a geographical
space. Typically, flows are represented by straight or curved lines connecting the
start and end locations with the thickness proportional to the aggregate attributes.
Alternatively, the attributes can be represented by varying levels of transparency or
by colour coding. Flow maps are not meant to show the actual paths between the
locations.
General movement behaviour  Individual or collective movement behaviour
can be general in terms of movers, space, and/or time. In terms of movers, it is the
typical behaviour pertaining to a population or class of movers or to a type of col-
lective. It includes possible variations depending on the properties of the movers.
In terms of space, it is the common behaviour that can be observed in multiple
parts of space, including possible variations depending on the spatio-temporal con-
text. In terms of time, it is the common behaviour that can be observed during a
sufficiently long time period or in multiple time units, including possible varia-
tions depending on the spatio-temporal context. A general movement behaviour is
a common basis for multiple specific movement behaviours.
GPS track  A movement track recorded by a device that is attached to a moving
object and determines the object’s geographical positions using the GPS (Global
Positioning System).
Identifier  One or several attributes that uniquely identify individuals in a
database. For example, a Social Security Number uniquely identifies the person with
whom it is associated.
Individual movement behaviour  Movement behaviour of a single moving
object.
Individual movement event  A movement event occurring to a single mover (i.e.
within a single trajectory). Individual movement events may be defined in terms of
values of movement attributes (stop event, low speed event, turn event, etc.) or as
instances of relations of movers to the context (visit of a place, coming close to an
object, passing between two objects, etc.). An individual movement event consists
of either one point or several consecutive points of an individual’s trajectory.
Instant movement characteristics (attributes)  Positional attributes refer-
ring to moments (points in time): instant speed, direction, acceleration (change of
speed), and turn (change of direction).
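
The following Python sketch shows one possible way to estimate these characteristics from consecutive position records. It assumes planar coordinates in metres and timestamps in seconds, approximates the instant values by differences between consecutive records, and measures direction as a compass-style angle; the function name and the record layout are hypothetical.

```python
import math


def instant_characteristics(track):
    """Estimate instant characteristics from a track of (t, x, y) records.

    Assumes records sorted by time, t in seconds, x and y in metres.
    Speed and direction are estimated per step between consecutive records;
    acceleration and turn are estimated from consecutive steps.
    """
    steps = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy) / dt
        # Compass-style direction: 0 = +y ("north"), 90 = +x ("east").
        # For a step without displacement the direction is not meaningful.
        direction = math.degrees(math.atan2(dx, dy)) % 360.0
        steps.append({"t": t1, "speed": speed, "direction": direction})
    for prev, cur in zip(steps, steps[1:]):
        cur["acceleration"] = (cur["speed"] - prev["speed"]) / (cur["t"] - prev["t"])
        # Signed change of direction, normalized to the range [-180, 180).
        cur["turn"] = (cur["direction"] - prev["direction"] + 180.0) % 360.0 - 180.0
    return steps


# Made-up track: 100 m north in 10 s, then 100 m east in 10 s.
example_steps = instant_characteristics([(0, 0.0, 0.0), (10, 0.0, 100.0), (20, 100.0, 100.0)])
```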
Interaction  A relation between two or more spatial objects when the objects
are so close in space that they may have mutual or reciprocal action or influence
upon one another.
Interpolation of trajectories  Reconstruction of the most probable positions of
an object at the time moments between the moments of the measurements.
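
The simplest variant is linear interpolation, sketched below in Python. It treats straight movement with constant speed between two measurements as the most probable, which is an assumption of this sketch and is only reasonable for quasi-continuous data; the function name and the record layout are hypothetical.

```python
from bisect import bisect_right


def interpolate_position(track, t):
    """Linearly interpolate the (x, y) position at time t.

    track is a list of (t, x, y) records sorted by strictly increasing time.
    """
    times = [record[0] for record in track]
    if not times or t < times[0] or t > times[-1]:
        raise ValueError("t lies outside the time span of the track")
    i = bisect_right(times, t)
    if i == len(track):
        # t coincides with the last measurement.
        return (track[-1][1], track[-1][2])
    t0, x0, y0 = track[i - 1]
    t1, x1, y1 = track[i]
    w = (t - t0) / (t1 - t0)  # fraction of the time interval already elapsed
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


# Made-up measurements at t = 0 and t = 10.
example_track = [(0, 0.0, 0.0), (10, 100.0, 50.0)]
x_mid, y_mid = interpolate_position(example_track, 4)
```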
Interval movement characteristics (attributes)  Positional attributes refer-
ring to time intervals of a chosen constant length before, after, or around a given
time moment: travelled distance, displacement, sinuosity, tortuosity, various statis-
tics of the instant characteristics, etc.
Knowledge discovery  The process of extracting useful and non-trivial knowl-
edge from data. When applied to movement data, it is often called mobility knowl-
edge discovery. Knowledge discovery consists of many steps, including data
selection, data pre-processing, data transformation, data mining, pattern interpre-
tation, and consolidation of the discovered knowledge.
Location  An element of space. The term may refer to points, areas, lines, or
volumes, depending on the application and data properties.
Map matching  The process of combining an electronic map with location
information to obtain the real position of a moving object in a network.
Medoid  Medoid of a group of points is the point with the minimal average dis-
tance to all other points of the group.
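
A direct, brute-force computation of the medoid can be sketched in Python as follows. The sketch assumes Euclidean distance between points given as coordinate tuples; any other distance function could be substituted. Since the number of other points is the same for every candidate, minimizing the sum of distances is equivalent to minimizing the average.

```python
import math


def medoid(points):
    """Return the point with the minimal total distance to all points of the group."""
    if not points:
        raise ValueError("need at least one point")

    def total_distance(p):
        return sum(math.dist(p, q) for q in points)

    return min(points, key=total_distance)


# Made-up group of points with one outlier.
group = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
group_medoid = medoid(group)
```

Unlike the mean of the coordinates, the medoid is always a member of the group, so the far-away point in the example shifts it only slightly.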
Movement  The sequence of changes in the spatial position of a moving object.
Movement can result from an action performed by a moving object itself to go
from one place to another (e.g. a person, a car), or by an action performed by other
objects that cause the change of position of the moving object (e.g. a package, a
dead leaf).
Movement behaviour  The behaviour related to movement; the way and man-
ner in which an object, a set of objects, or a class of objects moves in space over
time and interacts with the spatio-temporal context.
Movement data  Spatio-temporal data describing changes of spatial positions
of one or more moving objects. Movement data consist of movement tracks of the
objects.
Movement event  An event occurring during movement of one or more objects,
for example, change of movement characteristics (speed, direction, etc.), stop,
approaching of another object, entering a particular place. Movement events are
typically spatial events. Movement events can be extracted from trajectories.
Movement pattern  A representation that characterizes a specific movement
behaviour shown by one or a set of moving objects. In data mining, it refers to
a pattern extracted from movement data by means of data mining algorithms.
Examples of movement patterns include flock, leadership, converging, encounter.
Movement track  The sequence of position records representing the movement
of an object for the whole duration of the movement.
Mover  See moving object.
Moving object (mover)  An object capable of movement or having movement.
For simplification, we assume the moving object is spatially represented as a sin-
gle moving point. We thus ignore shape changes.
Moving event  A spatial event whose spatial position changes over time.
Object  A physical or abstract entity. Objects can be classified according to
their spatial and temporal properties; see spatial object, event, spatial event, mov-
ing object.
Origin-destination matrix (OD matrix)  A representation of flows in the form
of matrix where the rows and columns correspond to different locations and the
cells contain the values of the aggregate attributes.
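
As a minimal sketch, an OD matrix of move counts can be built in Python from a list of (origin, destination) pairs. The place names and the function name are hypothetical; the count of moves is used here as the only aggregate attribute.

```python
from collections import Counter


def od_matrix(moves, places):
    """Build a matrix of move counts.

    moves is a list of (origin, destination) pairs; places fixes the order
    of rows (origins) and columns (destinations).
    """
    counts = Counter(moves)
    return [[counts[(origin, destination)] for destination in places]
            for origin in places]


# Made-up moves between three hypothetical places.
places = ["A", "B", "C"]
moves = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
matrix = od_matrix(moves, places)
```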
Pattern  A representation that characterizes a set (or subset) of data in a sum-
marized way. In data mining, a pattern is a model that represents a summary of the
analysed data set with respect to some criteria.
Place of interest  A specific location that is of interest in a particular context.
Examples include public places such as monuments, hotels, restaurants, or per-
sonal places such as home, work, daily shopping place.
Point of interest  See place of interest. Notice that point of interest is a generic
term which does not necessarily mean that the specific location has point geome-
try; it can be a line or a region. A more precise term is “place of interest” in which
the type of the associated geometry is not specified.
Position record  A data record (e.g. made by a sensing device) representing
a spatio-temporal position of a moving object. The main components of a posi-
tion record are spatial (geographical) coordinates and timestamp specifying the
time when the object’s position was measured. A position record may also include
object identifier and values of thematic attributes related to the position.
Positional attributes  Attributes characterizing positions or segments within
trajectories. Positional attributes can represent instant, interval, and cumulative
characteristics of the movement.
Progressive clustering  An analytical procedure consisting of a sequence of
steps in which clustering is applied either to the whole data set or to the mem-
bers of one or more clusters obtained in the previous steps. In each step, a differ-
ent distance function or different parameter settings may be used. The purpose is
to progressively refine the clustering results and the understanding of the data by
the analyst. The procedure needs to be supported by visualization and interaction
techniques.
Quasi-continuous movement data (quasi-continuous trajectories)  Movement
data with fine temporal resolution allowing approximate reconstruction of the con-
tinuous paths of the moving objects by means of interpolation and/or map matching.
Relation event  An occurrence of a particular spatial or spatio-temporal rela-
tion between one or more movers and some elements of the spatio-temporal con-
text, such as other movers, static spatial objects, events.
Space  A continuous or discrete set consisting of locations. An important prop-
erty of space is the existence of distances between its elements. Space has no natu-
ral origin and no natural ordering between the elements. Locations in space are
identified using a spatial reference system, such as geographical coordinates.
Space–time cube (STC)  A unified representation of space and time as a three-
dimensional cube in which two dimensions represent space and one dimension
represents time. Spatio-temporal positions can be represented as points in an STC
and trajectories as three-dimensional lines.
Space–time prism  The set of all locations that can be reached by a moving
object given a maximum possible speed between a starting and an ending point in
space–time. A space–time prism can represent the uncertainty about the position
of a moving object between two known (measured) positions.
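
Membership in a space–time prism can be tested as sketched below in Python. The sketch assumes unconstrained movement in a plane (no network or obstacles), Euclidean distances, and consistent units; the function and parameter names are hypothetical. A location at time t belongs to the prism if it can be reached from the start in the time elapsed and the end can still be reached from it in the time remaining.

```python
import math


def in_space_time_prism(point, t, start, end, v_max):
    """Check whether the location point = (x, y) at time t lies in the prism.

    start and end are known positions given as (x, y, t) tuples;
    v_max is the maximum possible speed of the moving object.
    """
    (x0, y0, t0), (x1, y1, t1) = start, end
    if not t0 <= t <= t1:
        return False
    reachable_from_start = math.hypot(point[0] - x0, point[1] - y0) <= v_max * (t - t0)
    end_still_reachable = math.hypot(x1 - point[0], y1 - point[1]) <= v_max * (t1 - t)
    return reachable_from_start and end_still_reachable


# Made-up measurements 20 time units apart, maximum speed 10.
start, end = (0.0, 0.0, 0.0), (100.0, 0.0, 20.0)
inside = in_space_time_prism((50.0, 30.0), 10.0, start, end, v_max=10.0)
outside = in_space_time_prism((50.0, 90.0), 10.0, start, end, v_max=10.0)
```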
Spatial distribution  A distribution of values of one or more attributes over
space (i.e. different locations) in a given time unit. In particular, it may include
attributes expressing the presence of various spatial objects (movers or spatial events)
in the locations and their properties, possibly, in an aggregated form.
Spatial event  An event having a specific position in space, which is not
necessarily constant during the time of the event’s existence. An event may be considered as
spatial or not depending on the spatial scale of the analysis.
Spatial event data  Spatio-temporal data describing existential changes
occurring in space, that is, appearance and disappearance of spatial events. For
each event, the data specify its position in space and the time interval of its
existence.
Spatial object  An object having a particular position in space at any time
moment of its existence.
Spatial situation  Spatial positions and dynamic characteristics of spa-
tial objects existing in a given time unit and also an attribute characterizing a
time unit in terms of the existing objects and their spatial positions and dynamic
characteristics.
Spatial time series  Spatio-temporal data describing changes of thematic prop-
erties (values of thematic attributes) of locations and spatial objects. The data
consist of time series of attribute values referring to different locations or spatial
objects.
Spatio-temporal context  See context.
Spatio-temporal data  Data describing changes occurring in space over time,
including existential changes (appearance and disappearance of spatial objects),
changes of spatial properties (spatial position, size, shape, and orientation), and
changes of thematic properties (values of thematic attributes) of locations and
spatial objects.
Spatio-temporal object  An object having specific positions in space and time,
that is, a spatial event or a moving object.
Spatio-temporal position  A position of a spatio-temporal object at a given
time unit represented by a tuple containing at least two components (time unit,
location), where location may be in 2D or 3D.
Specific movement behaviour  A particular combination of movement events
and/or relation events.
Static spatial object  A spatial object that exists during the whole time span
under study and whose spatial position does not change during this time.
Task, analysis task, analytical task  A piece of work aiming to find an answer
to some question.
Thematic attribute  An attribute whose values do not involve positions in
space or in time.
Time  A continuous or discrete linearly ordered set consisting of time instants
or time intervals, jointly called time units. Time units are identified using a tempo-
ral reference system, such as calendar. Besides the linear order, time units may be
arranged in cycles. In analyzing spatio-temporal phenomena, such as movement, it
may be necessary to account for the natural time cycles resulting from the earth’s
daily rotation and annual revolution, cycles of human activities, such as weekly
cycle, and/or for application-specific time cycles, such as the cycles of the circula-
tion of public transport on standard routes.
Time series  Data describing changes of values of one or more thematic attri-
butes over time. For a sequence of time units, the data specify the corresponding
attribute values.
Time unit  An element of time. The term may refer to time instants or time
intervals.
Trajectory  A part of the movement of an object that is of interest for a given
application and defined by a time interval that is included inside the lifespan of the
movement. The two extreme positions of the trajectory are called its start and end
positions.
Trajectory attributes  Attributes characterizing trajectories as wholes, for
example, length, duration, average speed.
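
The three example attributes can be computed as in the following Python sketch. It assumes a trajectory given as (t, x, y) records sorted by time with planar coordinates, and it takes the path length as the sum of straight-line distances between consecutive records; the function name and the record layout are hypothetical.

```python
import math


def trajectory_attributes(track):
    """Compute length, duration, and average speed of a trajectory."""
    if len(track) < 2:
        raise ValueError("a trajectory needs at least two position records")
    length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(track, track[1:])
    )
    duration = track[-1][0] - track[0][0]
    average_speed = length / duration if duration > 0 else 0.0
    return {"length": length, "duration": duration, "average_speed": average_speed}


# Made-up trajectory with two segments of 50 and 60 length units.
attributes = trajectory_attributes([(0, 0.0, 0.0), (10, 30.0, 40.0), (20, 30.0, 100.0)])
```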
Trajectory clustering  The process of clustering a set of trajectories into
homogeneous groups according to one or more properties characterizing them.
These properties can be spatial (e.g. start position, end position, length), temporal
(e.g. start time, end time, duration), or dynamic (e.g. position, direction, and speed
in different time units).
Visual analytics  The science of supporting human understanding, analytical
reasoning, knowledge generation, problem solving, and decision making on the
basis of large and complex data. Visual analytics technology combines automated
analysis techniques with interactive visualizations, which create appropriate condi-
tions for human abstractive perception of relevant information, understanding, and
reasoning.
Index

A
Abstraction. See Analytical abstraction
Abstraction of trajectories. See Simplification of trajectories
Aggregation, 8, 21
  aggregation of movement data, 90, 138, 187
  aggregation of spatial events, 197
  dynamic aggregation, 124
Analysis task, 33, 68, 338
  focus, 68, 69, 338
  level, 69, 338
  target, 68, 338
  type, 69
Analytical abstraction, 363
Analytical reasoning, 361, 366
Animated map, 22
Attribute, 35
  attributes expressing relations, 52, 82, 83, 182, 195, 198, 201
  attributes of events, 199, 212
  attributes of links between places, 283, 287
  attributes of locations, 39, 40, 47, 59
  attributes of movers, 40
  attributes of objects, 40
  attributes of time units, 37, 39, 47, 59
  attributes of trajectories, 40, 80, 117, 142
  dynamic attribute, 39, 52, 80
  movement attribute, 40, 56, 81
  positional attribute, 79, 81, 109, 118, 165, 170, 184, 195
  static attribute, 39
  thematic attribute, 35, 39

B
Behaviour. See Movement behaviour

C
Change map, 325
Clustering
  assignment of colours to clusters, 267
  cluster-based classifier, 156
  clustering of spatial events, 12, 121, 211, 214
  clustering of spatial situations, 309
  clustering of time series, 266, 284
  clustering of trajectories, 16, 41, 141
  clustering of trajectory segments, 170
  cluster prototype, 158
  density-based clustering, 145
  partition-based clustering, 142
  progressive clustering, 148, 154
Connections between places. See Links between places
Context, 28, 48, 172, 354
  context element, 49, 52, 80, 201
  relations of movers to the context, 52, 83, 194, 354
Context data, 98, 195

D
Distance function, 145, 204
Distance function for spatial events, 212
Distance function for time series, 266
Distance function for trajectories, 145, 152
  "route similarity + dynamics", 153, 162
  "route similarity", 147
  "start + end + intermediate checkpoints", 153
  "start + end + path length + duration", 152
Division of movement data into trajectories, 2, 15, 74, 120


E
Encounter of movers, 173, 198, 222
  types of encounters, 178
Event, 37
  attributes of events, 199
  collective movement event, 210
  composite spatial event, 43, 86, 211, 232
  elementary spatial event, 43, 86, 211
  extraction of movement events from trajectories, 82, 119, 195, 198
  extraction of spatial events from spatial situations, 319
  extraction of spatial events from spatial time series, 274
  individual movement event, 210
  influence of events on movement, 201
  movement event, 43, 49, 51, 82, 119, 210
  moving event, 37
  presence event, 45
  relation event, 50, 51, 83, 262
  spatial event, 11, 37, 43, 59, 210, 374
Event-based view of movement, 44, 210
Event data, 41, 59, 210

F
Flow, 8, 54, 95
  flow magnitude, 54, 95, 283, 287, 308, 313
Flow map, 10, 17, 23, 26, 95, 104, 126, 133, 139, 203, 284, 308, 313

G
Generalization, 363
  spatial, 86, 138
  temporal, 86

I
Interaction of spatial objects, 51, 173, 205
Interpolation, 57, 73, 175

L
Links between places, 94, 125, 283, 288
  attributes, 283, 287
Location, 36, 53, 59, 381
  attributes of locations, 39, 40, 47
  connectedness between locations, 53
  order relations between locations, 291, 296
  relations of locations, 53, 283
  temporal relations between locations, 54, 291

M
Medoid, 158
Modelling
  dependency modelling, 288
  time series modelling, 269, 287
Movement, 37, 354
  constrained movement, 288
Movement analysis, 68, 352
Movement behaviour, 65, 361
  collective, 66, 364
  general, 66, 363
  group movement behaviour, 66, 170, 181, 365
  individual, 66, 364
  specific, 66, 362
Movement data, 1, 41, 55
  collection methods, 55
  episodic, 58, 93, 96, 198, 210
  forms of movement data, 46
  quasi-continuous, 58, 93
Movement track, 1, 55, 74
Mover. See Moving object
  attributes of movers, 40
  relations between movers, 173, 180
  relations of movers, 51, 172
  relations of movers to the context, 83, 194

O
Object, 33, 37
  attributes of objects, 39
  moving object, 37
  spatial object, 37
  spatio-temporal object, 37, 49
  temporal object. See Event
Origin-destination matrix, 27, 111

P
Place of interest, 92, 254. See also Location
  extraction of places of interest from movement data, 257–259
  links. See Links between places
  personal places of interest, 259, 279, 291
Presence dynamics, 39
Progressive clustering, 148, 204

R
Relation, 49
  attributes expressing relations, 52, 82, 83, 182, 195, 198, 201
  attributes of relations, 54
    change between time units, 54
    connectedness between locations, 53
    equivalence between time units, 54
    flow relation between locations, 54
    relation occurrence, 50, 83, 173, 195, 198
    relations between movers, 173, 180
    relations of locations, 53, 283
    relations of movement events, 51, 199
    relations of movers, 51, 172
    relations of movers to the context, 52, 83, 194, 354
    relations of time units, 53, 325
    spatial relation, 49, 50, 53
    spatial relations between time units, 54
    spatio-temporal relation, 49, 51
    temporal relation, 49, 50, 53
    temporal relations between locations, 53
Re-sampling, 73
Route, 15, 144, 145

S
Sequence mining, 296
Simplification of trajectories, 88, 120
    attribute-based, 89
    density-based, 89
    event-based, 89
    geometric, 88
    place-based, 89
Sinuosity, 80
Space, 33, 35
    spatial reference system, 35, 36, 76
Space tessellation, 92, 125, 133, 138, 225
    division into cells of variable sizes, 255
Space transformation. See Transformation of spatial references
Space–time cube, 4, 12, 18, 34, 107, 121, 154, 172, 184, 200, 220, 222, 244, 322
Space–time path, 34
Space–time prism, 34
Spatial distribution, 42, 95, 308
    flow distribution, 95, 308
    presence distribution, 95, 308
Spatial event data, 41, 59, 97, 195, 198
Spatial neighbourhood, 319
Spatial situation, 28, 39, 308
    clustering of spatial situations, 309
    representative spatial situation for a cluster, 309, 313
Spatial time series, 42, 59, 95, 97, 107
Spatial window, 53, 174, 198
Spatio-temporal cluster of events, 211, 232
Spatio-temporal context. See Context
Spatio-temporal data, 41, 97
Spatio-temporal trend, 55, 327, 357
Stop, 4, 10, 34, 44, 84

T
Task. See Analysis task
Temporal bar chart, 109, 118, 171, 184
Temporal window, 52, 174, 198
Text cloud display, 236, 242, 299, 323
Time, 33, 36
    temporal cycles, 37, 47
Time geography, 34
Time graph, 5, 261
    quantile graph, 262, 265
    temporal histogram, 263, 265
Time series, 42, 261
    clustering of time series, 266, 284
    event extraction from time series, 274
    modelling, 269, 287
    peak detection in time series, 275
    transformations of time series, 263
    trend removal, 263
Time unit, 37, 53, 59
    attributes of time units, 37, 39, 47
    change between time units, 54, 325
    equivalence between time units, 54
    relations of time units, 53, 325
    spatial relations between time units, 55
Tool categories, 69
    analytical tools, 69
    transformational tools, 69, 73
Tortuosity, 80
Trajectory, 2, 38
    attributes of trajectories, 40, 80, 117
    central trajectory of a group, 181
    characteristic points of trajectories, 91, 133
    full trajectory. See Movement track
Trajectory reconstruction, 57
Trajectory wall display, 166, 172, 184
Transformation of spatial references, 76
    group space, 77, 187
Transformation of time references, 19, 75, 81, 154
Two-dimensional histogram, 12, 195

V
Visit, 45
Visual analytics, 29, 363, 366
Voronoi tessellation, Voronoi polygons. See Space tessellation
