LICENSE
FOREWORD
Citadel on the Move was an EC-funded project under the CIP (ICT-PSP) programme 2012-2014,
with a simple objective: to make Open Data an achievable reality for every city in Europe. By
working with the four pilot cities of Ghent (BE), Issy-les-Moulineaux (FR), Athens (EL), and
Manchester (UK), Citadel developed an easy-to-use platform that makes it possible for all
governments, especially the small ones that often get left behind, to open their data and unlock
smart city innovation. The tools, datasets and apps generated (over 500 apps in the
project's lifetime) are all part of a central concept, the Open Data Commons, which extends the
scope of Open Data from specific city portals to all actors in a Smart City, promoting the active
engagement of citizens and local businesses as well as city departments and agencies to
contribute their own data and build a common data resource for the whole city.
The Open Data Commons is a concept that was developed by Alfamicro, one of the key partners
of the Citadel consortium, and reported in a fragmented way across a range of project
documents. To make the Open Data Commons concept less obscure and popularize it with a
broader international audience, we have assembled in this book some of the key content
developed throughout the project. This book therefore briefly presents the Citadel project and the
stakeholder-based approach applied to the development and governance of the Open Data
Commons in the four pilot cities. It then describes the Open Data Commons and how it was
implemented in the course of the Citadel project, with a special focus on the semantic
dimension and a specific chapter on the issue of privacy. Finally, the policy implications of the
Open Data Commons and the future of Open Data are briefly explored.
This work has of course been made possible by our interaction with all of the partners of the
Citadel consortium, under the leadership of Geert Mareels of CORVE (BE) supported by Julia
Glidden and the team at the 21c Consultancy of London. The pilot-driven approach was
successfully carried out by the City of Ghent, Issy Media of Issy-les-Moulineaux, MDDA of
Manchester, and DAEM of Athens, and supported by the evaluation work of iMinds (BE). The
technical team included Intrasoft and ATC of Greece as well as Derby University (UK), ITEMS
(FR), and V-ICT-OR (BE). Project dissemination was entrusted to the Euractiv Foundation and
coordination to IS-practice, both in Brussels. Part of the work described here was also made
possible by additional funding from the FI-WARE project's Lisbon Pilot, whose technical team
included FullIT, IPN, and DRI, all from Portugal.
Our thanks go out to all of the dedicated people at these organisations with whom we have had
the pleasure of working over the last years, as well as the citizens and developers who have all
engaged with us in the design and development of the Citadel Platform. Finally, my thanks go to
the Alfamicro team who have written and compiled this book, as well as to Leonardo Alberto dal
Zovo, who played the essential role of developing the main tools of the Open Data Commons.
Álvaro Duarte de Oliveira, President
Alfamicro Lda
CONTENTS
Foreword
Contents
Index of Figures
Index of Tables
Definitions used in this Book
The Citadel On the Move project
The Citadel approach to Open Data
Defining the Open Data Commons
Stakeholder dynamics for Open Data
Experimentation in Pilot Cities
Roles in the ODGG
Reaching ODGG Objectives
The Open Data Commons at Work
Operationalisation through Experimentation
First issue (2012)
Second issue (2014)
The Semantic Dimension of the ODC
Standards issues in the first ODC concept
The Emergence of the Converter-AGT model
The central issue of semantics
Semantic convergence in the pilot cities
The ODC as a Semantic Framework
Privacy and the Open Data Commons
General Framework
The Privacy Impact Assessment Framework
Privacy types
Privacy at the Community level
Periodic surveys of the Pilot Cities
Analysis of Outcomes
Towards a Community PIA
Privacy at the App Level
Mapping of Citadel Apps
Analysis of Implications
Proposal for an App PIA Framework
Privacy at the Data level
Privacy in the Open Data Commons
INDEX OF FIGURES
Figure 1. Open Data Value Chain
Figure 2. The Citadel Vision
Figure 3. Typologies of innovation
Figure 4. Mapping of stakeholder domains
Figure 5. Mapping of stakeholder transactions
Figure 6. Citadel additional stakeholder transactions
Figure 7. Citadel integrated ecosystem
Figure 8. Key areas of stakeholder interaction
Figure 9. Outcome of stakeholder interaction
Figure 10. Open Data activity contribution to Citadel objectives
Figure 11. Pilots' contributions to Citadel objectives
Figure 12. Issues for specification of the Open Data Commons
Figure 13. The range of functions in the ODC
Figure 14. The vision for the Citadel ODC
Figure 15. The ODC as a semantic model
Figure 16. The Semantic Core of the ODC
Figure 17. The Open Semantic Ecosystem
Figure 18. The ODC in the 'Real World'
Figure 19. Stepwise PIA process (source: [20])
Figure 20. Typologies of privacy in Citadel
Figure 21. Stakeholders' role in the definition of privacy policies (Citadel members)
Figure 22. Stakeholders' role in the definition of privacy policies (Citadel non-members)
Figure 23. Conditions and purposes of MoU definition
Figure 24. A typical relational database structure
Figure 25. A typical GTFS folder unzipped
Figure 26. A typical CKAN Data Store
Figure 27. AGT Apps by Month
Figure 28. The basic RDF syntax
Figure 29. The LOD schema for the statue of Einstein
Figure 30. Citadel JSON Data Model
INDEX OF TABLES
Table 1. Comparison of Pilot ODGGs
Table 2. Governance Roles in Pilot ODGGs
Table 3. ODGG Objectives in Pilot Cities
Table 4. Summary of the three levels of the Citadel PIA framework
Table 5. Mapping of Citadel Application Templates
Table 6. Taxonomy of Data/Application Pairings in Citadel
Table 7. Proposed data privacy classification scheme
Table 8. Proposal for a Data Licensing Mechanism (based on the CC scheme)
Table 9. Proposal for a Data PIA Framework
Table 10. Open Data Ecosystem capability matrix
Table 11. A CMM for LOD (example)
Table 12. Ecosystem role definition and potential MoU contribution
Table 13. Citadel Charter Approaches
Table 14. Citadel Common File Formats Mapping Grid
control over own information and, for organizations, to acquire a sustainable competitive
advantage by taking a positive-sum, not a zero-sum, approach to privacy protection.
SEMANTICS
Semantics is technically the study of meaning, but in the area of data management it refers
more specifically to data structures designed to represent information content. Semantics can
thus concern, for instance, the meaning of column headings in an Excel table and whether to
expect the same information under the headings "name" and "title".
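As a minimal sketch of what this means in practice, the Python fragment below maps the column headings of two hypothetical city datasets onto one shared vocabulary, so that "name" and "title" are treated as the same field. The alias table, the field names and the sample rows are illustrative assumptions and are not taken from the Citadel tools or formats.

```python
# Hypothetical alias table: source column heading -> shared field name.
# These names are assumptions for illustration only.
COLUMN_ALIASES = {
    "name": "title",
    "title": "title",
    "lat": "latitude",
    "latitude": "latitude",
}


def normalise_row(row):
    """Rename known column headings to the shared vocabulary.

    Headings are matched case-insensitively; unknown headings are kept as they are.
    """
    return {
        COLUMN_ALIASES.get(key.strip().lower(), key): value
        for key, value in row.items()
    }


# Two invented rows describing points of interest, published with different headings.
city_a_row = {"Name": "City Museum", "Lat": "51.05"}
city_b_row = {"title": "Town Hall", "latitude": "48.82"}

normalised = [normalise_row(city_a_row), normalise_row(city_b_row)]
```

After normalisation both rows expose the same headings, so a single application could read either dataset without knowing which city published it.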
CITADEL HUB
Available online at http://www.citadelonthemove.eu/en-us/Thehub.aspx, it is a collection of
Open Data, mobile application templates, user extensions and discussions about these. It was
set up in the early stages of the Citadel project, and its contents were later migrated to GitHub
(https://github.com/citadel-eu).
OPEN DATA GOVERNANCE GROUPS
Established in the early project stages, in line with the Living Lab approach, these
were informal groups consisting of the key stakeholders from the Open Data community in each
of the pilot settings.
CITADEL CHARTER
Better referred to as the Citadel Open Data Charter, it was originally conceived as a formal
protocol (MoU) to be signed in preparation for, and in support of, the governance of opening-up
processes. What in fact emerged as a common need for the cities involved in Citadel was to
share a common vision and principles, so the final version of the Charter has taken the
form of a manifesto, continuing the Citadel Statement of 2010 on which the project was
originally based.
The Citadel on the Move project in many respects carries that process one step further by
implementing these principles in practice, building Open Data-based mobile application
templates for pilot experimentation in four pilot cities across Europe (Ghent,
Issy-les-Moulineaux, Manchester, and Athens) and since extending engagement to over 120 cities
on 5 continents. The results of the project allow all cities, even the smaller villages, to offer public
services on the mobile phones of their citizens and visitors at a low cost. All they need to do is
publish their data using the Citadel tools and formats, and the mobile apps built according to
the same standards will be usable in their municipality.
Citadel thus makes it possible for all municipalities to offer:
Data to be used by Mobile Apps developed by citizens or companies (local or from other
cities);
Mobile Apps to their citizens and visitors.
The Mobile Apps are the most visible and concrete services based on Open Data that make life
easier for people. But of course the cities will have to do part of the work themselves. They have
to publish their data in a way that allows it to be picked up by the applications. This sounds easy,
but they will have to overcome the political, administrative and legal constraints that still slow
down the Open Data movement.
At the same time, policy makers can expect growing demand from their citizens to be able to
use in their own city the same apps they have experienced in a neighbouring city. Just as
the Internet slowly but surely found its way into the public sector twenty years ago,
Citadel supports a similar modernization of public services through the use of Open Data in
mobile applications.
In the three years of development and experimentation of the Citadel concept, three key
principles have emerged, which we can fix as strategic guidelines for Smart Cities:
http://humansmartcities.eu/
Rather than proposing single standards for a given area, Citadel thus prefers to focus on building
literacy in standardization processes, for both today's and tomorrow's emergent needs.
This means learning to clearly identify the area of standardization, search for on-going
activities and standards proposals, assess the richness of the toolkit ecosystem built around
the available options, and evaluate the best strategy in relation to the current landscape.
In short:
of the public sector is elevated from mere data provider to the stewardship of the collective
interest.
In short:
Citadel has defined a common space in the public domain as key to the uptake of Open Data;
The Open Data Commons, as the on-going collection of shared tools and resources,
allows datasets to be published and accessed transparently;
It promotes the emergence of standards and the sharing of standards of practice;
It is based on a partnership of the data and development communities;
Governance principles are required to define the ODC's structure and nature;
The role of the City Government is to guarantee openness and transparency of governance.
Enabling technologies such as the Internet and open source software applications are
supporting and enhancing the main value-creating functions;
Much of the currently expanding re-use activity only started once low-cost ICT applications
and networks became available;
A positive economic value is actually created out of Open Data / Public Sector Information
reuse, according to a number of relevant business models [8];
Recent trends in collaborative data and service production between governments and
citizens [15] do not add significant feedback loops to the workflow schematized in the
following figure:
In the above representation, four main actors, or stakeholder categories, can be identified,
each closely associated with specific tasks:
Policy Makers, being in charge of the high-level direction and regulation of the whole
process, particularly with respect to Data Providers;
Data Providers, usually, though not always, public bodies or agencies (such as public utility
companies, statistical offices, chambers of commerce etc.), being responsible for the
creation (setup, organization, structuring) of the open datasets, and sometimes also for
their adaptation and specialisation to the needs of the Application Developers;
Application Developers, usually ICT companies, sometimes under the control of public
bodies, otherwise acting on the free market, with the mission of transforming the datasets
available into human-readable forms: products, services, or both;
Business/Citizen Communities, including not-for-profit entities and NGOs, who are the
ultimate beneficiaries of the transformation, generation and utilization of public datasets
according to their respective (business / non-business) purposes.
Activities beyond raw data creation, collection and aggregation that can be relevant to value
creation include, for instance, data processing, editing and packaging, marketing and delivery.
More recently, they have also come to include the development of APIs, mash-ups and other forms
of user-friendly, if not user-generated, content. However, as the following picture shows, the
essence of the Citadel vision is to complicate the previous representation of the value chain by
adding three forms of interaction between the four stakeholder categories introduced before:
a) Data co-production, deriving from the Business/Citizen Communities themselves, as
parallel and additional sources with respect to Data Providers;
b) Application co-design, again reflecting the spirit of freedom and initiative that
characterizes most end user communities;
c) And policy co-creation, as the joint result of the feedback sought by the smarter
Policy Makers and received from all of the remaining stakeholder categories, after
a complex process of Living Lab interaction that Citadel development
activities aim to achieve.
As the final outcome of this set of feedback loops and interrelations, two main goals should
be achieved: intelligent policy learning, from the perspective of workflow directors and
regulators; and the creation of additional value from the disclosure of Open Data and the
re-use of Public Sector Information, beyond what could reasonably be guaranteed using the
conventional, one-way logic depicted in Figure 1 above.
The way this outcome becomes feasible can be described as follows. In Figure 3, we add
another relevant analytical dimension to our vision, namely the distinction between
technological and social (including institutional) innovation. Among the many definitions of
the latter, we would like to adopt the following: "innovative solutions and new forms of
organisation and interactions to tackle social issues".
By combining the value chain tasks depicted in Figure 1 with the typologies of
innovation introduced above, we can easily locate the four stakeholder groups as per the
following diagram:
Here, the corresponding value transactions (using the jargon popularized by the Value Network
Analysis paradigm [16]) can be depicted as in the Figure on the next page:
One can notice the addition of the "Impact and Requirements" function from the
Business/Citizen Communities to the Policy Makers, so that the linear workflow
outlined in Figure 1 gains a permanent iterative feature.
However, the contribution of the Citadel project to refining the above vision goes further
than what has been discussed so far. In particular, the operational objectives set out in our
work on Open Data de facto introduce a symmetrical iteration, running counterclockwise, as
described by the following scheme:
In this scenario, Policy Makers act as prime movers with respect to the Business/Citizen
Communities, in launching and promoting the constitution of the ODGGs in the respective Cities
(for now, those that are formal partners of the Citadel consortium; in the future, those that will
adhere to the proposed scheme and play the role of supporting or affiliated partners). This
ensures the definition of the scope, limitations and conditions under which the whole
experiment takes place including, last but not least, the privacy, confidentiality and
security aspects related to the procedures of Open Data disclosure and Public Sector
Information dissemination.
Within this overall framework, it is hoped, and to some extent expected, that the local
Business/Citizen Communities, adequately stimulated and supported, may start defining their
range of expectations, desires, and purposes with respect to specific uses of the various
applications, whether already developed or still to be designed, that integrate
the public datasets available or to be made available. This backward process, which also
includes the generation of their own datasets, whereby citizens and/or businesses themselves act as
complementary Data Sources with respect to the Public Sector, should positively influence the
strategic behaviour of the Application Developers, who could stay more focused on the
developments that hold the maximum level of utility, usability and social acceptance, instead of
wasting precious resources in a tedious and never-ending process of ex post validation for the
APIs or other ICT applications established in the meantime.
As a by-product of this virtuous interaction between prospective end users and solution
providers, a new range of access and acquisition protocols should also be foreseen, between
the Application Developers and the Public Sector Data Providers. The latter should make
reference to the Policy Makers again, for revised and revamped guidelines concerning pricing
and availability of datasets, in relation to the priorities expressed or signalled by the ultimate
beneficiaries.
Although the proposed representation may look oversimplified (as it does not include, for
instance, the cases of user-generated or private sector-owned datasets, nor does it consider
application developers as capable of achieving social innovation), most of its heuristic value is
given by the combination of Figure 5 and Figure 6 into a single, integrated ecosystem, as shown
in the picture below:
This exercise is helpful, in that it identifies four main areas of interaction, with the
corresponding feedback loops:
As a result of those interactions, the goals of policy learning and value creation (as per Figure 2
above) should ultimately be achieved.
The overarching objective of Citadel is to grow and nurture such an ecosystem by providing tools,
methodologies, cases and exploitation opportunities.
In the next two (and final) pictures, we identify the contribution of Citadel Open Data and
development activities, respectively, to the achievement of this objective, through a number of
instrumental and operational reports.
Besides the definition of management rules for the ODGGs, Citadel also dealt with the creation
of Open Data Charters concerning the use of public datasets. Later in the project, Privacy Impact
Assessments were defined to identify the risks for personal/sensitive data related to the
introduction of a culture of openness and transparency in Public Sector Information and Data
handling. In parallel to this effort, the semantic dimension of dataset production and usage was
explored, in order to define a common Semantic Framework. Finally, as a collective space on the
Citadel project website, an Open Data Commons Repository was conceived, in its first instance
as a collection of links to available datasets together with a variety of open source tools
providing for adaptation, refinement, and access to public datasets and application resources.
Most of the above achievements, including the technical developments related to each Citadel
pilot, were ensured by the joint contribution of technical partners and pilot cities. As far as the
latter are concerned, the following diagram summarizes their contribution to the Citadel
objectives, namely:
The intention is not to describe each of these steps in depth, but rather to highlight the close
interconnections between the Open Data Commons Repository and the Template Applications,
both lying at the crucial point of convergence between Business/Citizen Communities (as the
prime receptors of the commercial/non-commercial value created) and Application Developers
(including the project's technical partners, as well as third-party organizations, including citizens
and NGOs acting under the Web 2.0 / FLOSS logic on the ICT market).
For Ghent, Citadel coincided with the launch of a double strategy for Open Data and the
constitution of the local Living Lab. Citadel thus helped frame and guide this process as
it rapidly grew; the ODGG constituted something of a lead-user forum. This explains the
significant number of meetings held throughout the project.
In Issy, Citadel was aligned with a new strategy definition process, so the ODGG
brought together different stakeholders, including both city government officials with
different responsibilities and application developers, to define a common strategy. This
explains the larger composition of the group and the slower process of data publishing.
For Manchester, Citadel reinforced long-standing Smart City and Living Lab strategies.
There already existed a strong Open Data community in Manchester, so the ODGG
mainly aligned those activities with the work in Citadel.
In Athens, Citadel helped to define a new Open Data strategy, which, due to the sheer
size and complexity of the city, needed to raise awareness among many institutional
departments as well as citizen stakeholder groups. The ODGG thus worked as the
strategic core group guiding this process, and tended to focus on the actions required
for pilot start-up to deliver concrete results.
The final reports from the pilot cities, following two years of experimentation with the Citadel
tools and platform, confirm the different approaches taken in terms of governance strategies,
while at the same time showing successful outcomes across the board, as shown in the
following table:
Table 1. Comparison of Pilot ODGGs

Ghent
ODGG members: 11
Events: 24
Total participants: >575
Governance style: Tightly technical ODGG with active engagement of developer and end-user communities in numerous events.

Issy
ODGG members: 34
Total participants: 195
Governance style: Broad representation in the ODGG, with more selective and structured events.

Manchester
ODGG members: 10
Total participants: >250
Governance style: Continuity with on-going Open Data strategy, increasing links with community.

Athens
ODGG members: 32
Governance style: ODGG composed of key political actors, coupled with direct engagement of active citizen groups.
Against this background, we can note the roles actually played by the different actors in the
ODGGs of the pilot cities as the project evolved, as shown in the following table:
Table 2. Governance Roles in Pilot ODGGs
Role
Mayor, City Government
City ICT Department
Ghent
A clear Open Data strategy was already in place with political support.
The ICT department coordinated the pilot throughout.
Public and Private Data providers
Software companies
Ghent tended to work with SMEs and citizen developers.
Citizen developers
Played an active role in Hackathons and app development.
Strong role for Open Knowledge Foundation, also for the cultural sector.
Engaged through Living Lab activities, especially in the open co-design events.
User communities
Citizens and visitors
Issy
The Open Data strategy was launched and defined in the course of Citadel.
Issy Media already had a strong mandate to promote innovation.
Issy engaged many city offices (e.g. tourism) but also neighbouring municipalities and multi-level stakeholders (agglomerate and regional levels)
Manchester
A clear Open Data strategy was already in place with political support.
Manchester MDDA with a strong mandate, acting as pilot leader.
Athens
Attaining strong political support was a key objective for Citadel in Athens.
Dealing predominantly with municipal data holders.
Good representation of development community.
Engaged through testing, but also workshops and conferences.
Multiple roles identified and engaged for citizen developers.
User communities mainly engaged through pilot activities.
User communities mainly engaged through pilot activities.
Citizen community NGOs played a driving role in the demand-driven strategy.
The tourism industry is a key concern for Athens.
Open up data: the first and foremost objective of the ODGGs was to spark off processes
for opening up datasets.
Engage with the community: the second objective was to actively involve data owners,
the development community, and local citizens and businesses in Open Data.
Define a strategy: the final objective for the ODGGs was to enable the community to
identify the best way forward to maximize the value of Open Data for their city.
Table 3. ODGG Objectives in Pilot Cities

Ghent
Open up: In the context of an Open Data policy already in place, the Citadel ODGG helped reinforce the link between the city and the open development community.
Engage: Ghent's engagement with the community was reinforced through the Citadel co-design events: Ghent pioneered the Apps4Dummies format.
Define a strategy: Through the work of the ODGG, Ghent shifted from a data-push to a demand-pull strategy, particularly as regards the cultural sector.

Issy
Open up: Issy used the Citadel ODGG to launch its Open Data policy, going from nothing to a significant number of opened datasets.
Engage: Issy Media's existing structures and activity frameworks provided the setting through which to engage citizens and local businesses.
Define a strategy: The ODGG is helping Issy carry out an original multi-level strategy involving nearby municipalities, coordinating with national and regional portals.

Manchester
Open up: Citadel helped reinforce an already existing Open Data strategy and extend the user base.
Engage: The Citadel tools helped bring new actors with fewer technical skills into the picture.
Define a strategy: The Manchester Open Data strategy is extended and reinforced by the availability of the tools.

Athens
Open up: In a situation of political hurdles and severe austerity, the Athens ODGG approach has been successful in gaining strong political support for Open Data.
Engage: The Athens ODGG engaged directly with key government data holders, while at the same time co-designing application scenarios with citizens and community groups to gain bottom-up consensus.
Define a strategy: Athens now has a clear, Citadel-driven Open Data strategy that will be sustained by the Athens Living Lab currently being established.
At this point it was useful to look more closely at the functions that could be included in the
ODC, mainly by looking at the different tools and approaches that currently exist for supplying
open data applications with an external dataset. The main approaches can involve:
• Direct access, as mentioned above: this can occur in the case that an external dataset
  offers data exactly as the application expects it to be formatted and structured.
• Plugs or translators, which translate file formats or re-map data structures to make
  data fit applications (some XML translators fall into this category).
• Data dumps: mirrored databases constructed for a variety of purposes, such as a) storing
  translated datasets as per the above; b) copying a dynamically updated database (e.g.
  Meteo) at regular intervals so that external applications avoid overloading the primary
  system with queries; or c) other reasons such as security.
• One or more APIs, developed either from the data side (as in the CitySDK project) or
  from the application side (as with Pachube or Foursquare), providing interface
  functionalities.
• Various combinations of the above.
The interesting fact for Citadel is that the above tools and devices are part of an ever-evolving
ecosystem in which application developers, data owners, and third parties interact. Indeed, they
are indirectly the key drivers of standardisation processes in Open Data, since the emergence of
a standard such as GTFS (General Transit Feed Specification) is generally accompanied by the
production of APIs, translators, etc. to help fit that standard, even in the presence of emergent
enhancements to that standard such as real-time GTFS.
The second noteworthy fact is that most of these tools and devices are freely available in the
public domain, if not Open Source. The only exceptions are APIs that are paid for as part of
the business model of applications such as Foursquare, but this does not mean that the ODC
cannot list the API and facilitate its adoption, leaving it up to the user to decide whether or not
to adopt a commercial (generally proprietary) format.
This opens an interesting scenario for the ODC within an Open Data Smart City strategy. The
ODC can become the space which manages an ever-evolving ecosystem that, as a collection of
tools, can be said to define the public space of a city's information capital. On the one hand, it
makes it possible for the city's data to be seen by a broad range of applications; on the other,
it allows applications easy access to the city's information capital. Use of the ODC would occur
not by any constriction but by the convenience it offers to data holders and application
developers alike.
In order to maintain this role, ODC cities (through the Open Data Governance Group) would
negotiate with application developers to ensure that the components they develop that can be
said to be of public utility (data access tools that could be re-used by other applications) are
donated to the ODC and remain in the public domain, in exchange for access to the city's data
through the ODC. The community of developers that participate in such an endeavour would
thus have an interest in collaborating to ensure that the different components developed work
together smoothly where appropriate, as well as in promoting adherence to emergent standards.
This vision for the ODC can be illustrated as follows:
The above figure also illustrates an additional added value that can be provided by the ODC in
line with this scenario. Since the ODC will be managing the interfacing between application
queries, it can maintain a record of queries coming from the application side as well as records
of which applications access a given dataset and for what purposes. This allows us to imagine
the introduction of the concept of bi-directional traceability of Open Data, which opens up
interesting possibilities for the management of privacy and security issues. In addition, analysis
of the queries and transactions over a given period of time could allow for the identification of
semantic patterns, thus allowing for a bottom-up definition of emergent semantics that could
then be fed back into the definition of appropriate data structures.
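The record-keeping idea behind bi-directional traceability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names (AccessLog, app_id, dataset_id, purpose) are hypothetical and do not describe any component actually built in Citadel.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRecord:
    """One query passing through the (hypothetical) ODC interface."""
    app_id: str       # which application issued the query
    dataset_id: str   # which dataset was read
    purpose: str      # purpose declared by the application
    query: str        # free-text query or parameters
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class AccessLog:
    """In-memory log that can be read in both directions."""

    def __init__(self):
        self._records = []

    def record(self, app_id, dataset_id, purpose, query):
        self._records.append(AccessRecord(app_id, dataset_id, purpose, query))

    def apps_for_dataset(self, dataset_id):
        # Data side: which applications have read this dataset?
        return sorted({r.app_id for r in self._records
                       if r.dataset_id == dataset_id})

    def datasets_for_app(self, app_id):
        # Application side: which datasets has this application read?
        return sorted({r.dataset_id for r in self._records
                       if r.app_id == app_id})

    def frequent_terms(self, top_n=3):
        # Very rough stand-in for "semantic pattern" analysis:
        # count the words that recur across logged queries.
        counts = Counter(term
                         for r in self._records
                         for term in r.query.lower().split())
        return counts.most_common(top_n)
```

A data holder would call apps_for_dataset to see who uses a dataset, while frequent_terms hints at how recurring query vocabulary might feed back into the definition of data structures. A real deployment would need persistent storage and its own privacy safeguards for the log itself.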
template. Indeed, the City of Ghent first built a simple tool for this purpose, translating parking
information to the required JSON format. The request therefore arose for a general Converter
that could transform any CSV or Excel file into the required JSON format for use in Citadel.
This idea was taken up as the first step in actual realisation of the Open Data Commons concept
that had been defined in the first year of the project. Indeed, the Converter as described would
be the first of a series of tools bridging the gap between datasets (as they currently stand in 95%
of public administrations, i.e. Excel files) and applications built using the Citadel Templates.
By the time the request was formalized and the first UML specifications of possible Converter
workflows had been defined, the pilot cities were waiting anxiously for the Converter in order to
finally begin building apps. An unusual development plan was thus defined for the Converter,
following three main stages in an open sequence that allowed the concept to be tested as early
as possible and then refined on the basis of user feedback:
• A first prototype was realised in less than a month, using PHP for a server-based
  converter. This version only worked with CSV and mapped columns in the original
  spreadsheet directly onto the data schema of the JSON format. This version was
  released in December 2013 and was an immediate success with the pilot cities.
• A second prototype was built using Java, as an off-line tool. This was meant to provide
  more stable features and possibly be used for batch processing of files with the same
  data structure and/or with constant updating. This version also separated a first phase
  of semantic mapping (pairing source column headings with standard field names) from
  that of the export schema (matching with the actual fields of the template data model,
  and in addition adding necessary metadata such as language, licensing, etc.). This
  version was less successful, due in part to the large size of the file to download but
  mainly because the pilot cities preferred the simpler, though less sophisticated,
  conversion of the PHP version. This version was released in mid-January 2014 but received
  little feedback.
• Shortly thereafter the final version was released as an on-line Java tool encapsulated in
  Liferay. The basic functionalities were essentially the same, and it was this version that
  was gradually improved through interaction with pilot users in the Living Lab
  settings. The first step was to add help texts along the way, as well as feedback on
  possible errors in the mapping to the export schema. This version was released in time
  for demonstration and testing at the Data Days conference in Ghent in February 2014.
Since then, further refinements of the Java code, together with significant upgrades of the
server features, have been carried out with the objective of improving performance, and in fact
the response times have been notably reduced (slow responses being another of the reasons why
the pilot cities initially preferred the PHP version). Following initial user testing, the Converter
tool was then integrated into the Citadel platform. This involved stripping away the user
registration of the Liferay environment in order to allow a smooth passage from the Citadel Hub,
within which the Converter is inserted as a simple iframe. Other enhancements to the Converter
have been carried out in a dialogue with end-users, and are reported in the following section.
As a final note, it is interesting to see that one of the hypotheses of the development plan (i.e.
that outside developers would prefer to work with the PHP version) has been validated by the
recent adaptation of the Converter to geoJSON and other activity on GitHub. Thus both the PHP
and the Java versions continue to co-exist, the first as a more open, technical, and experimental
version and the second as a more stable, user-friendly version useful for the front end of the
Citadel platform.
The Citadel Converter's operational maintenance has mostly been a question of adapting to
continuous user requests for enhancement. The main problem is that the Converter significantly
raises expectations (promising to convert just about anything into an app) while in fact there
are many problems to address, mostly related to errors or inconsistencies in the original
dataset. Addressing these issues has involved a combination of technical improvements,
accompanying information, and human support.
The technical improvements have mainly been carried out on two fronts: geographical
coordinates and dataset publishing. As for geographical coordinates, the Citadel JSON format
foresees latitude and longitude in one field separated by a comma and a space. Other common
formats (e.g. with a comma only, as in Google) returned an error message. Work was therefore
carried out to automatically recognize different formats and adjust them where necessary. The
second question is related to the desire to directly save the converted file and publish the
metadata on the Citadel Hub. This issue introduced the option of saving converted files to any
CKAN server, though this also required an API through which to write to the Citadel Index.
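The coordinate adjustment described above can be illustrated with a short Python sketch. It is not the Converter's actual code (which was written in PHP and Java); the function name normalize_coordinates and the set of accepted separators are assumptions made for the example. Only the target form, latitude and longitude separated by a comma and a space, comes from the text.

```python
import re

# A signed decimal number such as 51.0543 or -3.7
_NUMBER = r"[-+]?\d+(?:\.\d+)?"

# Two numbers separated by a comma, a semicolon or whitespace,
# with optional extra spaces around the separator.
_PAIR = re.compile(rf"^\s*({_NUMBER})\s*[,;\s]\s*({_NUMBER})\s*$")


def normalize_coordinates(raw: str) -> str:
    """Return 'lat, lon' (comma plus one space) or raise ValueError."""
    match = _PAIR.match(raw)
    if match is None:
        raise ValueError(f"unrecognised coordinate format: {raw!r}")
    lat_text, lon_text = match.group(1), match.group(2)
    # Range check only; the original digits are kept to preserve precision.
    if not (-90 <= float(lat_text) <= 90 and -180 <= float(lon_text) <= 180):
        raise ValueError(f"coordinates out of range: {raw!r}")
    return f"{lat_text}, {lon_text}"
```

For instance, normalize_coordinates("51.0543,3.7174") returns "51.0543, 3.7174". The sketch deliberately does not guess at decimal commas (as in "51,05;3,72"), since those cannot be told apart reliably from the separator.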
Another aspect is related to providing user information. This has occurred through: presentation
of the Converter so as to lower expectations (raising awareness of the difficulties involved),
explanatory trouble-shooting pages on the Citadel website, improvements in the help texts
accompanying the different phases of conversion, and improvements in the error messages to
make them more understandable. A final aspect, human interaction, has led to a series of
actions that are not within the scope of this book, except perhaps for the preparation of a series
of help sheets and template Excel files used as support tools for the Apps4Dummies
workshops.³
³ These were awareness-raising events organised by the Citadel consortium in several locations across Europe.
define a data model for practical purposes, then it did not necessarily have to become a
standards proposal since other data models could perfectly well co-exist with it in the ODC
space, together with the transformation tools needed to convert towards them.
The ODC model was thus presented as a vision, without the specific intention of implementing it
in practice. The goal was rather to use the ODC as a framework capable of guiding the thinking
and actions of the pilot cities. The main objective at this early stage was to see whether in
practice such an autonomous space between data and applications did exist and, if so, to
identify where the borders or interfaces above and below this space were and what defined
them.
What did emerge in the first cycle of pilot testing was the emptiness of this common space.
Feedback from the pilots noted the significant gap between city datasets on the one hand and
the application templates on the other, which require data to be in a specific JSON format.
Normally, this gap would be bridged by a specific tool such as an API, but the whole idea of the
ODC is to introduce a different concept that, rather than bridge this gap, fills it with elements
that are open and re-useable.⁴
In addition, APIs generally work only with on-line relational database services, while the great
majority of the datasets of the pilot cities consist of Excel files. The gap between data and
applications was thus a significant one, and cities found themselves with few applications with
which to use their data (giving them little motivation to open more data), while developers had
little data ready to use with the application templates (giving them little motivation to go
through the complicated process of installing the templates' client-server configurations). To
begin to overcome this problem, the Ghent pilot devised a simple tool to convert data from one
of the city's services for parking data to the JSON format required for the parking template. This
was heralded as the first instance of the ODC concept at work, with the spontaneous emergence
of a conversion tool to begin the process, but it actually laid the ground for the further
development of the ODC itself as a more complex system. In the process, what initially appeared
as a question of technical formats (the use of JSON), ultimately emerged as a question of
semantics.
⁴ APIs in fact are in general not conceived as belonging in the common space of the ODC. They are either designed as
an accessory to an application, so that a data service needs to write specific code to feed data to it (for example Google
Maps or Xively), or they are written for a specific data service, so the application has to have special code to be able to
use each data service's API. The idea of the CitySDK project (http://www.citysdk.eu/) is to standardize the APIs
associated with common types of data services, but it only works with data services and not, for example, with the
spreadsheet-type static files that make up most of available public sector information.
• the Application Generator Tool (AGT) adopts a generic version of the Citadel data format
  (mostly based on the POI template's data model) to generate apps that read one or
  more appropriately structured JSON files and then visualize the POIs in a list or map
  visualization.
• The Converter, which starts from a row-based Excel or CSV dataset as input (like the ones
  most cities had produced, notably Issy-les-Moulineaux), then carries out a mapping of
  the column headings to the generic Citadel data model of the AGT (e.g. mapping "Name"
  to "Title") and finally saves the output as a Citadel-compliant JSON file.
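The Converter step described above (read rows, map column headings, save JSON) can be sketched as follows. This is a simplified illustration, not the Citadel Converter itself: the synonym table, the output key "pois" and the field names are hypothetical stand-ins for the actual Citadel data model, which is not reproduced in this chapter.

```python
import csv
import io
import json

# Hypothetical mapping from source column headings to standard field names.
# Headings not listed here are dropped from the output.
COLUMN_SYNONYMS = {
    "name": "title",
    "titre": "title",
    "title": "title",
    "description": "description",
    "coordinates": "location",
}


def convert_csv_to_json(csv_text: str) -> str:
    """Map a row-based CSV onto an assumed POI structure and return JSON text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pois = []
    for row in reader:
        poi = {}
        for heading, value in row.items():
            # DictReader uses None for surplus or missing cells; skip those.
            if heading is None or value is None:
                continue
            field_name = COLUMN_SYNONYMS.get(heading.strip().lower())
            if field_name is not None:
                poi[field_name] = value.strip()
        pois.append(poi)
    # "pois" as the top-level key is an assumption made for this sketch.
    return json.dumps({"pois": pois}, ensure_ascii=False, indent=2)
```

Given a file whose first column is headed "Name", each row comes out with a "title" field, which mirrors the Name-to-Title example in the text. A fuller tool would also report unmapped headings to the user rather than dropping them silently.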
This required an important step to be taken as regards the architecture of the Citadel templates.
In the first round of pilot testing, apps were created by selecting the appropriate template,
encapsulating the necessary datasets (after downloading the files), and installing the template
software (with the data inside it) on the server side with the visualization part on the client side.
This procedure, which is normal practice for mobile applications, essentially creates a closed
system, even though the data was originally downloaded from an open portal and structured
according to a common data model.
In order for the AGT to be able to generate an app quickly, it was necessary to separate the data
from the application or application template that reads it, thus externalizing the dataset. In this
way, the app resides entirely on the client side (e.g. in a smartphone), the data resides as an
autonomous file on the Internet (e.g. on the Citadel Hub), and the app reads the data in real time
when it needs it. This is more similar to the way an API reads data from an on-line web service,
essentially by connecting to the web service, asking the right questions, and knowing how to
expect the data to be returned and subsequently adapted to the needs of the application. The
difference, however, is that the Citadel JSON file is live⁵ but already in exactly the format the
application expects to see it: all the application needs is the URL to read the information directly
from the external server, with no further need for an API.
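The externalized arrangement can be pictured with a minimal Python sketch in which the "app" holds nothing but a URL. The function names and the assumed JSON layout (a top-level "pois" list whose items carry a "title") are hypothetical and serve only to illustrate the principle.

```python
import json
from urllib.request import urlopen


def read_pois(url: str) -> list:
    """Fetch a JSON file from a URL and return its list of POIs.

    The layout {"pois": [...]} is an assumption made for this sketch.
    """
    with urlopen(url) as response:
        document = json.load(response)
    return document["pois"]


def list_titles(url: str) -> list:
    """What a list view would show: one title per POI."""
    return [poi["title"] for poi in read_pois(url)]
```

Pointing list_titles at a different URL that returns the same structure requires no change to the code, which is the sense in which the same application can be reused across datasets and cities.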
From a functional standpoint, this means that for every original dataset published by a given
city, there is the need to also store a JSON version of the same file so that the AGT can read the
information from the Internet. This may appear to be an unnecessary proliferation of files, but
there are three important benefits to this new approach, all driven by different aspects of the
Citadel scenario:
• A Citadel JSON file can be updated at any time (even live feeds can work) as long as
  the URL returns the expected JSON schema following the expected semantic structure.
• Any city or user can add a new file respecting the same semantic schema and the
  application will be able to read it and use the data, as long as it knows the URL: the
  same application can be reused with no changes.
• Any application developer can access the available JSON files for any purpose, simply by
  knowing in advance what semantic structure to expect: the same data can be reused
  with no changes.
⁵ By "live" we mean that, rather than having to download a file and then read the information, the application can
directly access the information from the server hosting the JSON file.
The above diagram⁶ is in fact based on the above discussion, showing the ODC no longer as a set
of tools but rather as a set of JSON files (the green boxes) linked to different templates and
applications, all stored somewhere on the web and accessible through a URL. The tools are still
there (the dotted line below) but there is an important shift of focus: the common space is now
characterized by the way it accommodates datasets with different semantic models. Indeed, the
figure shows the different types of applications on the top (the generic template of the AGT,
the specific application templates developed in the first months of the project, third-party
templates, and even third-party applications and programs), each of which gets its data from
the datasets in the ODC that have been formatted in the way they expect to see it.
⁶ As of October 2013. It should be remembered that this is a conceptual schema that is broader than what the pilot
cities actually tested, which instead had the Converter using only the generic data format required by the AGT.
From an operational standpoint, this approach requires an index function to keep track of which
datasets can be read by which applications, but once the pairing between a template or
application and one or more JSON files has been configured, the job is done.⁷ The Citadel Index
in its current configuration only foresees pairings between apps generated by the AGT and
Citadel JSON datasets, so it does not yet reflect the open nature of the ODC model. How this
index function might evolve and scale up is one of the issues for future development of the ODC.
⁷ The exception to this is Discovery, a Citadel function which allows the user to move from city to city and the
application to automatically detect the presence of a new dataset with information relevant to that city, since
Discovery is essentially a dynamic configuration of the App-URL link that takes place through the Index. For the
purposes of this discussion, however, the semantics of the underlying data model remain the same.
⁸ This may seem obvious, but very few civil servants have had the gratifying experience of seeing a dataset they have
published actually used in a mobile application.
⁹ Cleaning up source data is one of the main costs of traditional Open Data initiatives. The original workplan for
Open Data activities envisaged the engagement of citizen groups in this lengthy process, but the dynamics described
here have proven far more effective.
gradually saw the usefulness of an intermediate step in which the terms they used ("name" or
even "titre" for "title") are mapped onto a standardized vocabulary, and that the output format
required for the AGT was just one of many possible ways in which their data could be used, once
the semantic mapping to standardized terms had been carried out.
It is hard to overestimate the impact of this engagement of civil servants, and how building a
sense of ownership of a dataset in the person who generates it (generally seen as a source of
trouble and not a resource) can be the best way to ensure quality. The uniqueness of the
Citadel approach is to actively empower the people who create datasets in the first place,
influencing their behaviour by showing the consequences of sloppy data, directly rewarding
good data with a working app, and thus promoting the convergence of behaviour patterns
towards common standards of practice.
Once the Converter had reached a point of stability and was in active use by the pilot cities, it
was useful to explore how it can be adapted to different data models and different applications,
in order to steer the process from the pragmatic Converter-AGT toolkit prepared for the pilots
towards the multiple-standard approach of the ODC model as originally conceived.
These enhancements effectively open up the conversion process to other options such as Open
Street Map, geoJSON, the CitySDK APIs, etc. Since they were developed in the final stages of the
project, they were not fully tested in the pilot cities, though they nonetheless demonstrate the
flexibility of the Citadel approach and the possibility of migrating from the original
Converter-AGT toolkit towards the multi-standard ODC concept that has inspired the developments of the
Open Data Commons throughout the project.
• Next, the ODC became the Converter-AGT toolkit, designed to overcome a specific
  problem identified by the pilots.
• Finally, a series of alternative data models and conversion scenarios were explored,
  framed by the original ODC framework, extending the Converter-AGT toolkit to regain
  the goal of an open system.
In this process, ODC development was shaped by the semantic issues that eventually defined a
core, a-standard (in the sense of not requiring standards) semantic framework that is at the
heart of the model.
This schema considers that the common space is driven by the presence of three elements:
• JSON files with pre-defined semantic structures (provided by one or more applications
  and registered in the Index),
• Converter tools that carry out the necessary semantic mapping from the unstructured
  CSV files towards the structured JSON files,
• CSV files with any semantic structure (namely with whatever choice and sequence of
  column headings the original user defines),
plus the index, which keeps track of a) where the CSV files are and where they come from; b) the
data models used by applications and the JSON files that conform to them; and c) which
converters can be used to produce which data models.
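The three record types kept by the index can be pictured with a small Python sketch. It is purely illustrative: the class OdcIndex and its method names are hypothetical, and the actual Citadel Index only pairs AGT-generated apps with Citadel JSON datasets.

```python
from collections import defaultdict


class OdcIndex:
    """Toy index holding the three kinds of record listed in the text."""

    def __init__(self):
        # a) where each CSV file is and where it comes from
        self.csv_sources = {}
        # b) data model name -> URLs of JSON files conforming to it
        self.json_by_model = defaultdict(list)
        # c) data model name -> converters able to produce it
        self.converters_by_model = defaultdict(list)

    def register_csv(self, url, origin):
        self.csv_sources[url] = origin

    def register_json(self, model, url):
        self.json_by_model[model].append(url)

    def register_converter(self, model, converter):
        self.converters_by_model[model].append(converter)

    def readable_files(self, model):
        # What an application built on this data model can read today.
        return list(self.json_by_model.get(model, []))

    def converters_for(self, model):
        # Which tools could turn a raw CSV into this data model.
        return list(self.converters_by_model.get(model, []))
```

An application built on a given data model would ask readable_files for the JSON it can use, while a data owner holding only a CSV would ask converters_for to find a route towards that model.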
The various tools developed above and below this core interact through it, allowing for different
standards to co-exist by providing the semantic framework that matches data models to the
applications that can use them, forming an open semantic ecosystem.
This open ecosystem, upon closer inspection, consists of the very tools that were originally
conceived of as populating the Open Data Commons, considered as the shared space in the
public domain. Indeed, the services and tools shown are all reusable, generic components,
whose semantic interoperability is guaranteed by the fact that they can speak to and get
information from the semantic core.
In addition, this schema fulfils the original idea of the ODC as a negotiation space for
user-driven convergence towards standard semantic structures. As stated above, any data model can
be registered in the index together with the conversion tools to create the necessary files. At
the same time, however, users are likely to converge on the data models with a greater number
of JSON files available, so long as they meet their needs. This encourages the development of
user-driven standards, both specific data models for precise requirements (e.g. restaurant
menus) and data models of general relevance (e.g. POIs), with the balance being
gradually defined within this operational semantic framework.
The model of the open data ecosystem above contains nearly everything developed to date in
Citadel: what, then, is outside of the Open Data Commons? The fact is, Open Data systems in
the real world are ultimately fed by real (in the sense of not necessarily Open Data) office
systems, files, and services on the one hand, and are used to contribute to the development of
real (in the sense of normal use, not only finding a parking place) applications on the other. This
is easy to forget, since most Open Data discourse to date seems to take place in a separate
world from our daily life of interacting with ICT systems. The end objective of Open Data, at
least in the Citadel perspective, is to become part of the real world, simply as an efficient way
of addressing interoperability issues when linking different data sources to applications, as
illustrated in the figure on the following page.
This scenario is conceivable only with a massive uptake of the Open Data paradigm, which
Citadel considers to be a possibility enabled by the ODC with its semantic core. At least in the
context of this book, we can say that the definition of the semantic core has been the key
enabling mechanism for unlocking the Open Data Commons, and remains the driving force of
the concept.
¹⁰ http://ec.europa.eu/justice/newsroom/data-protection/news/120125_en.htm
However, the analysis carried out at the time also showed the existence of a latent tension
between the promotion of the commercial (and/or non-commercial) use of Public Sector
Information and an excessive protection against the risk of personal data disclosure that did not
take into account the blurring of traditional distinctions between data holders and collectors,
application providers and final users. Such tension has been epitomized by the so-called Citizen
Developer profile, a natural person engaged in the processing of open data with the support of
Citadel application templates. This person is empowered to create, own and control his or her
own datasets and applications, and share them with other participants in the Open Data
Commons on terms that are set and negotiated, as need be. For instance, a local bridge club
member can be incentivised to publish online the list of fellow members, together with their
home addresses, to make it easier to find a location for the next game, through a newly
developed app that mashes up the bridge club list with the official dataset of city parking
facilities. Unfortunately, the same list once made public could be of interest for some IT-savvy
burglar, noting which homes are left unattended on the occasion of the next club meeting.
While possibly trivial as an example, it shows how significant and irreversible can be the
unwanted consequences of privacy carelessness, even in the case of prior approval by personal
data owners, and despite the fact that nobody acted with profit making purposes in this
scenario (except perhaps the burglar). This is, however, anything but an occasional risk in the
Citadel world: in fact, it was a precise mission of our project to facilitate the realisation of open
data driven applications by non-expert users through the Open Data Commons resources. And the
objection that publishing someone's home address goes beyond open data is weak, for at least
two reasons: a) that many other public sources may already have disclosed the association
between a certain name and a specific address, and b) that there could be a shared interest in
running the privacy risks of this disclosure, for instance to make known to other potential
members the existence of an active bridge club in the city with practical venues for playing in
the same neighbourhood or to invite other IT-savvy people to co-create a mash up service
showing all the right PoIs (Points of Interest) and useful connections at hand.
As a matter of fact, due to its seeming irrelevance for digital business, this case is not
considered by the upcoming legislative reform. According to Art. 3 of Directive 95/46/EC, its
provisions do not apply to the processing of personal data by a natural person in the course of
a purely personal or household activity. Art. 2 of the new Regulation repeats the same concept.
For sure, the Citizen Developer figure complies with the first part of the definition (being a
natural person) but not necessarily with the second qualification, given the possibility offered
to him/her by the Open Data Commons, either of adding data to an initial City PoI collection, or
of making improvements to an existing application provided by someone else. Paradoxically, if
someone started to claim (probably little) money for the mash up service from some time on, all
the legal consequences of past privacy carelessness would be charged to the last edge of the
value chain, although this could also be taken as a (clumsy, yet innovative) example of open
data exploitation for business purposes.
With this case description, we do not intend to imply that Community Law should consider
adding to its scope the activities of Citizen Developers, but only reinforce the importance of
preventive privacy assessments, rather than corrective or punitive actions. A relatively
short-term scenario, also driven by the likely popularisation of the Open Data Commons, will see a
growing number of applications heavily relying on citizens' own datasets if not also on
user-generated improvements, according to the Open Source or Living Lab logic. There, the risk of
involuntarily merging relevant (to the new service targeted) with irrelevant personal data is high
and must be considered upfront.
Currently, a PIA standardisation effort is ongoing at the ISO/IEC Joint Technical Committee No.
1/SC 27 IT Security techniques [9], but its results will only be made known in 2016 or later. The
ISO/IEC 27002 standard for IT systems security already includes privacy protection. Yet, despite
this claim, it leaves privacy policies and measures unspecified. Therefore, no single agreed PIA
procedure or guideline exists at the moment and the Citadel PIAF only adds to a number of
concurrent methodologies and approaches. In particular, current PIA schemes follow a risk
assessment approach, aiming to minimize the risk of privacy breaches and the consequences of
these for a particular organisation. In Citadel, this is complemented by an empowerment
approach, whereby communities and citizens can have greater control over their own
information, thus also helping to lower the risks for whoever manages it.
PRIVACY TYPES
The original contribution of Citadel to the theoretical and practical debate on PIAs comes from
the distinction between three types of privacy: Community, Application and Data level. These
have gradually emerged from the iterative activities done across the project tasks during the
past couple of years. This distinction becomes relevant in three respects: first, because it is
commonly agreed that a fully functioning PIA should deal with all types of privacy within their
respective scope; second, because with the introduction of an Open Data Commons it is no
longer clear who the data controllers are, to whom the liability for privacy protection should be
attributed by law; third, in relation to the fact that the practical measures to embed privacy
concerns into the design change quite a lot in relation to the specific nature of the privacy issues
tackled.
These three typologies of privacy are only partly overlapping, as the following picture exhibits:
Community level privacy can be defined as the way this concern is perceived and assessed by
the stakeholders potentially affected by it. As the PIA process outlined in Figure 19 documents,
it is essential for data controllers to make sure they understand the distinct interests and
arguments of the people and organisations involved in their community of reference, as far as
the management and the potential risk of disclosure of personal data and information is
concerned. As a matter of fact, since its early stages the Citadel project has promoted the
constitution, in each of the pilot Cities, of the so-called Open Data Governance Groups (ODGGs),
consisting of the key stakeholders interested in the opening up and cleansing of public (normally
local government owned) datasets. One of the key topics of discussion within the ODGGs
has inevitably been how to deal with the privacy implications of the utilization of open datasets,
particularly in association with non-profit activities. The resulting template of Open Data Charter
is meant to include a specific section on the terms and conditions of privacy protection, with its
contents reflecting the specific outcomes of the thematic discussion in the Cities.
In turn, Application level privacy can be defined as the extent to which applications such as
those tested in Citadel deal with user information by either disclosing or protecting it.
Apps can collect significant information about users and their devices, often without their
knowledge or permission. It is quite rare that comprehensive information in clear and plain
language is provided to new users about the features of a given app, what information will be
accessed by whom and how it will be used or to whom it will be disclosed. Merely offering a
single 'Accept' or 'Install' button is unlikely to support valid user consent. In February 2013, the
Art. 29 Data Protection Working Party formulated an opinion on the security and privacy risks
associated with the use of applications and proposed a set of recommendations to each of the
different players in the marketplace [2]. During the Citadel project, a number of application
templates as well as a generic resource called Application Generation Tool (AGT) were
developed and published on Github and the Citadel Hub. Ideally, each of these tools can help
generate innumerable apps (as they already have) with only a few differences across the various
possible datasets, locations and utilisations. Therefore, it makes sense to define the privacy
policy of each application template as well as of the AGT.
Finally, Data level privacy is a concept that has been developed during the project as a result of
the reflections and experiments mentioned above. It can be defined as the
qualification of a single data item (or row in a dataset) in terms of its possibility to be safely
disclosed, without generating any harm for the original data owner. Of course, being an
attribute of the single data entry, it cannot be assigned by any subject other than the data
owner, nor can it be changed from public back to private any more¹¹. Obviously, the specific
attribute of a data item affects the quality of the dataset it belongs to and each transformation
thereof. For example, a JSON file created out of an existing CSV or similar file with the Citadel
Converter (which is another free tool of the project) should in theory preserve the same data
level privacy attribution as the source of this transformation.
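The idea of a privacy attribute that travels with each data item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names `privacy` and `owner`, the two values `public` and `private`, and both function names are hypothetical and are not taken from the Citadel Converter.

```python
import csv
import io
import json

# Hypothetical attribute name and values; the real Converter schema may differ.
PRIVACY_FIELD = "privacy"
ALLOWED_LEVELS = {"public", "private"}


def convert_csv_to_json(csv_text):
    """Convert CSV text to a JSON array, carrying each row's privacy attribute over unchanged."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get(PRIVACY_FIELD) not in ALLOWED_LEVELS:
            # Do not guess: only the data owner can assign the attribute.
            raise ValueError("row lacks a valid privacy attribute")
        rows.append(row)
    return json.dumps(rows)


def set_privacy(item, new_level, requested_by):
    """Return a copy of the item with a new level, applying the two rules stated in the text."""
    if new_level not in ALLOWED_LEVELS:
        raise ValueError("unknown privacy level")
    if requested_by != item["owner"]:
        raise PermissionError("only the data owner may assign the privacy attribute")
    if item[PRIVACY_FIELD] == "public" and new_level == "private":
        raise ValueError("a public item cannot be changed back to private")
    return {**item, PRIVACY_FIELD: new_level}


source = "title,owner,privacy\nTown Hall,alice,public\nHome address,bob,private\n"
converted = json.loads(convert_csv_to_json(source))
print([row["privacy"] for row in converted])
```

The one-way rule in `set_privacy` mirrors the statement above; if the debate mentioned in footnote 11 reverses that position, the third check would simply be dropped.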
The following three sections deal separately with these privacy typologies, also in relation to
the prospective impact of the forthcoming legislative reform described earlier in this book.
Taken together, they form the three distinct conceptual elements of the Citadel PIAF, as shown
by the table below. Each of these aspects implies specific governance issues that will be
discussed in the perspective of their being instantiated in the agreements supporting Citadel's
Open Data Governance Groups. In this way, the PIAF contributes to ongoing PIA standardization
efforts, by further highlighting the communitarian (not only organizational) dimension of
privacy management, even in a context like that of Open Data, which at first sight poses few
(if any) challenges to the protection of personal information.
Table 4. Summary of the three levels of the Citadel PIA framework

| Level | Focus | Goal | Issues | Proposal |
|---|---|---|---|---|
| Community | City leading Open Data Governance Group(s) | City level PIA | Beyond risk assessment in organisations to community stewardship of open data policies | PIA embedded in Open Data Governance Charter as a multifaceted framework to highlight emergent privacy issues |
| Application | Professional Developer (or Citizen Developer) | App level PIA | Citadel scenarios leading to multiple authorship and personal data mash-ups with potentially unforeseen outcomes | Open Data Commons, AGT and templates with privacy policy embedded by design (scope for more privacy as a service features) |
| Data | Individuals generating data items | Data level PIA | Adequate information on who is using personal data and guarantees that individual privacy requirements will be respected | ODC Index based licensing system (explicitly dealing with converted datasets) based on the Creative Commons analogy for supporting Privacy lifestyle decisions |

11 This statement could soon be reversed, according to the results of the debate on the right to be forgotten.
and the interviewed panel was asked to select which roles were/should be more appropriate for
each category, picking up from the following options:
a) Being informed and consulted - the actors are kept up to date on developments and
consulted when general strategies or plans need feedback;
b) Participating in decisions - actors contribute to specific decisions on how to implement
an Open Data strategy;
c) Active, leading role - they are directly involved in the concrete implementation of the
Open Data strategy;
d) Not relevant - none of the above options is applicable to the actor group at hand.
As far as the definition and enforcement of privacy related policies are concerned, the
respondents assigned a crucial role of leadership to the City government and ICT department,
and to a lesser extent to the public and private data providers. As far as the remaining
stakeholders are concerned, while information and consultation is always welcome during the
process, a slightly more participative role was invoked for the Citizen Developers only, as the
following diagrams exhibit:
Figure 21: Stakeholders' role in the definition of privacy policies (Citadel members)
Figure 22: Stakeholders' role in the definition of privacy policies (Citadel non-members)
A third survey was run in conjunction with the Evaluation activity and it was specifically
designed to capture detailed feedback about the Citadel tools and their usage. At that time, the
Citadel tools (Converter plus AGT) were already available and thus the prospects for a much
more open approach to Open Data were also evident. Over 100 people participated in the
experiment, thus providing an adequate statistical base for extrapolating results from the
answers received to the online questionnaire. It is relevant to note here that the opinions
expressed on privacy related issues seemed to differ very much across the interviewed panel.
On the one side, some respondents rightly affirmed that privacy regulations in force have little
or no impact on the process of opening up the public datasets belonging to the pilot cities.
While this may well be true within the Citadel partnership, in other cases
there is evidence of privacy issues and concerns building a sort of psychological barrier against a
faster and more widespread implementation of open government and data principles. The Open Data
Charter has been named as a viable solution to make people aware of possible breaches or
consequences of opening up data.
On the other side, it should be noted that the upcoming reform of EU data protection legislation
reinforces the responsibilities of data and service providers, placing a heavy burden of overhead
for compliance. As was argued in the previous section, this reform seems to endanger the status
of the Citizen developer lying at the heart of the Citadel vision, even though natural persons as
such do remain out of the scope of the privacy legislation. Again, the Open Data Charter is seen
as the right place for some ad hoc provisions in the direction of privacy management.
More generally, the judgement concerning the current and prospective framework and
guidelines on data protection was mixed. While most respondents were aligned with the
principles and directions of the EU initiative, others thought that the real issues of concern were
not properly addressed, as e.g. the social norms about privacy are shifting over time, and the
exploitation context proposed by the Citadel project, in which published datasets are used by
citizens for non-profit purposes, is worthier of trust than mere commercial exploitation.
A closer look at the findings from the questions in this survey specifically related to privacy, in
the light of the three-level Citadel PIAF, instead provides an explanation of these apparent
contradictions.
- The highest risk people see in not opening data is missed opportunities for useful
applications and services. Therefore, people see the value of open data and are expecting
someone to solve privacy issues upfront, as we suggest in the Citadel PIAF.
- People want to be able to manage their own data in more detailed ways than foreseen by
EU legislation, for instance allowing access to specific groups (44%), prohibiting commercial
use (42%) and so forth.
- People are very concerned about what happens to their private information; fears of
inappropriate disclosure, unacceptable use, and insecure storage are the first three
concerns, all above 60%.
- Surprisingly, over 70% of respondents (many of whom work within city administrations) don't
know whether or not their City has even published a PIA. Since people are indeed
concerned about privacy, the PIA appears to be considered more as a question of
compliance than as a process that guarantees what they're looking for.
- A specific question confirms the result of the previous survey, namely that people trust
municipal authorities, and in particular municipal IT departments, more than anyone else and
don't trust suppliers to the government or the Communication and PR office.
- While City governments may be trusted to manage privacy issues, they should not act on their
own but rather follow guidelines and policies decided upon through public consultation.
ANALYSIS OF OUTCOMES
Alongside the periodic surveys, some discussions on privacy issues can be reported within the
Open Data Governance Groups in the four pilot Cities. Overall, the results in the pilot cities echo
the last external survey: people are concerned about privacy but the PIA looks insufficient to
meet these concerns. However, during the project the contribution of the ODGGs has stayed
well below initial expectations. While everyone agrees that the ODGGs can be defined as a (sort
of) knowledge broker on the challenges and barriers of opening and using government data,
the input received on privacy and data protection issues has been limited. In particular:
1) An Early Stage Approach has been invoked, to integrate best practice around privacy and
data protection right at the start of any City release of Open Data;
2) Trust and Confidence building are deemed essential, although the question remains of how
to create and maintain them in local citizens and businesses;
3) External (Third Party) Control was also required, for instance through appointing an official
Ethical Advisor to help the local authority oversee privacy and data protection matters.
In addition, as noted above, the initial concept of Open Data Charter proposed in
the project gradually evolved from a standard protocol to be adopted and formally signed by
the interested City stakeholders, to an alternative, more flexible structure akin to a set of
guidelines that focuses more on the respective roles and contributions of governance group
actors together with specific indications on how privacy is to be managed.
- The opportunity to ignite a specific discussion on privacy among the ODGG actors to explore
the consequences of data opening and utilisation, and especially to design scenarios where
the Open Data Commons acquires value-adding features such as Privacy by Design or
Privacy as a Service at the community level;
- The possibility to establish a more favourable regime in the City, as far as data protection is
concerned, for those datasets and applications which do not fall (or not easily so) within
the provisions of extant and forthcoming legislation;
- The chance of bringing these aspects to the attention of public decision makers with even
greater relevance and urgency, given the prospective diffusion of Open Data Commons
facilities, such as the Converter facilitating the publication of own data by individual
citizens and the App Generator facilitating the own app creation scenario for non IT savvy
or expert users.
12 http://opensource.org/licenses/BSD-3-Clause
| Template | Athens | Ghent | Issy-Les-Moulineaux | Manchester |
|---|---|---|---|---|
| Find a Parking Lot | http://demos.citadelonthemove.eu/parking-athens/ | http://demos.citadelonthemove.eu/parking-gent/ | | http://demos.citadelonthemove.eu/parking-manchester/ |
| Events in the City | http://demos.citadelonthemove.eu/events-athens/ | http://demos.citadelonthemove.eu/events-gent/ | http://demos.citadelonthemove.eu/events-issy/ | |
| Points of Interest in the City | http://demos.citadelonthemove.eu/pois-athens/ | http://demos.citadelonthemove.eu/pois-gent/ | http://demos.citadelonthemove.eu/pois-issy/ | |
| User Generated Points of Interest | http://demos.citadelonthemove.eu/crowd-sourcing-athens/ | http://demos.citadelonthemove.eu/crowd-sourcing-gent/ | http://demos.citadelonthemove.eu/crowd-sourcing-issy/ | |
| Environmental Data | http://demos.citadelonthemove.eu/environment-athens/ | | | http://demos.citadelonthemove.eu/environment-manchester/ |
In short, Find a Parking Lot provides information about the parking lots of a given city. The city
is configured in the back end of the application. The first page of the application presents all the
available parking lots on a map that is centred on the centre of the selected city. Events in the
City displays the events of a city on a map or list view and helps the user get information on the
types of events (s)he is interested in. The first page of the application presents a map of the city
centre just like in the Parking facilities template. The events are always geo-localized, so the
ones that take place near the city centre are displayed first. Points of
Interest in the City is a general application to display any kind of PoIs on the map of a chosen
city. Every PoI is classified under one or more categories, e.g. Museums, Transportation, etc.
The template offers a filtering functionality that uses all the categories of PoIs found in the
given dataset and provides users with a list of checkboxes corresponding to those categories.
User Generated Points of Interest is a template that provides user-generated PoIs of a given
city and other crowd-sourced information. The city is configured in the back end of the
application. The first page of the application presents a map that is centred on the centre of the
selected city. Users can select different categories of user-generated PoIs to be shown.
Different colours of pins represent different categories of PoIs. Finally, Environmental Data
provides information about the environmental (and, in the future, traffic and transportation) data
of a given location in the city.
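The category filter described for the Points of Interest template can be illustrated with a minimal Python sketch. The record layout (`title`, `categories`) and the function names are assumptions made for this example only; the actual templates use the project's own data model and run in the browser.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical PoI records, invented for illustration.
pois = [
    {"title": "City Museum", "categories": ["Museums"]},
    {"title": "Central Station", "categories": ["Transportation"]},
    {"title": "Tram Museum", "categories": ["Museums", "Transportation"]},
]


def available_categories(items: Iterable[dict]) -> list[str]:
    """Collect every category found in the dataset, sorted so the checkbox list is stable."""
    return sorted({category for item in items for category in item["categories"]})


def filter_pois(items: Iterable[dict], selected: set[str]) -> list[dict]:
    """Keep the PoIs that belong to at least one of the ticked categories."""
    return [item for item in items if selected & set(item["categories"])]


print(available_categories(pois))
print([p["title"] for p in filter_pois(pois, {"Museums"})])
```

Because the checkbox list is derived from the dataset itself, a new category appearing in the data needs no change to the template, which is what makes the same template reusable across cities.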
With these templates at hand, also made available on Github (https://github.com/citadel-eu),
every Citizen Developer can potentially create their own mobile applications, linking to the
open public datasets made available by the respective local governments (and communities, as
is the case for user generated points of interest). As Table 5 shows, at least two pilot
cities have taken on the task of feeding the templates with (at least simulated)
data.
Additionally, with the purpose of making life easier for the less IT savvy Citizen Developers,
the project has delivered an App Generation Tool (AGT), available online at the following URL:
http://www.citadelonthemove.eu/en-us/createanapp/applicationgenerationtool.aspx
At present, over 100 European Cities are subscribed as data publishers in the AGT and over 500
mobile apps have been created in about one year. About one tenth of all apps created are
multi-city, namely they concretely demonstrate potential for reuse, and about one in four are
multi-data, namely they provide valuable information to their users through data mash-ups.
ANALYSIS OF IMPLICATIONS
The argument that the EU legislation on personal data protection is not
applicable to Citadel application developments is based on the fact that, with only one partial
exception (i.e. User Generated PoIs), all the templates in Table 5 are totally dependent on
government data from different sources. Therefore, the project has been successful in
promoting new and original, if not innovative, ways to exploit the publication of open public
datasets for basically non-commercial purposes by the private sector and particularly the
Citizen Developers.
However, the Citadel vision itself, configuring an active role for people in the development of
applications, as well as in the generation of data (like PoIs and other crowdsourced
information), introduces scenarios for privacy management that are only partially foreseen by
current and pending legislation.
Based on the periodic reports from the ODGGs in the Citadel pilot cities, the main privacy risks
to be considered at application level can be listed as follows:
a) Geolocalisation. The User Generated Points of Interest are inevitably georeferenced on
the City map, which can create a feedback loop with personal information.
Additionally, to ensure better performance, each Citadel app template may require the
communication of the exact location of the user, which can considerably facilitate
his or her identification by third parties¹³.
b) Shared Access. It is unclear whether the apps developed with the support of the Citadel
AGT should be considered for private use only. The notion of private use might include
the creation of small closed groups (including e.g. friends or relatives), though it is
unlikely that a group registration system would be added in that case. As a matter of
fact, when shared with other users or linked on Web 2.0 communities, the app can
disclose and publicize personal information (on user location, behaviour and related
aspects) to a broader audience than originally planned or desired¹⁴.
c) Extension in Scope. In principle, the app templates provided for a baseline scenario of 100% open
and public datasets could be profitably reused for other purposes, where
underlying datasets, still owned by third parties, may be private or confidential or just
not authorised for such particular use. A variant of this extension in scope can occur if
and when the user decides to mash up the open and public datasets with his/her own generated
datasets (like in the example of the bridge club above).
d) Crowdsourced Data Inputs. Depending on the way an (extended) app is configured,
users can be called (or tempted) to add their own datasets or complete/integrate the
data items in the existing ones. This may facilitate personal identification or change
the level of privacy protection of the datasets used.
e) Open Source Software Improvements. The Citadel app templates are already available
on Github. According to the OSS logic, developers can bring improvements to the
existing release(s) under the condition of free redistribution. However, depending on
the nature of the app or template, this can reinforce or reduce the level of privacy
guarantee.
f) Recurrent need for user consent. For the reasons expressed above, it might be
necessary to repeat the request for user consent even after the specific application has
been installed, for instance whenever a new user enters the community or performs specific
actions with and on the app.
13 Actually, one uses the localisation feature of HTML5, which therefore has to do with the browser more than the app. But the app has to be authorized to work with it anyway.
14 Not surprisingly, some Associate cities (e.g. Amsterdam) have asked for "private" ODC spaces.
- That app developers are juridical persons, usually profit making organisations
- That the personal data under scrutiny only belong to the individual user
- That the app features are consolidated and do not change much over time.
As the examples provided above demonstrate, these principles are no longer valid in the Citadel
world, where
- The app developers can be natural persons (citizens) or not for profit entities like e.g. social
networks or associations
- The datasets used may belong to several owners and be subject to different policies (of
openness and permission to reuse), including no policy at all
- The app features are subject to continuous revision by multiple parties at the same
time, according to the OSS logic.
In the early stages of the Citadel project, a taxonomy of data/application pairings was proposed
in order to make room for such variations on the theme, in the perspective of existing and
upcoming EU legislation on data protection. At that time, we identified nine possible instantiations (use
cases) worth consideration in terms of liability for data providers and app developers. That
taxonomy is proposed again here, with modifications, in the following table, where the
use cases of interest have now become eight.
| | Data is stored locally on the app (be it one's own or a 3rd party's) | Data is made public through 3rd party app |
|---|---|---|
| Person owns data / Person produces data | 1. Out of the scope of current and future data protection legislation, as no disclosure occurs | 4. In the scope, if app is professionally run, grey zone otherwise |
| Person uses 3rd party data | 5. Out of scope, but common sense should be used (e.g. diligence in custody) | 8. Out of scope for the individual, as data captured was already public |
1. Person owns/produces data, which is stored locally on the application. Example: annotating
events in the agenda of a cellular phone or tablet PC. This case (highlighted in green colour)
should remain out of the scope of the current and future legislation on data protection, being
referred to personal use only. No difference should be made by the circumstance that the app
used was bought from a third party or directly produced by a citizen developer for his/her own
purposes.
2. Person owns/produces data and shares it with peers in a closed group using a 3rd party app.
Example: the person sends an email to a friend or chats on Facebook or another social network
about a certain topic. The contents of this communication are private, but what if the receiver
discloses them without prior consent from the sender? The infringement of privacy would be
certain, but not yet its legal relevance (unless a crime is committed due to this disclosure). An early
warning (or a periodic reminder) from the system might be desirable, however, in order to
minimize this risk. From the perspective of the developer of that app (normally a 3rd party, but
there could be some exceptions), the group being closed, liability for any privacy infringement is
limited if a registration procedure existed that foresaw the collection of user consent before
entering the system. This is the prior consent required by EU Directive 95/46/EC to be formally
and explicitly given by the owner (or subject) before any data treatment.
3. Person owns/produces data and peer shares it or makes it public through his/her own app.
Example: a self-developed application that spreads information about energy consumption of
home appliances for monitoring and benchmarking purposes. Given the full identity between
data owner and application developer, pre-emptive consent to data sharing should be
considered as embedded in the system. However, taking into account the risk that data
publication may lead to third party appropriation for other uses (e.g. commercial) not expressly
authorised by the owner, if not for the commission of crimes, it would be advisable to promote
better awareness of these risks by the user in some way. Here the provisions of current and
future data protection legislation do not apply, but a role could be identified for a specific
service in the Open Data Commons, for instance.
4. Person owns/produces data and makes it public using a 3rd party app. This may be the case
of a service (residing in the cloud, for instance) to the benefit of car drivers in a City centre
that shows their respective locations in order to promote e.g. the exchange of parking lots by
those going in and out of the town. Disclosing a trivial piece of information like one's own car
plate number can have unwanted consequences, which should be the object of prior caveats
and informed consent. However, an alternative action would be to embed Privacy as a Service
within the application, so that each user can select the acceptable level of personal data
protection in relation to the scope and purposes of the service required.
5. Person uses 3rd party data, which is stored locally on the application. If private information
belonging to any third party is stored on local devices only, then we might presume it is only for
the individual use of that person. Example: a standard phone directory or address book in
someone's cellular phone or tablet PC. While the case is not relevant for the current and future
legislation on data protection, it may become of interest for criminal law if a fraudulent use
materializes. Without going that far, common sense recommendations like adding a password
and regularly changing it can be appropriate, to minimize the risk of involuntary disclosure.
6. Person uses 3rd party data, which is peer shared in a closed group. Example: information
that is acquired on Facebook. This can be reused without limits in the same context (eg.
Facebook) but some behavioural rules (Netiquette?) should possibly be adopted. Use of this
information only for personal purposes (provided they are legal) is also allowed. Another
question might be whether the disclosure of such information outside the borders of the group
would be allowed. Here the answer would certainly be negative.
7. Person uses 3rd party data, which is peer shared or made public by the data owner through
an application developed by him/her. Example: a searchable repository of knowledge provided
to registered users (peer shared) or to the general public by the owner of that knowledge.
Depending on the limitations posed by the data owner, reuse may be free or subject to
conditions. However, the provisions of data protection legislation may not apply.
8. Person uses 3rd party data, which has been made public on 3rd party applications. Example:
downloading information from a repository of open datasets. Here the public nature of
information used is already clear from the start. Therefore, no infringement of privacy law could
be foreseen.
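The eight use cases can be read as a lookup over two dimensions: where the data comes from and how it is stored or shared. The Python sketch below encodes them that way. The enum names, the `classify` helper and the one-line summaries are paraphrases written for this illustration; they are not an official Citadel classification and carry no legal weight.

```python
from enum import Enum


class Origin(Enum):
    OWN = "person owns/produces data"
    THIRD_PARTY = "person uses 3rd party data"


class Sharing(Enum):
    LOCAL = "stored locally on the app"
    CLOSED_GROUP = "peer shared in a closed group"
    OWN_APP = "peer shared or made public through the owner's own app"
    THIRD_PARTY_APP = "made public through a 3rd party app"


# (origin, sharing) -> (use case number, short paraphrase of the assessment in the text)
USE_CASES = {
    (Origin.OWN, Sharing.LOCAL): (1, "out of scope: personal use, no disclosure"),
    (Origin.OWN, Sharing.CLOSED_GROUP): (2, "grey zone: liability limited if consent is collected at registration"),
    (Origin.OWN, Sharing.OWN_APP): (3, "legislation does not apply; user awareness advisable"),
    (Origin.OWN, Sharing.THIRD_PARTY_APP): (4, "in scope if app is professionally run, grey zone otherwise"),
    (Origin.THIRD_PARTY, Sharing.LOCAL): (5, "out of scope, but common sense should be used"),
    (Origin.THIRD_PARTY, Sharing.CLOSED_GROUP): (6, "reuse within the group; no disclosure outside it"),
    (Origin.THIRD_PARTY, Sharing.OWN_APP): (7, "reuse as allowed by the owner; legislation may not apply"),
    (Origin.THIRD_PARTY, Sharing.THIRD_PARTY_APP): (8, "out of scope for the individual: data already public"),
}


def classify(origin, sharing):
    """Return the use case number and summary for a data/application pairing."""
    return USE_CASES[(origin, sharing)]


number, summary = classify(Origin.OWN, Sharing.THIRD_PARTY_APP)
print(number, summary)
```

A table of this kind could let an app template warn its developer as soon as a configuration moves from an out-of-scope pairing into one of the grey zones.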
The two grey zones identified in points 2 and 4 above refer to the case of the citizen developer,
who most probably lacks the nature of a juridical person and is thus difficult to place with
certainty within the scope of application of privacy legislation. Here, too, two contrasting interests
are visibly in action: one to promote the transformation of this spontaneous initiative into a real
business with potentially huge returns, the other to avoid that, when this happens, it is already
too late to protect individual users against a voluntary or involuntary disclosure of the embedded
personal information.
By identifying several cases as out of scope of the proposed EU legislation, we do not intend to
imply any shortcoming of the proposed reforms nor propose any extension of the normative
framework. On the contrary, Citadel holds that the above dilemma requires a soft regulation to
be designed in the context of the principles of the Open Data Commons and the multi-level
PIAF, namely by linking the out of scope issues at the application level to the other two
levels identified:
- At the Community level, by collectively reviewing app developers' privacy policies and how
they are implemented on a regular basis, with more frequent and clear reminders of privacy
risks to app users, in the context of local ODGGs' PIA exercises.
- At the Data level, by allowing application developers to have greater certainty as regards the
privacy implications of datasets they are accessing. This is the subject of the following
section.
15 Two fields of metadata that are already mandatory in the Citadel Index.
2. Real time Updating: Every time the source system adds, deletes, or changes a record in a
dataset, this should in theory be immediately reflected in the availability of the new,
transformed dataset on the ODC. How this translates into practice remains to be seen in future
developments, but the issue assumes a specific urgency in the light of privacy considerations.
3. Data Classification: The owner of each dataset (data item) should always be able to classify it
as "open and public" or "private and restricted in use" according to a certain licensing
mechanism. A facility on the ODC should help users associate and visualise the most recently updated
license of each dataset (data item) before usage¹⁶.
4. Data Anonymization: Another facility on the ODC should enable data owners to anonymize a
dataset, cleansing it of any reference to specific individuals or organisations during a conversion
process. This can be implemented in future versions of the Citadel Converter.
5. Transparency and Control: Every ODC actor should be entitled to make real time searches on
the log files, discovering who has inspected, appropriated or transformed any datasets
available. In perspective, this should lead to the possibility of assessing whether any ownership
rights have been infringed illegally or without justification.
Given the extremely varying nature of the linked datasets and the early maturity stage of the
ODC implementation, it can be more productive to stay focused on process level innovations
such as those above¹⁷ as a contribution to the roadmap for future development.
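Requirements 4 and 5 above can be pictured with a small Python sketch: a conversion step that strips identifying fields, and a log that any actor can query. Everything here is assumed for illustration, including the field names in `IDENTIFYING_FIELDS`, the in-memory `access_log` and the function names; none of it describes the existing Citadel Converter. Note also that dropping columns is only the simplest form of anonymization and does not by itself prevent re-identification through the remaining fields.

```python
import json
from datetime import datetime, timezone

# Hypothetical column names that point to a specific person or organisation.
IDENTIFYING_FIELDS = {"name", "email", "phone"}

access_log = []  # in-memory stand-in for the ODC log files


def log_event(actor, action, dataset_id):
    """Record who did what to which dataset, with a UTC timestamp."""
    access_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "dataset": dataset_id,
    })


def anonymize(rows, identifying=IDENTIFYING_FIELDS):
    """Drop identifying columns from every row (a naive form of anonymization)."""
    return [{k: v for k, v in row.items() if k not in identifying} for row in rows]


def convert(rows, dataset_id, actor):
    """Anonymize during conversion to JSON and log the transformation."""
    log_event(actor, "transformed", dataset_id)
    return json.dumps(anonymize(rows))


def who_touched(dataset_id):
    """Search the log for every actor and action recorded against a dataset."""
    return [(e["actor"], e["action"]) for e in access_log if e["dataset"] == dataset_id]


rows = [{"name": "Ann", "street": "Main St", "spaces": "2"}]
print(convert(rows, "parking-demo", "alice"))
print(who_touched("parking-demo"))
```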
ANALYSIS OF IMPLICATIONS
At present, the ODC prototype environment as developed within the Citadel project is made
up of two main components:
- A collection of online datasets (i.e. accessible via their URL) that contain data in a format
(i.e. platform and data structure) that can be accessed remotely by at least one template or
third party application without any further conversion;
- A unique Index that includes a listing of a) online datasets converted as described, including
where and how they can be accessed, and b) the relevant template and application data
formats that are compatible with the files.
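To make the role of the Index more concrete, the following Python sketch models one Index entry and a lookup by template. The class and field names (`IndexEntry`, `data_format`, `compatible_templates`), the licence string and the example URL are hypothetical; the only constraints borrowed from the text are that a dataset is reachable via its URL and that a licence is recorded at the dataset level.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One listing in a hypothetical ODC Index."""
    url: str
    data_format: str
    license: str
    compatible_templates: list = field(default_factory=list)

    def __post_init__(self):
        # Assumed validation rules, derived from the description in the text.
        if not self.url.startswith(("http://", "https://")):
            raise ValueError("the dataset must be accessible via its URL")
        if not self.license:
            raise ValueError("a licence is mandatory at the dataset level")


def datasets_for(index, template):
    """List the URLs of all datasets that a given template can read without conversion."""
    return [entry.url for entry in index if template in entry.compatible_templates]


index = [
    IndexEntry(
        url="https://example.org/data/parking.json",  # placeholder URL
        data_format="json",
        license="CC-BY",
        compatible_templates=["parking"],
    ),
]
print(datasets_for(index, "parking"))
```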
In the future roadmap outlined in Citadel, the ODC (together with additional enhancements to
the Index) offers a distinctive opportunity to fulfil the requirements of embedding Privacy as a
Service into an Open Data ecosystem.
First and foremost, basic logging has been implemented for the Converter and the Index, so
that key events are tracked such as the registration of new datasets (by whom and how),
accesses by templates, configurations used, etc. Further enhancements of these features
can considerably promote the transparency and control requirement, provided that all
actors can effectively access the log information contained therein.
Secondly, recent work with the Citadel Converter has explored features that contribute to
the requirements listed above: besides allowing the ad hoc transformation of multiple file
formats into JSON files, compatible with one or more data models, on-the-fly conversion
scripts have also been tested. This can lead to enhancements that will allow the Converter
to directly access an original data source (or a regularly copied data dump) and only save
the configuration info to the ODC, thus better preserving the integrity of attribution.
[16] A CC license field is also mandatory in the Citadel Index, at the dataset level.
[17] As compared to exploring privacy implications of different kinds of information, i.e. financial, health, etc.
Thirdly, the initial project policy based on developing application templates has been
considerably altered by the extremely good performance of the Application Generation
Tool, which has allowed the generation of dozens of original applications but also has the
limit (in its current version) of using only one data model that includes all fields from the
PoI, parking, and event templates. A different roadmap can be that of developing and
storing new application templates, able to visualize types of datasets that go beyond the
Citadel application scenarios. This can significantly contribute to realizing the open and
interoperable vision of the ODC in an incremental fashion.
Finally, the Citadel platform (the first common project space where both datasets and
application templates have been made publicly available) will continue to be open and
accessible after the end of funded activities. This allows anyone to publish
georeferenced datasets and immediately visualise them on a map on their mobile phone.
As already demonstrated by the Associate city outreach activity, this has great potential for
motivating the publication of datasets beyond those of the project partners. However, the
functionalities of data classification and (if required) anonymization need to be enhanced.
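The transparency and control requirement discussed above relies on actors being able to search the event log. The following is a minimal sketch of such a search, assuming a hypothetical in-memory log; the `LogEvent` fields, the actor identifiers and the example URLs are illustrative assumptions and do not reflect the actual log format of the Citadel Converter or Index.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    """One tracked event (hypothetical structure, not the Citadel log format)."""
    timestamp: datetime
    actor: str        # who performed the action
    action: str       # e.g. "register", "access", "convert"
    dataset_url: str  # dataset the event refers to


def events_for_dataset(log, dataset_url, actions=None):
    """Return the events touching one dataset, oldest first.

    If `actions` is given, only events whose action is in that set are kept.
    """
    hits = [
        event for event in log
        if event.dataset_url == dataset_url
        and (actions is None or event.action in actions)
    ]
    return sorted(hits, key=lambda event: event.timestamp)


# Illustrative log entries (all values are made up for the example).
PARKING = "http://example.org/parking.json"
log = [
    LogEvent(datetime(2014, 5, 2, 10, 0), "template:parking", "access", PARKING),
    LogEvent(datetime(2014, 5, 1, 9, 30), "user:alice", "register", PARKING),
    LogEvent(datetime(2014, 5, 3, 8, 15), "user:bob", "convert",
             "http://example.org/events.json"),
]

for event in events_for_dataset(log, PARKING):
    print(event.timestamp.isoformat(), event.actor, event.action)
```

A real deployment would query persistent log storage rather than a list, but the question answered is the same: who did what to a given dataset, and when.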
Protection level | Consequences of disclosure | Required action(s)
0 (Limited or none) | Information intended for public access | Acknowledge source
1 (Moderate) | | Ask permission
2 (Contractual) | | Pay against usage
3 (Sensitive) | | Anonymize
4 (Confidential) | | Destroy
- At data item level, the system (most likely a Privacy as a Service resource in the ODC, like
the aforementioned Index) should enable a user to add a metadata tag, akin to a Creative
Commons license [18], that clarifies the extent to which it will be possible to copy, distribute,
and make some use of that data item, either non-commercially or for business-related
purposes;
- At dataset level, the data item with the highest protection level should determine the whole
dataset's qualification and classification. However, it should become possible to delete
some data items in order to create a new dataset with a lower protection level. In other
words, the classification is confined to the data: it does not extend its scope to any dataset
created with this data, provided the data itself is not manipulated;
- At application level, the risk of a privacy breach is reduced to zero if a specific intelligence is
added to the system that assigns to, e.g., data mash-ups the highest protection level of
all the datasets used for that specific purpose, asking, e.g., for the user's permission only if and
when necessary (of course, all the caveats and provisions of the data protection legislation
should remain valid);
- At community level, five events would be particularly relevant and interesting to explore:
1. What happens after the data owner has, e.g., given permission to use that data in a
certain context? The licensing metadata should certainly trace this circumstance. But is
this enough to authorize future reuse in contexts other than the original one? Here the
answer should be "probably not". However, it would be hard to prevent unauthorised
reuse once, e.g., someone's phone number has been published on the web for the first
time. Therefore we might think of a lighter approach, which allows free reuse of a data
item (for instance, by lowering the protection level from 1 to 0 permanently) under the
sole condition that the owner is informed whenever that data is used again.
2. What happens if a third party (like another user) manipulates a dataset by adding new,
coherent records to it? In this case, each additional data item should bear its own
metadata, and again the protection level of the dataset would match the highest
protection level of any single data item contained therein. The outcome is the same if the
addition was made by an application rather than a human being. At least in principle,
an algebra of data transformations could be devised in that case: e.g. if a new data item
is the sum of two, the result should automatically bear the protection level of the
highest addend.
3. Of course, any conversion of a dataset into a different format should not determine any
change in the predefined privacy attribution. For example, a JSON file created out of an
existing CSV or similar file with the Citadel Converter should preserve the same data
protection level as the original source of this transformation.
4. What if there is a mistake in a dataset proposed by someone, which is corrected by
someone else? The case is not that relevant per se, because it should be treated like the
previous ones in this list (with the new data item bearing the licensing metadata attributed
by its owner), but more as an example of the conflict between false positives and false
negatives, which severely affects the world of data [19]. Probably the best way to solve a
potential impasse is to keep track of all versions of a certain dataset, in order to allow
recovery from mistakes made in correcting mistakes.
5. Could a user change his/her mind and modify the attribution of a certain data item from
a previous, lower level to a higher level of protection? While the answer is certainly
yes in principle, this would prove impossible to do in practice after the protection has
been set at 0 for the first time. Following this train of logic, we might infer that once
the user has received sufficient advice for informed consent to data publication, this
decision should be presented as irreversible before collecting his/her approval on it. The
lesson learnt from this case is that EU privacy regulators should reinforce the procedure
of information delivery prior to user consent collection.
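The propagation rules discussed in the list above (a dataset takes the highest protection level of its items, a sum of two items takes the level of the highest addend, and removing items can yield a dataset with a lower level) can be illustrated with a short sketch. The `DataItem` type and the function names are hypothetical and only serve to show the rules; they are not part of the Citadel tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """A value tagged with a protection level from 0 (limited or none) to 4 (confidential)."""
    value: float
    level: int


def dataset_level(items):
    """A dataset bears the highest protection level of any item it contains."""
    return max((item.level for item in items), default=0)


def add_items(a, b):
    """Sum two items; the result bears the protection level of the highest addend."""
    return DataItem(a.value + b.value, max(a.level, b.level))


def without_levels_above(items, ceiling):
    """Derive a new dataset by deleting every item whose level exceeds `ceiling`."""
    return [item for item in items if item.level <= ceiling]


items = [DataItem(10.0, 0), DataItem(2.5, 1), DataItem(7.0, 3)]

print(dataset_level(items))                           # 3
print(dataset_level(without_levels_above(items, 1)))  # 1
print(add_items(items[0], items[2]))                  # DataItem(value=17.0, level=3)
```

Taking the maximum is the simplest possible "algebra"; other transformations (averages over many individuals, for instance) might justify a different rule, which the text leaves open.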
The proposed framework, however, poses important challenges to the Open Data Commons and
to all similar ecosystems where important collections of data and applications would be made
available. In particular, the ODC should both provide the framework for the collective definition
of privacy guidelines related to applications and datasets, and be an integral part of the
governance of open data as a public good.
The above cases, which derive from the way applications use data, all have to do with the
dynamics of applications during their use. Here we should remember that high data
protection levels (if unjustified, of course, in relation to envisaged uses) hinder the
development of the digital market and ultimately economic growth in Europe, two major goals
of the incumbent and upcoming EU legislation. Therefore, an adequate privacy scheme should
be closer to a licensing scheme, in the sense that it is not just a question of "see" vs. "don't see"
but a more articulated issue of "what happens to my data". In other words, as for the
classification scheme above, it is at the level of the individual data item that a license
mechanism should be applied, as the following table shows:
Table 8. Proposal for a Data Licensing Mechanism (based on the CC scheme)
Abbr. | Meaning | Status
PR | Privacy Restriction | Proposed as new umbrella
PR-BY | Re-use with attribution (by) | Existing in CC
PR-SA | Share Alike | Existing in CC
PR-ND | Non-derivative | Existing in CC
PR-NC | Non-commercial | Existing in CC
PR-NI | Non-identifiable | Proposed
PR-NP | Non-position | Proposed
[19] See: http://jeffjonas.typepad.com/jeff_jonas/2011/02/sensemaking-on-streams-my-g2-skunk-works-projectprivacy-by-design-pbd.html
A given individual, either at the moment of providing data or enabling a device to generate
private data, could thus assign a PR license to each data item (i.e. a positioning reading), for
example PR SA ND NC NI. The way such a license should be applied varies according to the way
data is captured [20]:
- Volunteered data: provided by people who explicitly share information about themselves through
electronic media, for example when someone creates a social network profile or enters
credit card information for online purchases;
- Observed data: captured by third parties while recording activities of users (in contrast to
data they volunteer). Examples include Internet browsing preferences, location data when
using cell phones, or telephone usage behaviour;
- Inferred data: obtained from the analysis of personal data belonging to the previous categories. For
instance, credit scores are calculated based on a number of factors relevant to an
individual's financial history. (This is a derivative form of data capture, only allowed if ND
is not present; in the case of SA, the aggregated data item should maintain the same
license.)
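The rule for inferred data stated above (derivation is only allowed when ND is absent, and with SA the derived item keeps the same license) can be sketched with a set of flags. This is an illustration under stated assumptions: the `PR` class and both function names are hypothetical, and returning an empty license when SA is absent is an assumption of the sketch, since the text does not say which license applies in that case.

```python
from enum import Flag, auto


class PR(Flag):
    """Restriction flags from the proposed PR scheme (Table 8)."""
    BY = auto()  # re-use with attribution
    SA = auto()  # share alike
    ND = auto()  # non-derivative
    NC = auto()  # non-commercial
    NI = auto()  # non-identifiable
    NP = auto()  # non-position


def may_infer(license_: PR) -> bool:
    """Inferred (derivative) data is only allowed when ND is not present."""
    return PR.ND not in license_


def license_for_inferred(source: PR) -> PR:
    """License carried by a data item inferred from an item licensed as `source`."""
    if not may_infer(source):
        raise PermissionError("ND is set: no inferred data may be derived")
    if PR.SA in source:
        return source  # Share Alike: the derived item maintains the same license
    return PR(0)       # assumption of this sketch: no restrictions are inherited


example = PR.SA | PR.ND | PR.NC | PR.NI  # the "PR SA ND NC NI" example from the text
print(may_infer(example))                                      # False
print(license_for_inferred(PR.SA | PR.NC) == (PR.SA | PR.NC))  # True
```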
The following table summarizes the required characteristics of ODC or similar environments to
align with the logic and implications of the proposed data licensing scheme.
Table 9. Proposal for a Data PIA Framework
Data level | License handling | Compliant system | Agreements | Validation
Data item | Defines level of data protection thru PR license | Only accepts data items with license | |
Original Dataset (CSV) | Contains mix of data protection levels and a PR license as common denominator | | |
Dataset (CJSON) | Is compliant with specific level of data protection by the corresponding PR license | Contains only data items coherent with dataset license | Administrator guarantees coherence of Dataset license with data items |
Application | Guarantees usage coherent with data protection level as stated by PR license | Only reads/accepts compliant datasets | Application developers agree to abide by licenses | Community (ODGG) can trace use made by applications via the ODC Index
Third parties (agency / company) | Guarantees usage coherent with data protection level and the PR license | Reports usage of datasets | Community (ODGG) can make specific agreements with third parties | Community (ODGG) defines terms for control of third party activity & configures Index accordingly
Overall, the diffusion of this licensing mechanism may have important consequences, both in
terms of personal rights and commercial exploitation of open data. On the one hand, the license
helps data owners to keep track of who uses their data and when (retaining copyright and
credit if that is the case) while not preventing others from appropriating and manipulating it.
Unlike in the Creative Commons license, ShareAlike could be allowed on
condition that owners receive feedback about the uses of their data, at least for commercial
purposes. On the other hand, particularly after a dataset has reached the protection level of 0
or has been licensed as PR-BY, app developers and other digital businesses would be facilitated
in their activities, being able to demonstrate that data legitimately belongs to the public
domain. In this way, the unwanted legal consequences of past privacy carelessness would no
longer be charged to the last link of the value chain.
Understanding) at
https://docs.google.com/a/edmonton.ca/viewer?a=v&pid=sites&srcid=ZWRtb250b24uY2F8b3B
lbi1kYXRhLWNhdGFsb2d1ZS0yLTB8Z3g6MmUwMTMyNjhiNTBkZDNiOQ). Other Cities or public
institutions, particularly in Europe, have used the MoU instrument to establish strategic
relationships either with different levels of government (as in the case of the Vienna Smart City
agreement between the City Mayor and the Austrian Ministry of Infrastructure; see the news
about it at https://smartcity.wien.at/site/die-initiative/strategie/smart-city-wien-neueinitiative-bundelt-krafte/),
or with same-level organizations and institutions (compare the Mid-America Regional Council,
which gathers 9 member counties from Kansas City, MO; see
http://cfakc.tumblr.com/post/60775247513/digital-innovation-in-government-resources), or
with leading think tanks and experts in the domain (for example, the four MoUs signed by the
BBC in 2013; see the news at http://www.techweekeurope.co.uk/news/bbc-agrees-open-data132653).
Other Cities have preferred a more formal establishment of rules concerning Open
Data governance, such as through ad hoc legislation (example: New York City; see
http://www.nyc.gov/html/doitt/html/open/local_law_11_2012.shtml). Still others have issued
ad hoc licensing agreements (such as Goteborg, see
http://gbgdata.files.wordpress.com/2012/02/avtal-goopen-1-3-0-copy-eng.pdf, or Nantes, see
http://data.nantes.fr/licence/).
Second, the Open Data governance system outlined by the Citadel vision has the merit of
integrating all the local stakeholders belonging to the public sector's data and information reuse
value chain that we first outlined at the beginning of this book and reproduced in Figure 7 above.
While the eight stakeholder typologies presented to the Citadel survey respondents fully map
(with more internal specifications) the four communities originally displayed in that picture,
there is an obvious need for clarifying the respective roles and contributions to the
achievement of a common vision and the reciprocal gains and benefits that can derive from it.
This special need will be partly fulfilled in the remainder of this section.
The vision behind this ecosystem representation is that of a socio-technical environment,
made up of people, networks, institutions and technology artefacts, which co-determine the
direction and progress of open data publication and use policies (in this case). In addition to
communication and collaboration activities among the four groups of stakeholders that make
up this ecosystem, a set of behavioural rules, resources and practices contribute to shaping the
main function that this environment has to deliver in order to survive: innovation. According to
[11], which draws on an extensive review of literature from various fields (economics of innovation,
entrepreneurship, sociology of technology and political science), there are seven key capabilities
to be enhanced for such systems to evolve and perform well in terms of innovation:
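1. Knowledge base creation, development and diffusion;
2. Influence on the direction of search and investment processes;
3. Entrepreneurial discovery and experimentation;
4. Formation and support of new markets for innovation;
5. Visioning and legitimization of a common future;
6. Mobilization of resources (human, financial, etc.);
7. Development of positive externalities.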
In the following table, we provide a few examples of how different governance activities
contribute to improving the above capabilities. Some of these can well be enhanced by the
application of a MoU-style agreement, while others cannot, depending on historical and
cultural circumstances.
Table 10. Open Data Ecosystem capability matrix
Ecosystem capability | Example from Citadel pilots | Supported in Citadel vision? If so, how
1. Knowledge base creation, development and diffusion | http://data.gent.be/datasets | Citadel data converter
2. Influence on the direction of search and investment processes | Local debates on published and to-be-published datasets to figure out new applications | Establishment of open data governance groups in the four pilot cities
3. Entrepreneurial discovery and experimentation | | Citadel app templates
4. Formation and support of new markets for innovation | http://data.gent.be/apps | The ODC as a virtual brokering system that brings offer close to demand of open data and applications
5. Visioning and legitimization of a common future | http://fr.amiando.com/Citadel_EN.html | MoUs, Open Data Charter
6. Mobilization of resources (human, financial, etc.) | http://opendatamanchester.org.uk/; Hackathons, Open Data Days, etc. | MoUs, Open Data Charter
7. Development of positive externalities | N/A | The ODC as a holistic concept that takes further momentum and gains credibility across time and cities.
The table should be read as follows: in the first column, we identify a number of capabilities
that a generic ecosystem should be able to demonstrate. Where relevant, we provide in the
second column some examples of these capabilities taken from the Citadel pilots. In the third
column, we list the enhancements that a well-established open data governance system should
offer to existing capabilities. Finally, the last column shows examples of these enhancements as
emerging from within the Citadel project and partnership.
What can be gathered from the table is a latent conflict between two opposed visions of how
the process of opening up data and promoting their utilization can be finalized and made more
effective through formal agreements: on the one side, there are some functions (like 1 through
4) that do not necessarily require formalization through city level MoUs unless there is a need
to attract and include in the process all of the key stakeholders belonging to the ecosystem at
hand. In fact, this is the experience that has emerged with strength from the Citadel pilots. On
the other side, the table lists a few additional functions (namely 5 through 7) where the utility of
a signed MoU can be validly argued.
We propose to resolve the conflict in terms of a Maturity Model for the Cities that are
involved in this process. In the literature, several such models exist, serving
heterogeneous purposes that range from merely descriptive to evaluative and normative
goals. In Citadel, we decided to focus on the CMM (now a registered service mark of Carnegie
Mellon University in the US). CMM, or Capability Maturity Model, is a five-level qualitative
model assessing the maturity of an organization with respect to software development
processes [5]. Historically, the first CMM was developed between 1987 and 1997 for the US Air
Force. Prior to the introduction of the CMM, organizations tended to emphasize the results of
development rather than how to improve the process. In principle, the five-level
structure of CMM and its underlying logic can be replicated and applied to any other process,
including the gradual establishment of an Open Data Ecosystem like the one described in this
book.
Instantiated to the Citadel socio-technical environment, the five CMM levels of a City could be
redefined as follows:
I. Accessible (e.g. when large sets of public and private data are provided free of charge
to consumers of content and developers of knowledge services in the city);
II. Inclusive (e.g. when all the major value chain stakeholders, including citizens as both
developers and users, are integrated in periodic consultations to express their individual
judgment and evaluation about the opening of data process);
III. Participatory (e.g. when a joint system of decision-making is permanently set up and
used to integrate local communities of data holders, service providers and users in
collective decisions regarding the design, implementation and evaluation of new
services and apps);
IV. Co-creating (e.g. when resources are in place that enable individual persons as well
as local entrepreneurs and larger companies to create new services by the mash-up and
orchestration of existing resources, application templates, or chunks of data);
V. Leader (e.g. when the city government and/or community become attractive leaders,
creed ambassadors, authoritative gurus and opinion catalysers for sustainable
innovation in public services through open data).
Unlike other maturity models, we do not necessarily see these five stages as steps of a
ramping-up pathway. In other words, our vision is not to promote the once-and-for-all jump of a
city from level I to (say) IV by the introduction of an ODC instantiation, nor to assume that you
need to land in level IV before taking off towards level V. Our vision is more similar to a spiral
model, where progress can be incremental over time in all five maturity stages, and a city or
community may well experience several recurring cycles that go from I to V. As an additional
clarification, we may wish to use the familiar 5-star deployment scheme for linked open data
introduced by Sir Tim Berners-Lee as early as 2006 (see
http://www.w3.org/DesignIssues/LinkedData.html). The following matrix simply maps the
proposed CMM against Berners-Lee's 5-star scheme, to demonstrate that (depending also on the
starting point) a city may well be a leader in open licensing of public data on the web, and still
lag behind in other kinds of more advanced deployment. Presumably (although this would require
empirical demonstration) the process evolves gradually across time, but it may also be subject
to quantum leaps or radical innovation experiments, here shown as a zig-zag pattern.
[Matrix: the five maturity levels (I. Accessible City, II. Inclusive City, III. Participatory City, IV. Co-creating City, V. Leader) mapped against the 5-star scheme, with possible patterns shown as zig-zag paths.]
The star levels used in the matrix are:
1 Star: make your data available on the Web in whatever format under an open license
2 Stars: make your data available on the Web in structured form (e.g., Excel instead of image scan of a table)
3 Stars: use non-proprietary formats for data publication in machine readable form (e.g., CSV instead of Excel)
4 Stars: use URIs to denote things, so that people can point at your data more easily and quickly
5 Stars: link your data to other data to provide context related information
GOVERNANCE ROLES
Another important outcome of the pilot experiences has been a clarification of the distinct roles
played by the various stakeholders in an open data governance system. As we have tried to
demonstrate with the previous discussion, there can be different levels of maturity in this
system, which correspond to different intensities of engagement for those stakeholders.
However, in a mature community, all of them must be represented and actively engaged.
According to the survey results, there is considerable awareness of the need for stakeholder
representation among both the Citadel members and the non-members who responded to the
survey. However, the underlying (common) vision is still too centred on the City's ICT
Department (as a proxy for all technical and domain experts who are certainly required to ignite
and support the process from within the local government), while the contribution of other
stakeholders sitting at later stages of the value chain is certainly appreciated, but probably with
a certain amount of lip service paid to it. The reason for this might be that clear rules and
procedures are lacking for the definition of the perimeter and scope of each stakeholder
typology's involvement, and the signature of a MoU might be a good solution to this impasse.
This aspect is also worth mentioning with respect to the contribution of the Mayor (and other
policy makers) to the process. In fact, particularly if and when a formal MoU has not been signed to
regulate open data governance groups, political coordination becomes essential in order
to deliver legitimization and ensure the proactive and committed behaviour of all key
participants.
The following table borrows from the questionnaire results in highlighting the potential
contribution of a MoU with respect to the enhancement of participation and engagement of
stakeholders in the open data governance system.
Table 12. Ecosystem role definition and potential MoU contribution
Role: Defining Open Data strategies (what data to publish, ownership and property rights, pricing, etc.)
ACTIVE LEADERS: City/ICT Department; Public Data providers; Private Data providers; Citizen Developers; User communities
Role: Design, development, and configuration of mobile applications that use Open Data
Role: Promotion and/or selection of apps that use a city's Open Data (for example, organizing Hackathons, selecting best picks, etc.)
PROCESS
Historically in the four Citadel pilots, documented progress towards the definition and
clarification of the above roles and tasks has not depended on formal agreements, but rather
on the growing maturity level of the underlying Open Data Governance Groups. We can therefore
hypothesize four alternative configurations of a city/community MoU (or Open Data Charter),
depending on two main conditions:
- the maturity level of the current open data policy;
- the socio-economic impact of that policy.
Apart from the top right quadrant, where the formalization of a MoU (or an Open Data Charter)
is not required, in the remaining three areas it is left to the local policy makers to decide
whether this would be required or not. In some cases, particularly when both the maturity and
impact are low, a MoU can be recommended to activate (or rather accelerate) the take-up of
open data policy: this is the case of the bottom left quadrant in the picture. In other situations,
it can well happen that despite the good level of maturity in current open data policy, its
socio-economic impact remains negligible, presumably due to lack of involvement and commitment
of local stakeholders. Therefore, a single MoU or a set of MoUs can be designed and implemented
by a city government to attract and consolidate the participation of the market in current and
prospective open data policies: this is what we call "finalization" in the above scheme. Finally, the
bottom right quadrant represents the (possibly extreme, but not unlikely) situation where
the city has received clear signals from the market in terms of early impact of open data
initiatives, which now require the integration of expert and specialist knowledge to gain
momentum and become more widespread and inclusive. Again, a set of MoUs (like
those signed by the BBC as mentioned at the beginning of this section) may be recommended
here.
For the process of MoU development and negotiation, a set of guidelines can be outlined, based
on the Citadel experience. We split these guidelines in five groups: A) Guidelines for
preparation, B) Guidelines for drafting, C) Guidelines for negotiation, D) Guidelines for
completion, and E) General purpose guidelines.
GUIDELINES FOR MOU PREPARATION
The preparatory stage begins with the realization of the need for a MoU. It follows the
assessment of the conditions stated in Figure 23 as preliminary and
essential for the decision to have one in place. After this assessment has been done, the
purposes of the MoU will be clarified, as well as its scope and impact. Based on this
understanding, a first draft of the MoU provisions can be obtained.
Proposed steps:
Internal discussion within the city administration, possibly by a dedicated team, to identify:
Proposed steps:
discussions will focus on controversial or grey areas and lead to revision of existing drafts (if
any).
It is at this stage that the advantages and disadvantages of developing a MoU should be
weighed against its objectives and the reasonably expected results. In some cases, the signature
of a formal agreement may create unnecessary bureaucracy or rigidities in the way things are
done. It can also be misunderstood and disrupt a good relationship with some stakeholders,
giving the false impression of building unwanted differences, making preferences where they
didn't exist before, etc.
If an agreed text becomes possible after the negotiation phase:
- Circulate the draft to all the other parties at the same time;
- Involve the persons with the authority to negotiate for their organization;
- Identify the immutable points and be open to changing the remaining ones;
- Try to finalize the revisions by phone or in person;
- Keep everyone informed of the latest changes;
- Conclude with a public event for approving and signing the MoU.
Year | Approach | Description | Value
1 | Open Data Ecosystem model | Mapping roles and interactions in Open Data ecosystems | Provided the basic framework for the Open Data Governance Groups
 | Memorandum of Understanding | | Not utilized as local MoU, provided basis for Palermo Guidelines
 | On-line registration to ODGG | | Adopted to avoid having a signed MoU, helped define roles in ODGGs, not used as such.
 | Associate Partner | | Associate Partner campaign accelerated with platform in place and specific outreach programme.
 | Survey of roles | | Contributed to governance model, currently in distribution with Associate Cities for validation.
 | Maturity model | | Used in outreach programme to guide engagement strategies.
 | MoU framework | | Can be used for the drafting of local procedures and guidelines
This Charter toolkit guided the pilot cities in the gradual structuring and opening up of their
local Open Data Governance Groups, and in addition provided a supporting framework for the
Outreach programme. The experience gained in the project has shown that the components of
such an open and flexible approach can remain as modular elements with which any city can build and
consolidate its own Open Data governance model.
The final version of the Citadel Charter therefore needed to be some sort of statement that
pulls these elements together, drawing in addition on the extent and success of the outreach
activities and the awareness of the innovative potential of the Citadel vision. Rather than a
formal document of adherence to a network of Citadel-compliant cities (which would also risk
duplicating the efforts of EuroCities, the Connected Smart Cities Network, and others), what
appeared to be most useful and needed was a declaration of common principles that can:
The text of the Citadel Charter, which appears as Annex III to this book, is therefore intended as
an open document whose primary aim is to promote the Citadel vision more than the specific
tools developed within the project, as a forward-looking strategic protocol that can gain the
adherence of cities around the world.
[21] A good example was the Q&A at the session "Cohesion Policy and Open Data: boosting transparency, performance
and engagement" at Open Days 2014.
by all types of human activity as well as natural and machine events [22]. This fine-grained web of
data is likely to reveal new relationships between data and the specific place where it is, as
geographical, physical, and cultural elements of context become intertwined with ICT services [23].
The Citadel project refers to this vision (described as its key value proposition) as a "Territory of
Data". This concept implies that the density of information about a given territory leads to a
diffused awareness of all the features, activities, and dynamics happening there. This allows not
only governments to manage public services, but also businesses to understand market
dynamics, citizens to identify life opportunities, and so on.
Indeed, the Citadel project has been working to shift the Open Data paradigm from a finite set
of public administration portals towards a more territorially diffused data environment in three
main ways:
- Citadel has done everything possible to break out of the technological temples of Open
Data, putting its tools in the hands of citizens.
- Citadel works with cities not just as points on the map but as places where people come
together to give meaning to a place: witness the emergent role of the visitor as
explored in Citadel.
In the following, we take a look at how Citadel also aligns with some emergent trends that may
develop to unleash or at least accelerate the transition towards a Territory of Data.
FLATTENING DATASETS
A first signal we see as evidence of this transformation is what might be called the flattening of
data structures. Since the early days of information technology, data has been organized into
increasingly complex structures of inter-relationships in an attempt to more closely represent
the way data is used in a particular domain. This occurred first in nested or hierarchical
relationships, and since the 1970s in relational structures, which instead emphasize links between
simple tables, such as a listing of companies on the one hand and the addresses for each on the
other. Relational databases have since become the norm used in programs ranging from
Microsoft Access to MySQL and in fact are behind most of the web services driving many of
the open data applications we see today.
In the following diagram of a typical relational database structure, the different tables are shown
divided by logical or functional areas, with the links between specific elements in each table also
shown. These links are then used to query the database according to different views onto the
information, e.g. a view listing all of a company's suppliers (with addresses), and
another view listing all outstanding invoices (with company name). This structure of
tables, relations and views onto the information is studied in depth with the client organization
in order to best represent their needs and operations.
If we want to publish information in such a relational database as Open Data, there are basically
two choices:
• An API can be provided that essentially queries the database from the outside, with the
result being provided to the external application in the desired format. While some
systems publish to the web information about how to query them, use of an API
generally requires a knowledge of the database's structure in order to extract
information from it. In particular, it is necessary to know in advance the exact names of
fields, or in other words the semantic structure.
• The owner of the database can make a query for some subset of the information
contained in the database (e.g. company names and addresses but not invoices), and
write that to a file that is then published as Open Data. This can be done either
manually, producing an Excel or CSV file, or automatically, e.g. as an export in a format
such as XML.
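The second choice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-table schema, the table and column names, and the sample rows are all invented here, and an in-memory SQLite database stands in for whatever system an organization actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical schema: companies, their addresses, and their invoices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (company_id INTEGER, street TEXT, city TEXT);
    CREATE TABLE invoice (company_id INTEGER, amount REAL);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO address VALUES (1, 'Main St 1', 'Ghent'), (2, 'Rue A 2', 'Athens');
    INSERT INTO invoice VALUES (1, 120.0);
""")

# The owner's query: company names and addresses, but not invoices.
cursor = conn.execute("""
    SELECT c.name AS name, a.street AS street, a.city AS city
    FROM company AS c JOIN address AS a ON a.company_id = c.id
    ORDER BY c.id
""")

# Write the flat result as CSV, using the query's column names as the header row.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([column[0] for column in cursor.description])
writer.writerows(cursor)
csv_text = buffer.getvalue()
print(csv_text)
```

The published file carries the selected values and the chosen column names, but nothing about the join that produced them or about the tables left out, which is the point made in the next paragraph.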
Neither of these choices, however, fully opens the database, since much of it, especially its
semantic structure, remains hidden. Since the database has been structured to be a mirror of
the specific organizational context it serves (a company, a public administration, etc.), it can
never be fully adapted to a broader context nor can its data be seamlessly integrated into a
territorially defined web of data.
In many respects, the LOD paradigm externalizes the relationships of such structures by
re-creating links between data structures as external and publicly viewable RDF triples, as
Berners-Lee's vision in fact tends towards an evolution of the web as a database for the whole world.
This transfer of the structure of semantic relationships from inside a relational database to
outside is driving the trend towards the flattening of datasets, or in other words a preference
for working with two-dimensional tabular files. Consisting of spreadsheet-like layouts with a
series of rows of information using the same column headings, tabular datasets are far
easier to read externally, especially if they are presented in an open format such as CSV. Indeed,
many on-going efforts to transform existing data structures (notably INSPIRE-based geographical
information systems) into LOD pass through the stage of first generating one or more output
datasets in tabular CSV format.
Evidence of this trend is for example the emergent standard for transport data, GTFS (General
Transit Feed Specification)24. GTFS is not actually the way data is held in transport information
systems, but rather a common interchange format, useful for telling an external service such as
Google Maps how a given city's transportation service is organized, independently of the system
used. Nonetheless, it contains all the information necessary, even though not in a format that is
immediately operational. As shown in the diagram below, a GTFS file for a given city consists of a
zipped collection of seven text files (actually structured as CSV), each of which contains a certain
part of the information that needs to be linked afterwards: for instance one contains information
on stops, another on transit lines, and so forth.
The interesting thing about GTFS is that while each of these seven files follows a precise
structure, that structure and the relationships between the data in each of the files are not
contained in any GTFS file instance, but rather in the description of the standard. In fact, the
links between each dataset are external, and they are not explicitly stated, mainly because they
are obvious: it is clear that the buses of a given line stop at bus stops. In sum, the GTFS standard
consists of seven tabular datasets, linked by externally (or socially) known relationships
between the datasets.
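A minimal sketch of what "externally known relationships" means in practice: two tiny inline tables stand in for the stops and stop times files of a feed, and the code that joins them has to bring the knowledge that the two stop_id columns refer to the same thing. The column names follow the published GTFS specification; the sample values are invented.

```python
import csv
import io

# Inline stand-ins for two files of a GTFS feed (sample values are invented).
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Central Station,51.0357,3.7108
S2,Town Hall,51.0545,3.7250
"""
STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:10:00,08:10:00,S2,2
"""


def read_table(text):
    """Parse CSV text into a list of dicts keyed by the column headings."""
    return list(csv.DictReader(io.StringIO(text)))


# Index the stops table by its identifier.
stops = {row["stop_id"]: row for row in read_table(STOPS_TXT)}

# Neither file states that the stop_id in stop times points at the stop_id in
# stops; that link is documented in the standard, so it is applied by hand here.
timetable = [
    (row["trip_id"], row["departure_time"], stops[row["stop_id"]]["stop_name"])
    for row in read_table(STOP_TIMES_TXT)
]
print(timetable)
```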
Another example is in the CKAN data portal software25, an open source platform that is rapidly
becoming the standard for Open Data services. CKAN primarily hosts datasets or links to data
services using a typical data portal structure (similar to the original Citadel Hub); here no
choices are made about semantic structures, only a complete listing of files based on the
relevant metadata. In addition to this File Store service, however, CKAN introduces a new
feature called the Data Store, which is very relevant to this discussion.
24 https://developers.google.com/transit/gtfs/
25 http://ckan.org/
The Data Store exposes any tabular dataset hosted in the File Store (or can also be set up in its
own right), in such a way that it is possible to query the data inside, say, an Excel file without
having to download it, using a simple external API. As the CKAN Data Store is used ever more widely,
this tabular data format is generally gaining greater interest.
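As a rough illustration of such an external query, the sketch below builds a request URL for the datastore_search action of CKAN's action API. The portal address and the resource identifier are hypothetical placeholders, and no network request is made; a real resource_id would come from the portal's listing of the dataset.

```python
from urllib.parse import urlencode


def datastore_search_url(portal, resource_id, text=None, limit=5):
    """Build a URL for CKAN's datastore_search action (query only, no request sent)."""
    params = {"resource_id": resource_id, "limit": limit}
    if text is not None:
        params["q"] = text  # full-text search over the rows of the table
    return f"{portal.rstrip('/')}/api/3/action/datastore_search?{urlencode(params)}"


# Hypothetical portal and resource identifier, for illustration only.
url = datastore_search_url(
    "https://demo.example.org/", "hypothetical-resource-id", text="museum"
)
print(url)
```

Fetching this URL from a portal that has the Data Store enabled would be expected to return JSON with the matching rows, so the file itself never has to be downloaded.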
The problem of course with a tabular dataset is that CSV doesn't provide any facility to store
metadata information about the dataset. It is a simple task (far simpler than with a
relational database) to read the first row of column headings and capture the semantic
structure of the dataset, but a system for storing the links between different tabular datasets,
using an RDF file or other system, has yet to be devised. To facilitate this process some have
suggested using standard column headings (more or less what Citadel is doing, as discussed in
the previous chapter), while others are highlighting the importance of identifiers as the anchors
for linking open datasets26.
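Both halves of this argument can be sketched briefly. Reading the column headings is a one-line operation, while the link between two tables has to be written down somewhere outside the CSV files. In the sketch below the two tables, their column names, and the format of the `links` record are all hypothetical; the text above notes that no agreed system for storing such links exists yet.

```python
import csv
import io

# Two hypothetical flat datasets; names and values are invented.
VENUES_CSV = """id,title,latitude,longitude
V1,Concert Hall,51.04,3.72
"""
EVENTS_CSV = """id,title,venue_id
E1,Jazz night,V1
"""


def column_headings(text):
    """Return the first row of a CSV text, i.e. its column headings."""
    return next(csv.reader(io.StringIO(text)))


headings = {
    "venues": column_headings(VENUES_CSV),
    "events": column_headings(EVENTS_CSV),
}

# The relationship lives outside both files: a hand-written record saying
# which column of one table identifies rows of the other.
links = [{"from": ("events", "venue_id"), "to": ("venues", "id")}]

# Minimal sanity check: every linked column must exist in its table.
links_valid = all(
    column in headings[table]
    for link in links
    for table, column in (link["from"], link["to"])
)
print(headings, links_valid)
```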
How and where to define and store information that interconnects flattened datasets is in fact a
key challenge for future research. The important point in the context of this book is that, in the
journey towards the Citadel vision of Territories of Data, the trend is to imagine a massive
number of flat, tabular datasets as the foundation.
26 Creating Value with Identifiers in an Open Data World, Open Data Institute and Thomson Reuters, available at
http://thomsonreuters.com/corporate/pdf/creating-value-with-identifiers-in-an-open-data-world.pdf
trend, and the experimentation throughout the course of the last year can give some valuable
insights as to the paths for future developments.
The Citadel AGT in fact offers the possibility for any user, and in particular non-expert users, to
generate an application using more than one dataset. In the following paragraphs, we will
explore those instances where more than one dataset is used to build an application. By
examining and classifying the associations between datasets that emerge, we can gain insights
on how a bottom-up identification of RDF triples, or any other expression that captures the
relationships between two datasets in a useful way, could occur.
In the first year of operation of the Citadel toolkit (fully operational only starting in January
2014), a sufficient number of dataset couplings has occurred to be worth investigating. As the
above diagram shows, of the 567 apps generated until the end of October 2014, 138 or approximately
24% include more than one dataset. The generation of multi-data apps more or less parallels
that of single-dataset apps, except in the final months27.
If we eliminate the 60 apps that use multiple datasets simply because information is coming
from different cities (either the same information in two cities or apps created only to
demonstrate the tools), we still have 78 apps to examine, or about 14% of the total of apps
generated. Further eliminating apps that are clearly demonstrations (mashing up five or six
unrelated datasets from the same city in an app with a name such as "test") or that repeat the
same combination of datasets (multiple trials), we are left with 38 apps combining datasets in
an original and meaningful way.
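The filtering steps above can be rechecked with a few lines of arithmetic. The counts are the ones quoted in the text; the variable names are ours, and the first share comes out slightly under a quarter.

```python
total_apps = 567        # apps generated until the end of October 2014
multi_dataset = 138     # apps that include more than one dataset
cross_city = 60         # multi-dataset only because data comes from several cities
meaningful = 38         # original, meaningful combinations after further filtering

# Apps that combine several datasets within one city.
same_city_multi = multi_dataset - cross_city


def share(part, whole):
    """Percentage of the whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(same_city_multi)                      # 78
print(share(multi_dataset, total_apps))     # 24.3
print(share(same_city_multi, total_apps))   # 13.8
print(share(meaningful, total_apps))        # 6.7
```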
27 This can be attributed to the extensive outreach activity in fall 2014, where single-data apps have been generated
to illustrate the potential of Citadel to a new city.
Given the nature of the AGT as a map visualisation tool, we thus have 38 instances where users
have spontaneously created an association between two datasets as a function of their spatial
relationship. In other words, by combining multiple datasets the user is exploring some sort of
logic (of the sort that the LOD model tries to express) that is expected to emerge when shown
on a map. A closer look at this sample reveals four meaningful classes of relationship (each
accounting for about a quarter of the total sample) that appear to motivate the couplings:
• Associations of datasets: different sources of pretty much the same information are
combined to generate a more complete representation. In these cases, we can imagine
the user wishing to combine datasets as generated by different authorities into a more
complete picture that makes sense in practical terms.
o Examples: Parking lots + on-street parking; Bus stops + bike stations; Childcare
centres + schools; UNESCO Heritage sites + Tourism POIs; Historic sites +
Abandoned villages.
• Temporal relationships: this class is similar to the previous one but with a clearer
sequence in time. Here we imagine the user thinking "after you do this you might want
to do that". These are also relationships that are not necessarily permanent.
o Examples: Cinemas + Bars; Schools + Markets; Voting seats + Tourism POIs;
Meeting places + Planned visit; Tourism POIs + Restaurants; Museums + Bars
• Urban settings: these are relationships between datasets that associate related public
or civic facilities without a specific logical, functional, or temporal relationship. Here
we imagine the user attempting to highlight the features of a neighbourhood in a city,
representing quality of life in spatial terms.
o Examples: Hotels + Cinemas; Parks + Community Centres; POIs + Trees; Coffee
shops + Parks allowing dogs; Sports facilities + POIs + Galleries + Parks
BACK TO LOD
As stated previously, one idea concerning the Citadel Open Data Commons is that it can provide
a way of constructing semantic LOD relationships in a bottom-up rather than top-down fashion.
Already in the early project stages, it emerged that this indeed could be the trend, although the
approaches mentioned there could be considered more as crowd-sourced labour than fully
bottom-up methods. Instead, the ODC concept already suggests a different possibility when, in
Figure 14 above, it is suggested that "Semantic patterns" could be identified in the "Query
recordings", considered at that stage of development as a log file of activity within the ODC
containing information about the conditions in which datasets were accessed.
What ultimately appears to be the most promising approach is in fact to infer relationships from
the combinations of datasets as discussed above. Were the kind of activity witnessed in the first
year of use of the Citadel Toolkit to reach a massive scale of the kind justifying a big data
approach, the types of relationships tentatively suggested in the previous section could be
identified with greater certainty. At that point, however, we need to ask: is the RDF framework
appropriate and sufficient to express these relationships?
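One simple way such inference could start, sketched under our own assumptions, is to count how often the same pair of datasets appears together across apps and to treat frequently repeated pairs as candidate relationships. The app list and the threshold of two occurrences are invented for illustration; the project does not describe a specific algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical log: the set of datasets used by each generated app.
apps = [
    {"Parking lots", "On-street parking"},
    {"Cinemas", "Bars"},
    {"Parking lots", "On-street parking"},
    {"Museums", "Bars"},
]

# Count every unordered pair of datasets that appears in the same app.
pair_counts = Counter()
for datasets in apps:
    for pair in combinations(sorted(datasets), 2):
        pair_counts[pair] += 1

# Pairs seen at least twice become candidates for a named relationship.
MIN_OCCURRENCES = 2  # arbitrary threshold chosen for this sketch
candidates = [pair for pair, count in pair_counts.items() if count >= MIN_OCCURRENCES]
print(candidates)
```

Counting only says that two datasets belong together; what kind of relationship holds between them is exactly the question raised at the end of the paragraph above.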
RDF, in its essence, expresses relationships in a subject > verb > object syntax, meaning that the
relationships are not just neutral associations, but they can have a meaning and a direction
associated with them.
In this logic, we can imagine that, in the "Association of datasets" category above, the
combination of Parking lots + on-street parking (the first example from the list in the previous
section) can be modelled as two datasets that can be the subjects with a verb "provide" and
object "parking spaces". Indeed, this sort of descriptive relationship fits well with the LOD
scheme; the well-known example shown below in fact uses the verbs "is a", "is located at", "is
on the topic of", "depicts", "is famous for", and "discovered".
The other three categories, on the other hand, introduce new elements that may not be able to
be fully captured by the RDF syntax.
• Functional relationships: as shown in the previous section, these relationships are often
contingent on certain aspects of the context of time, place, and role of the user. RDF on
the other hand only expresses permanent relationships; how then to situate them in
the context for which they hold true: where, using RDF, can you express "as long as the
cinema is open"?
• Temporal relationships: the contingency here is even more complex, since it depends
on a sequence of events; might we then imagine algorithms that generate RDF triples as
temporal sequences depending on what happens when?28
• Urban settings: here the combination of datasets seems to express spatial qualities that
are very related to the map but not directly related to the individual datasets taken
singly; can we imagine some new vocabulary of spatial qualities (not necessarily limited
to urban environments), for instance capable of describing a "nice neighbourhood", a
"city centre", or even landscapes?
These questions are by no means trivial, since they touch on the very usefulness of LOD
relationships, which, apart from some rather elementary applications, have not been tested to
date on a wide scale with citizens and businesses in city settings. From the evidence that
emerges from the Citadel experimentation, there are significant research tasks ahead in better
exploring the semantics of place, time, and space.
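The subject > verb > object pattern, and the difficulty raised above for contingent relationships, can be made concrete with a small sketch. It uses plain Python tuples rather than an RDF library; the `Triple` and `ConditionalLink` types, the verb "may be followed by", and the free-text condition are hypothetical illustrations, not RDF vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A relationship with a meaning (verb) and a direction (subject to object)."""
    subject: str
    verb: str
    obj: str


class ConditionalLink(NamedTuple):
    """Hypothetical wrapper: a triple plus the context in which it holds."""
    triple: Triple
    condition: str


# The "Association of datasets" example from the text fits the three slots.
triples = [
    Triple("Parking lots", "provide", "parking spaces"),
    Triple("On-street parking", "provide", "parking spaces"),
]


def subjects_sharing(triples, verb, obj):
    """Datasets that stand in the same relationship to the same object."""
    return sorted(t.subject for t in triples if t.verb == verb and t.obj == obj)


related = subjects_sharing(triples, "provide", "parking spaces")

# A contingent relationship needs a fourth piece of information that the
# three slots alone do not hold; here it is carried in an extra field.
contingent = ConditionalLink(
    Triple("Cinemas", "may be followed by", "Bars"),
    condition="as long as the cinema is open",
)
print(related, contingent.condition)
```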
• Make it easier for those holding data to publish it electronically and thus make it
available for access by third-party applications
• Make it easier for developers to design applications that can move smoothly from city
to city, allowing citizens to access and visualize datasets independently of the format
and standards by which they were originally published.
In its starting configuration, prior to the introduction of the toolkit, the ODC was simply a
collection of static datasets published on the Citadel Platform, the first common space. In the
initial cycle of pilot testing, these datasets were incorporated into the Citadel templates for
each city, as JSON files in a relatively closed client-server framework. The open and flexible ODC
scenario was thus discussed as a possible vision or way ahead for Open Data but not
implemented in practice.
28 To some degree, one could argue that the Google Now service attempts to do this.
The first implementation of the Converter and AGT at first seemed to defeat the main principle
of the ODC, which was intended to remain as open as possible to different standards, data
models, etc. through a public collection of tools, not just one toolkit. On the other hand, this
solution could also be framed within the ODC concept as just a first instance of a more open
framework based on the same approach, recovering the original idea of having several app
templates each with its own data model in addition to the AGT.
In October 2013, in parallel with the specification of the tools, we thus considered the following
scenario as an extension of what was being developed. In this context, the ODC could be seen to
be made up of two main elements:
1. One or more servers containing live files accessible via their URL that contain data
in a format (platform and data structure) that at least one template (Citadel or third
party) or application (Citadel or third party) can access remotely without any further
conversions. (The first implementation of the toolkit being a first server with a first data
model for a first application, the AGT, but with nothing prohibiting further
development.)
2. A unique Index that includes a listing of a) converted live files as described in point 1,
including where and how they can be accessed, and b) the relevant template and
application data formats that are compatible with the files. (The first implementation on
the Citadel Platform consisting of an Index that lists all files that have been validated as
compatible with the AGT, the first of possibly many data formats.)
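The two elements can be sketched as a simple data structure and lookup. Everything concrete here is an assumption made for illustration: the entry fields, the example URLs, and the data model labels "agt" and "menus" are not taken from the Citadel Platform.

```python
# Hypothetical Index entries: where each live file is and which data models
# (and hence which templates or applications) can read it without conversion.
INDEX = [
    {"city": "Ghent", "url": "https://example.org/odc/ghent-pois.json",
     "data_models": ["agt"]},
    {"city": "Athens", "url": "https://example.org/odc/athens-parking.json",
     "data_models": ["agt"]},
    {"city": "Issy-les-Moulineaux", "url": "https://example.org/odc/issy-menus.json",
     "data_models": ["menus"]},
]


def live_files_for(index, data_model):
    """Return {city: [urls]} for every live file compatible with a data model."""
    result = {}
    for entry in index:
        if data_model in entry["data_models"]:
            result.setdefault(entry["city"], []).append(entry["url"])
    return result


# A template built for the "agt" data model asks where it can run.
agt_files = live_files_for(INDEX, "agt")
print(sorted(agt_files))
```

With a structure of this kind, adding a new data model or a new city is a matter of registering entries rather than changing the templates.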
This broader scenario allowed us to imagine some possible use cases that have in fact arisen
throughout the pilot testing and outreach activities as concrete situations. They also illustrate
the broader potential of the ODC, taking Citadel far beyond the paradigm of cities publishing the
typical datasets into a city portal:
Restaurant menus
A city's local restaurant association might find it useful to be able to publish the menus
and special offers of member organizations on a daily or weekly basis. They could
therefore commission a special template that can display restaurant menus on the city
map, together with a special version of the converter that converts to the new data
model. Individual restaurants and pubs that publish their information according to the
agreed format will then be visible through the AGT version that incorporates the new
template.
The broader ODC scenario also allowed us to define a roadmap for development of the
Converter + AGT tools in the direction of realizing the open and interoperable vision of the ODC
in an incremental fashion. In this mature scenario, all the datasets in the ODC are published as
JSON files compatible with one or more data models, so in theory a template or application (or
enhanced future version of the AGT) only needs to know which cities have published data in the
expected format and what URLs should be used. This information is stored in the Citadel Index,
so that developers can easily configure the templates they incorporate into a given application
and be sure that it will work in the different cities29.
In this context, the following were identified as possible areas for development in late 2013.
(Followed by a note on what actually happened.)
Index logging
This consists in logging events that happen through the Index, namely new datasets registered
(by whom and how), accesses by templates, configurations used, etc. This was thought of as
likely to yield very useful information for both the template and application developers as well
as for the data providers. In addition, further services (e.g. privacy management or semantic
tracking) could also be built into the system that manages the Citadel Index. (Basic logging has
been implemented with the Citadel Index, and the potential for both privacy as a service and
semantic analytics has been identified in other ODC reports. Further developments are a good
topic for future work.)
Converter enhancements
An important enhancement to the Converter would be to enable on-the-fly conversions. In this
case, rather than generate and save a new file to the ODC, the Converter would save the
configuration info only, directly accessing the original dataset (or a regularly copied data dump)
on the fly. (On the fly conversion has in fact been implemented as a proof-of-concept script with
the PHP Converter Library. A future enhancement of the Converter could include saving semantic
mappings for batch processing, although actual effectiveness in practice would need to be
tested.)
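The idea of saving only the configuration can be illustrated as follows. This is not the PHP Converter Library mentioned above but a hypothetical Python sketch: the source column names, the target field names, and the `convert_on_the_fly` function are assumptions, and an inline string stands in for the original dataset that would normally be fetched at request time.

```python
import csv
import io
import json

# The only thing stored: a mapping from source columns to target fields.
# Both sets of names are invented for this sketch.
SAVED_MAPPING = {"Naam": "title", "Lat": "latitude", "Lon": "longitude"}

# Stand-in for the original dataset (or a regularly copied data dump).
SOURCE_CSV = "Naam,Lat,Lon,Intern\nBelfort,51.0536,3.7250,x\n"


def convert_on_the_fly(csv_text, mapping):
    """Yield converted records without ever writing a converted file."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only mapped columns, renamed to the target data model.
        yield {target: row[source] for source, target in mapping.items()}


records = list(convert_on_the_fly(SOURCE_CSV, SAVED_MAPPING))
print(json.dumps(records))
```

Because the mapping is applied at request time, changes in the original dataset show up immediately, at the cost of repeating the conversion on every access.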
Template developments / enhancements
As shown in the restaurant scenario above, the development of new templates was considered
an important space for the future, in order to visualize types of datasets that go beyond the
Citadel application scenarios, e.g. socio-economic data. A new template would simply need to be
registered in the (future version of the) Citadel Index, with information on where it resides and
the platform and data format it uses. (Currently, the AGT uses only one data model that includes
all fields from the POI, parking, and event templates. A more modular design for the AGT,
capable of incorporating different templates according to the selected datasets, could be a
future objective. As it stands, the additional data models that have been implemented for the
Converter are related to specific applications outside the AGT, as with the MyNeighbourhood
data model30 in the Lisbon Pilot, see below.)
29 This feature of discovery of thematically compatible datasets in different cities (though using the same data
model) has already been implemented for the AGT.
Dataset enhancements
With the ease of access to datasets through the Converter and AGT, it is possible for anyone to
publish open data to the Citadel Platform, e.g. a listing of bridge club meetings, and immediately
see them on a map on their mobile phone. This has great potential for motivating the
publication of datasets beyond those of the municipality. In parallel, by shifting the emphasis
from applications to data visualization, it can make sense for a city to enhance existing datasets
rather than developing specific apps. For instance, in order to make a reservation for a concert,
it can be possible to embed a link (which would then be visible via the App Generator) to an
external reservation service in the description text of the concert rather than building
reservation functionalities into the app. (Most of the datasets to date in Citadel contain city
information, with citizen datasets (e.g. Lisbon pastry stores) appearing only recently. In some
instances, however, enhanced datasets have been experimented with, as with the Museum tour app
in Ghent.)
Dataset refinement
As the Converter can access any dataset on the condition that it is refined, a broad,
de-professionalised uptake of open data as suggested above was expected to bring this topic to
the fore very quickly. Although there exists a broad range of tools and toolkits to help refine
large datasets, there is little awareness of them or diffused expertise on how to use them. Pilot
leaders were advised to engage developers with data owners in order to teach them
how to use these tools. The best strategy, however, is to build awareness from the start,
something which can be achieved by e.g. helping people to publish bridge club meetings and
then seeing what happens when the address format is not consistent. (Dataset refinement has
played a lesser role than expected, in part because many of the Citadel datasets were made
from scratch. Nonetheless, the toolkit has had a powerful impact in raising awareness of data
quality, and the Apps4Dummies workshops effectively put into practice the recommendation of
mixing developers with data owners.)
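The bridge club example lends itself to a very small consistency check. The address convention assumed here ("street number, city"), the regular expression, and the sample rows are all invented for illustration; real refinement tools do considerably more.

```python
import re

# Assumed convention for this sketch: "<street name> <number>, <city>".
ADDRESS_PATTERN = re.compile(r"^[^,\d]+ \d+\w?, [^,\d]+$")

# Hypothetical rows of a citizen-published dataset.
meetings = [
    {"club": "Bridge Club A", "address": "Korenmarkt 5, Ghent"},
    {"club": "Bridge Club B", "address": "Ghent - Veldstraat"},
]

# Rows whose address does not follow the convention cannot be reliably
# placed on a map and are flagged for the data owner.
inconsistent = [
    row["club"] for row in meetings if not ADDRESS_PATTERN.match(row["address"])
]
print(inconsistent)
```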
30 http://my-neighbourhood.eu/
emerging. This process started in early April 2014 and continues to date, feeding as well into the
public debates on Open Data mentioned above.
Indeed, one of the central and most transformative tenets of the Open Data Commons concept
is the simple idea that Open Data be considered as a common good, in a public sphere whose
stewardship is to the benefit of both public and private stakeholders as well as citizens. The
mainstream paradigm for Open Data, especially as promoted by technology providers,
essentially ignores this common space, instead identifying a two-step process:
Although the relatively cautious uptake of Open Data across European cities is often attributed
to a lack of a culture of transparency or concerns about privacy and sensitive information, this
is not sufficient to explain the lack of information regarding topics such as the location of
galleries, museums, and public toilets. We suggest that three other factors inherent in the
current paradigm might also be identified as creating barriers:
• The two-step process (governments open data and then sit and wait for developers to
come along and use it) creates a discontinuity between supply and demand and a
specialisation of roles that inevitably makes it difficult to engage different points of view
in defining comprehensive Open Data strategies.
• This separation of roles also affects the propensity of the actors involved to engage with
similar initiatives under way in related fields, such as the EU's standardisation efforts in
Spatial Data Infrastructures (INSPIRE), Sustainable Energy Technologies (SETIS), etc. This
inevitably leads to interoperability issues both along the data value chain and across
thematic sectors (a key issue in attaining the Linked Open Data vision).
• These factors together greatly limit the applicability of the Open Data paradigm as it is
today to the few cities that meet the profile of having a strong political will, a culture of
transparency, IT staff capable of managing data publishing, and ideally an active
developer community willing to develop apps.
The Open Data Commons concept directly addresses these issues, by politically identifying the
space between datasets (of whatever form) and applications (of whatever type) and declaring
that space to be within the public domain and in the public interest, following the paradigm of
the public Commons. This space belongs neither to data providers nor to data users, but is a
neutral domain containing the tools and knowledge allowing providers and users to connect in a
more nimble, efficient, and innovative way than either could achieve by themselves. The Open
Data Commons can thus be said to include any software element that is generic enough to have
relevance for more than one dataset on the one hand, while independent of the market
exploitation potential of a given application or set of applications on the other. Such elements
(while the Citadel Converter is central to defining this space, it can also include generic APIs,
converters, transformers, and tools of various kinds) collectively bridge the gap between
datasets and applications in the most dynamic and flexible way possible.
POLICY IMPLICATIONS
This has a direct impact on the three barriers identified above as follows:
• By unlocking the technical paradigm, the ODC concept allows for data to be exploited
before standards have been fully defined, thus promoting demand-driven processes of
standards convergence and adoption. This not only brings forward the benefits of Open
Data, but it also opens up to greater interoperability flows with other standards
formation processes. This is particularly important in areas where standards adoption is
relatively immature, such as spatial data infrastructures, IoT sensor network
architectures, big data analytics, system dynamics modelling, etc.
• By filling the gap between data supply and demand and creating a concretely testable
end-to-end process, data owners can see the purpose of publishing data and application
developers can see the need for new datasets with greater clarity. A common space for
on-going dialogue and interaction between governments and developers is created,
allowing for new dynamics to emerge such as application-driven data strategies.
• By allowing the introduction of simple tools such as the Citadel Converter and AGT, the
ODC substantially de-professionalises the practice of Open Data, opening up to a full
and active participation of citizens and local businesses in both data supply and demand
and a potentially massive uptake of data-based activities. This also allows for a more
diffused territorial impact of Open Data, no longer confined to large, well-to-do, and
innovative cities but opening up to wide-scale engagement and collective creativity.
At the broader policy level, the ODC concept transforms data into a key element of territorial
capital and its stewardship an essential activity in the public sphere, in an emerging policy
landscape in which the public sector is re-defining its role in a transformation from command
and control to the orchestration of collective and collaborative innovation processes. These
broader policy implications can be mapped onto the pillars of the EU 2020 strategy as follows:
• Smartness: Data gains a new status as a driver of economic development, with value
streams emerging from the production, analysis, and coupling of diverse and diffused
datasets as produced by social and economic activities themselves. A data-driven Smart
City concept can lead to data-driven local and regional development strategies that
broaden the scope of Open Data to include public, private, citizens, and businesses, as
well as nature and machines, as data producers, owners, and users.
• Sustainability: Ecosystem-based management concepts for sustainable development
depend on knowledge and awareness of the current and potential dynamics embedded
in a territory and its natural and human capital. The "Open Data as a common good"
approach can interlink with paradigms such as the IoT-based "wisdom of the earth"
concept to underpin an integrated and dynamic vision of sustainability.
• Inclusiveness: The ODC concept proposes data as a basis for an emerging model of
citizenship, in which data is a right and the stewardship of personal and collective data
is an activity in the public sphere. By democratising access to the operational workings
of Open Data processes, the ODC unleashes the creative potential of all parts of society
on an equal footing of opportunities.
31 A good example of this is the Apps4Dummies workshop held in Palermo in July 2014, in the context of the signing
of the Ventimiglia Pact, a joint strategic agreement among 52 city governments in the area. Among the objectives
of common interest is listed "Open Data and smart city services", but the question immediately arose as to who should
manage the platform.
32 As stated previously, the enhancements to the Converter in the Lisbon Pilot were funded by the FI-WARE
programme. The policy reflections related to FI-WARE and the ODC are instead in the sphere of the Citadel Project.
33 http://www.fi-ware.org/
portability of applications and tools built on the Citadel platform and thus a richer business
ecosystem for the Citadel development community.
In this way, the ODC vision of open data as a common good extends beyond the specific
platform architecture of the Citadel toolkit to include a platform vision with far greater scope.
The issue that then remains is the implementation of FI-WARE as an open public facility in a
given territory, rather than as the highly access-controlled cloud platform supporting the
European ICT industry's applications and services as it is currently conceived. This brings us to
the relevance of the Digital Agenda, the policy initiative supporting the development of
connectivity and service infrastructures across Europe, and the way it is implemented through
regional ERDF policies, where most of the funding is to be found.
In fact, in the 2007-2013 programming period, R&D&I represented some 26% of the planned
expenditures for Structural Funds, for a total of over 86 billion Euro (more than the FP7 and CIP
programmes together). With several key EU 2020 Flagship initiatives (e.g. Innovation Union, the
Digital Agenda) pinpointing regional policy as the main instrument for implementation, this
figure is likely to rise even further. The set of 271 Regional Operational Programs currently being
drafted therefore represents an important opportunity for the Citadel ODC vision.
The new conditionalities for this cycle of Regional programming, in particular the so-called Smart
Specialisation model for innovation strategies, impose certain principles and processes on
each region, such as stakeholder engagement, entrepreneurial discovery, and the integration
of social innovation. These are in turn leading to significantly new policy approaches for many
Regions, particularly in Southern Europe and the New Member States, very much in line with
the human approach to technology innovation shared by the Citadel project and
supported by many initiatives in DG Connect. To support the new policy process, DG Regio has
engaged the Commission's IPTS in Seville to advise and coordinate individual Regions in
complying with the requirements for Smart Specialisation, but despite their efforts the Digital
Agenda has yet to reach the top of the policy agenda with any degree of sophistication.
The Digital Agenda Toolbox, one of many instruments designed for this purpose, provides
guidelines as regards regional ICT infrastructures, services and applications, and methods for
take-up and digital literacy, and even includes a section on Living Labs, the methodology
adopted in Citadel. Cloud platforms and Open Data are also mentioned, but in relatively
traditional terms compared to the Citadel ODC concepts mentioned above. In addition, no
mention is made of either FI-WARE or the FI-PPP, despite the fact that the Commission has
already funded FI-WARE with over 800 million Euro. This is evidence of the low level of awareness
among Regional policy makers responsible for Smart Specialisation of the possibilities that could
be offered by a wide-scale uptake of the FI-WARE cloud together with the CKAN Open Data
service, the underlying infrastructure onto which the Citadel toolkit would ideally be integrated.
This situation can be attributed to barriers on both sides of the equation. On the one hand, FI-WARE and many FI-PPP services are still in an experimental stage and not ready for commercial
launch. On the other, ERDF regulations make programming and assignment of funds a long and
cumbersome process that often misses windows of opportunity in a fast-changing sector. Yet
these difficulties are perhaps hiding the real potential and benefits to be gained by
implementing the FI-PPP at the regional scale, especially as framed by the Citadel ODC vision.
While the rollout and provision of broadband can proceed within the context of existing
regional innovation strategies and traditional tenders, cloud services and the ODC concept
overall instead raise a whole series of new questions and opportunities.
With the funding available for the Digital Agenda, this is a potentially very important part of the
business opportunity for Citadel as well as FI-WARE's cloud services. Yet implementation at the
territorial scale is not only a technical issue: who should ensure data policies across
administrations, guaranteeing openness and citizen engagement through governance in the
public interest, and who should ensure quality of service, privacy and security, and
interoperability among platforms and systems? What are the potential benefits of a pan-European
approach in terms of the business opportunities for local ICT SMEs working with a
common information infrastructure across and between their regions (or, why not, at the level of
macro-regional spaces)? And as a consequence, what is the right approach for procurement on the
public side34?
These issues cannot be solved in the abstract; they are most effectively addressed through
a broad and diffused co-design process that brings together the regional actors and authorities
working with the Citadel toolkit and the FI-PPP as a whole. Since this is essentially an exploratory
process involving large scale (though not necessarily heavy) pilot experimentation, it could be
framed in the context of Pre-Commercial Procurement or the EU Public Private Partnership, i.e.
as a shared-cost experimentation whose principal actor can be the Commission itself (in terms
of defining the framework guidelines linking H2020 to the ERDF). If such an infrastructure is
successfully tested, its implementation in practice could then be integrated into the ERDF-funded
activities that will be ongoing by that time.
At a more manageable level, the FI-WARE Accelerator programme provides the opportunity to
further develop the Citadel toolkit and enhance its integration with CKAN and other relevant
FI-WARE Generic Enablers. A series of sixteen projects are launching calls for SMEs to propose ICT
services to be developed using the FI-WARE platform, and the pilot testing in these initiatives
can include scenarios of use with neighbouring municipalities, as a proof of concept of some of
the more functional aspects of the ODC vision.
In parallel, however, the bottom-up exploration of Citadel as an innovation support
infrastructure can equally be carried out from a regional policy perspective, for instance through
discussions with the IPTS and interested Regions or collaboration in the framework of on-going
European Territorial Cooperation projects. The feasibility of this is illustrated by the widespread
response to the Citadel Associate outreach program, providing a bottom-up platform of
interested cities. In addition, the CreativeMED project35 has been exploring possible areas of
concrete exchange of thematic and operational knowledge among 12 Mediterranean Regions
for the implementation and monitoring of their 2014-2020 Smart Specialisation strategies. Here,
the hypothesis of concretely experimenting with the ODC "Territory of Data" vision using the Citadel
toolkit on top of the FI-WARE cloud (or at least using CKAN) is being explored by those responsible
for regional programming in Portugal, Italy, Slovenia, Greece and Cyprus. Similar concepts are
also the subject of specific discussions with CORVE, the Citadel Lead Partner, as concerns the
Flanders Region.
34
In fact the new framework for EU public procurement creates more room for informal negotiations with
prospective awardees.
35
http://www.creativemed.eu/
REFERENCES
[1] All Citadel deliverables, including those mentioned in this book, are publicly available at:
http://www.citadelonthemove.eu/en-us/results/publicdeliverables.aspx
[2] Article 29 Data Protection Working Party (2013) Opinion 2/2013 on apps and smart devices, 27
February. Available online at http://www.huntonprivacyblog.com/wp-content/files/2013/03/wp202_en.pdf (last accessed: December 2014)
[3] Article 29 Data Protection Working Party (2011) Opinion 9/2011 on the revised Industry Proposal for a
Privacy and Data Protection Impact Assessment Framework for RFID Applications, 11 February. Available
online at http://cordis.europa.eu/fp7/ict/enet/documents/rfid-pia-framework-a29wp-opinion-11-02-2011_en.pdf (last accessed: December 2014)
[4] BEPA (Bureau of European Policy Advisers) (2011) Empowering people, driving change: Social
innovation in the European Union. Luxembourg: Publications Office of the European Union. ISBN 978-92-79-19275-3
[5] Capability Maturity Model (MATURITY MODEL),
http://en.wikipedia.org/wiki/Capability_Maturity_Model (last accessed: December 2013)
[6] Dekkers, M., Polman, F., te Velde, R. and de Vries, M. (2006) Measuring European Public Sector
Information Resources. Final Report of Study on Exploitation of public sector information benchmarking
of EU framework conditions. Executive summary and Final Report. European Commission, Directorate
General for the Information Society and Media
[7] European Commission (2009) Recommendation on the implementation of privacy and data
protection principles in applications supported by radio-frequency identification, C (2009) 3200 final,
Brussels, 12 May. Available online at: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32009H0387:EN:HTML (last accessed: December
2014)
[8] Ferro, E. and Osella, M. (2011) Modelli di Business nel Riuso dell'Informazione Pubblica. Studio
Esplorativo. Osservatorio ICT Piemonte, www.sistemapiemonte.it
[9] ISO/IEC WD 29134 Privacy impact assessment Methodology. Available online at
http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=62289 (last accessed:
December 2014)
[10] Itani, W., Kayssi, A. and Chehab, A. (2009) Privacy as a Service: Privacy-Aware Data Storage and
Processing in Cloud Computing Architectures. In Proceedings of the Eighth IEEE International Conference
on Dependable, Autonomic and Secure Computing (DASC '09), 12-14 December, pp. 711-716. Available
online at http://dl.acm.org/citation.cfm?id=1724449 (last accessed: December 2012)
[11] Jacobsson, S. and Bergek, A. (2007) A framework for guiding policy makers intervening in emerging
innovation systems in 'catching up' countries. European Journal of Development Research, (18), 4, 687-707
[12] Maximilien, E.M., Grandison, T., Sun, T., Richardson, D., Guo, S. and Liu, K. (2009) Privacy-as-a-Service:
Models, Algorithms, and Results on the Facebook Platform. In Web 2.0 Security and Privacy
Workshop, held in conjunction with the 2009 IEEE Symposium on Security and Privacy, 21 May. Available
online at http://w2spconf.com/2009/papers/s4p2.pdf (last accessed: December 2012)
The specific code for each of these resources is available through Github, so in the following we
simply provide an overview of how each element is structured and how it fits into the ODC
concept.
• The Library: this is the heart of the converter; it carries out the actual conversion
function and makes it available to external parties through its APIs.
• The GUI standalone: this is the graphical interface for those wishing to use the Library
off-line.
• The Portlet: this is the portlet installed in a Liferay Portal to use the Library via the web (and
integrated into the Citadel platform).
THE LIBRARY
Github: https://github.com/CitadelOnTheMove/converter-lib/
Wiki: https://github.com/CitadelOnTheMove/converter-lib/wiki/
Github: https://github.com/CitadelOnTheMove/converter-gui/
Wiki: https://github.com/CitadelOnTheMove/converter-gui/wiki/
Step by step User Guide with pictures:
https://github.com/CitadelOnTheMove/converter-gui/wiki/User-Guide
Video Guide: https://github.com/CitadelOnTheMove/converter-gui/wiki/Video-guide
The main features of the Converter GUI include the features of the Converter library and make
it easy to:
• Visual indication of different kinds of messages (notice, warning and error) on data
mapping
• Display error message boxes in the conversion process or on validation
• Preview the generated target dataset
• Save the generated dataset locally
Github: https://github.com/CitadelOnTheMove/converter-portlet/
Wiki: https://github.com/CitadelOnTheMove/converter-portlet/wiki/
Step by step User Guide with pictures:
https://github.com/CitadelOnTheMove/converter-portlet/wiki/User-Guide
Video walk-through: http://youtu.be/oTn76MqzuG4
The main features of the Converter Portlet include the features of the Converter library and
make it easy to:
USAGE STATISTICS
The following provide some statistics on access to and use of the Converter starting April 24,
2014 (installation of a significantly revised version following user feedback) and ending
December 12, 2014:
• Total of 603 user sessions (persons initiating a conversion process for at least one
dataset), average of 2.6 sessions per day
• Datasets loaded: 814 (386 CSV, 375 XLSX, and 53 XLS)
• Datasets successfully converted: 627 (606 Citadel JSON and 21 MyNeighbourhood CSV)
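The figures above can be cross-checked with a few lines of arithmetic. The following is only a sketch in Python: it assumes that both the first and the last day of the period are counted, and the conversion success rate it derives is not a figure stated in the statistics themselves.

```python
from datetime import date

# Figures quoted in the usage statistics above.
start, end = date(2014, 4, 24), date(2014, 12, 12)
sessions = 603
loaded = {"CSV": 386, "XLSX": 375, "XLS": 53}
converted = {"Citadel JSON": 606, "MyNeighbourhood CSV": 21}

# Assumption: both the first and the last day of the period are counted.
days = (end - start).days + 1
sessions_per_day = sessions / days

total_loaded = sum(loaded.values())
total_converted = sum(converted.values())

# Derived figure, not stated in the text: share of loaded datasets
# that were successfully converted.
success_rate = total_converted / total_loaded

print(days, round(sessions_per_day, 1), total_loaded, total_converted)
print(f"{success_rate:.0%}")
```

Under these assumptions the totals of 814 and 627 and the average of 2.6 sessions per day are all consistent with the breakdowns given.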
• It can carry out on-the-fly CSV, geoJSON and osmJSON to Citadel JSON conversions for
mobile applications
• Conversion from the osmJSON format makes it possible to get live data from Open Street Map
• The GeoJSON export format makes it possible to use the converted data in other
geoJSON-compatible applications (including web mapping services)
• It includes a mapping template editor for easy generation of config files, which enable
live encoding from various datasets
• It provides some caching of converted data (the file is updated only when requested, or
depending on some specific criteria, so that converted files can be served faster while
still being updated on a regular basis)
• It is designed to be embedded into other Open Source products, such as CMSs or data
stores, to allow them to natively provide Citadel JSON output.
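The caching behaviour described in the list above can be illustrated with a small sketch. It is written in Python rather than the PHP of the library itself and does not reproduce the library's code: the function name, the freshness criteria (a maximum age and a comparison of modification times) and the convert callback are all assumptions made for illustration.

```python
import os
import time


def cached_conversion(source_path, cache_path, convert, max_age_seconds=3600):
    """Return converted text, re-running `convert` only when the cache is stale.

    Hypothetical helper: the cache counts as fresh when it exists, is younger
    than `max_age_seconds`, and is not older than the source file.
    """
    fresh = (
        os.path.exists(cache_path)
        and time.time() - os.path.getmtime(cache_path) < max_age_seconds
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )
    if not fresh:
        # Stale or missing cache: convert the source and store the result.
        with open(source_path, encoding="utf-8") as src:
            converted = convert(src.read())
        with open(cache_path, "w", encoding="utf-8") as out:
            out.write(converted)
    with open(cache_path, encoding="utf-8") as cached:
        return cached.read()
```

A caller would pass its own conversion function as convert; repeated requests within the maximum age are then served from the stored file without converting again.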
Github: https://github.com/CitadelOnTheMove/converter-php-lib
The code is mainly intended as a basis for more advanced projects, with the following overall
roadmap:
• Implement the complete set of data fields from the Citadel-JSON format
• Add an editor feature to allow various fields to be used in the output description field for
POIs
• Connect the library to data sources other than CSV files, particularly the database backends
of existing CMSs.
Github: https://github.com/CitadelOnTheMove/CitySDK-Citadel-Script
36
CitySDK is a sister project to Citadel. It developed standard APIs for Open Data web services to allow portability of
apps across Europe.
37
Development of the CitySDK Citadel conversion script was carried out without resources from Citadel. The first
script was developed with resources from the CitySDK project, while its refinement was carried out as part of the
Lisbon FI-WARE Pilot.
During the Lisbon pilot, the script was further refined to allow the database to be accessed (with
parameters, filters, etc.) through a URL query such as
http://citysdk.ist.utl.pt:8000/?city=amsterdam&format=csv&limit=5.
Github: https://github.com/rsbarata/CitySDK-Citadel-Script
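As an illustration of the URL query described above, the following Python sketch assembles such a URL from its parts. Only the endpoint and the three parameters that appear in the example (city, format and limit) are taken from the text; the helper function is hypothetical, and whether the service accepts other parameters or is still online is not assumed here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint quoted in the text; it may no longer be reachable.
BASE_URL = "http://citysdk.ist.utl.pt:8000/"


def build_query_url(city, fmt="csv", limit=None):
    """Assemble a query URL from the three parameters seen in the example."""
    params = {"city": city, "format": fmt}
    if limit is not None:
        params["limit"] = limit
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("amsterdam", fmt="csv", limit=5)
print(url)

# Reading the parameters back out of the URL.
print(parse_qs(urlsplit(url).query))
```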
In parallel, an additional feature was added to the Citadel converter that allows a file to be
uploaded through a URL field rather than by browsing and selecting. This has made it possible to add to the
Citadel Platform datasets with tourism POIs from Lisbon, Amsterdam, Helsinki, and Rome
through direct queries to the CitySDK API.
Format | Description | Pros | Cons
.XLS/.XLSX (Excel) | | Very accessible to people and widely used |
.CSV | | Easily understood or parsed by most programmes. Easily read by humans. Application-neutral. |
.XML | Represents data as a structured tree schema that expresses relations between data | |
.JSON | | |
RDF Formats (e.g. Turtle, N-Triples and JSON-LD) | Represents data as a network of linked points that make it easy to understand complex patterns using computer programmes | Highly-structured information makes it easy to search and retrieve information in services. Easy to visualise data relationships |
Based upon the above mapping and in alignment with the trends outlined above, Citadel
ultimately chose to use two file formats in its own work:
38
Some of the Citadel converting tools also accept Excel, OSM JSON or geoJSON files
39
http://www.opendataimpacts.net/2014/10/data-standards-and-inclusion-in-the-network-society/
40
CSV files can be read and edited by humans, using a simple text editor
41
Huge CSV datasets can be easily handled, as the one line per entry structure enables sequential processing
42
http://blog.mongolab.com/2011/03/why-is-json-so-popular-developers-want-out-of-the-syntax-business/
43
http://www.w3.org/TR/poi-core/
2. Cross-Border Use: A Citadel JSON file will work with any other application designed
using Citadel JSON. This means that an app developed to find art galleries in Helsinki can
also find galleries in Palermo with no need to develop and download a new service.
The two features above make Citadel JSON a significant improvement, from the perspective of
developers, over existing Open Data models. The following visual provides an overview of the
Citadel JSON data model:
The Citadel JSON data model was a significant extension of the W3C PoI Core Draft, currently
the global guideline for the production of PoI Data Models.
44
http://www.w3.org/2010/POI/documents/Core/core-20111216.html
45
JSON is the native data format used by JavaScript, which is responsible for the dynamic part of the applications
GEOSPATIAL STANDARDS
Citadel JSON uses the WGS84 coordinate reference system to represent the location of points of
interest accurately. WGS84 is used by GPS providers and most well-known mapping systems
and can be considered the world standard. The coordinates of a given point on Earth are
expressed in decimal format, in the axis order latitude, then longitude, separated by a
space, e.g. 50.838908 4.373942 for the European Parliament building in Brussels. The Citadel
JSON conversion process also accepts separate latitude and longitude fields as input. In the
resulting Citadel JSON file these two fields are combined, with a separating space,
into a single value with latitude first, then longitude.
The Citadel format also allows other standards to be declared and used, though this is not
recommended and not handled by the mobile templates46.
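The combination of latitude and longitude into a single value can be sketched as follows. This is an illustrative Python fragment, not the converter's own code: the function names are hypothetical, and the range check is an added safeguard that the text does not describe.

```python
def to_citadel_position(latitude, longitude):
    """Join WGS84 latitude and longitude as 'lat lon', latitude first."""
    lat, lon = float(latitude), float(longitude)
    # Added safeguard (an assumption): reject values outside the WGS84 ranges,
    # which often indicates swapped axes or a different reference system.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat} {lon}"


def from_citadel_position(position):
    """Split a 'lat lon' value back into a (latitude, longitude) pair."""
    lat_text, lon_text = position.split()
    return float(lat_text), float(lon_text)


# The European Parliament building in Brussels, as quoted in the text.
print(to_citadel_position("50.838908", "4.373942"))
```

Note that GeoJSON orders coordinates as longitude, then latitude, so an export to GeoJSON needs to swap the two values.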
METADATA
Metadata, or "data about data", is structured information that describes, explains, locates, or
otherwise makes it easier to retrieve, use or manage an information resource.54 The reference
metadata standard to describe online resources is Dublin Core Metadata,55 from which 15 core
terms56 have been normalised in ISO 15836:2009.57 Citadel chose to conform to this
widely-recognised international standard: all Citadel data therefore uses Dublin Core Metadata.
The Citadel Data Index61 uses a different, narrower list of categories, as it is more convenient for
categorising POIs for the general public. This classification is available using a JSON implementation of
the RDF Data Catalogue standard (DCAT)62 through the dataset web service of the Open Data Index.
The categories used inside the datasets themselves are free-form because they reflect the ones
used in the original data file, which may or may not be structured. While not enforced at all,
the use of existing categorisation vocabularies in Citadel can be a step towards better
interoperability. It would allow better dataset auto-discovery in the future, and is therefore
advised.
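As a small illustration of working with the 15 Dublin Core terms listed in the footnote to this section, the Python sketch below separates the keys of a metadata record into recognised terms and everything else. The flat dictionary keyed by bare term names and the sample values are assumptions made for the example; the actual layout of metadata inside a Citadel JSON file is not shown here.

```python
# The 15 core terms, as listed in the footnote to this section.
DUBLIN_CORE_TERMS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def check_metadata(record):
    """Split a record's keys into Dublin Core terms and unknown keys."""
    keys = set(record)
    return sorted(keys & DUBLIN_CORE_TERMS), sorted(keys - DUBLIN_CORE_TERMS)


# Hypothetical record, for illustration only.
sample = {
    "title": "Art galleries",
    "publisher": "Example City Council",
    "language": "en",
    "colour": "blue",
}
known, unknown = check_metadata(sample)
print(known, unknown)
```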
52
http://www.iso.org/iso/catalogue_detail.htm?csnumber=32574
53
Which uses Xively API, which was used by 2 pilot cities, fetching data as JSON : https://xively.com/dev/docs/api/
54
http://en.wikipedia.org/wiki/Metadata_standards
55
http://dublincore.org/
56
The 15 core terms are : title, creator, subject, description, publisher, contributor, date, type, format, identifier,
source, language, relation, coverage, rights
57
http://www.iso.org/iso/fr/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=52142
58
http://www.iso.org/iso/fr/home/store/catalogue_tc/catalogue_detail.htm?csnumber=53798
59
http://inspire.ec.europa.eu/theme/
60
http://www.eionet.europa.eu/gemet/en/themes/
61
http://www.citadelonthemove.eu/en-us/opendata/opendataindex.aspx
62
http://www.w3.org/TR/vocab-dcat/
• Some popular spreadsheet editing software, including Microsoft Excel, still uses regional
character encodings by default
• This choice leads to less-interoperable files, with accents and special characters being
misinterpreted
• Such regional encoding can cause challenges for the conversion process, including
unreadable accented characters in files
Citadel Recommendation:
• Adoption of the UTF-8 standard should be ensured so that text data and information
can be exchanged in an interoperable manner throughout Europe and across the world.
• Access to state-of-the-art standards and their implementation into varying languages
should be free wherever possible.67
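The encoding problem described above can be reproduced in a few lines. The sketch below, in Python, assumes that the regional encoding is Windows-1252 (cp1252), a common default for Western European spreadsheet exports; the actual encoding of a given file has to be known or detected separately, and the function name is hypothetical.

```python
def transcode_to_utf8(raw_bytes, source_encoding="cp1252"):
    """Decode bytes from a regional encoding and re-encode them as UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")


# "Café, Gent" as a spreadsheet export using cp1252 might store it.
regional = "Café, Gent".encode("cp1252")

# Reading the same bytes as UTF-8 fails on the accented character.
try:
    regional.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False

fixed = transcode_to_utf8(regional)
print(readable_as_utf8, fixed.decode("utf-8"))
```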
POI ISSUES
As W3C did not finalise a standard for POI, Citadel had to use the unfinished draft. Use of this
draft uncovered a number of issues:
• Some fields did not suit flat files,68 and were simplified to become more user-friendly,
• Citadel needed to describe the dataset itself, and extended the POI draft with fields that
describe the dataset; these fields basically wrap the updated POI data,
• Citadel needed to add fields specifically designed to be used with mobile applications,
which were implemented through an additional extensible data model (the tpl
identifiers),
• The draft W3C standard did not go far enough: real-world implementation revealed a
need to build usable tools without using a full linked data infrastructure.
Citadel Recommendation:
EVENTS ISSUES
The W3C POI data standard did not include calendar information by default, which made it
impossible to properly display events on the map.
Citadel Recommendation:
• Citadel added calendar information to POI data using the extendable attributes defined in the
Citadel JSON format.
GEOSPATIAL ISSUES
Geospatial standards remain a fuzzy area. Many reference systems exist, with no clear
guidance on which ones are best to use and why.
In our work, Citadel found that geographical coordinates from different countries had variations
in both axis order (whether latitude or longitude comes first when written) and the form in
which Lat/Long were written (one cell or two separate cells). Citadel also found that geographic
coordinates are based on various geospatial reference systems, which often leads to
inconsistency between datasets as the reference system is not always stated explicitly (especially
in tabular files). As an example, Barcelona, which has published many rich data sets, offers a bus stop
dataset which, even after proper conversion to the global standard for Latitude and Longitude
(WGS84), shows a small shift (about 2 blocks) for all POIs, making it practically unusable for
Citadel apps.
Conversion between geographic coordinate systems remains a complex issue for non-GIS
specialists. We believe this complexity has contributed to a lack of easy-to-use tools
and of best practice on which geospatial reference systems to use. Finally, the auto-discovery feature of
the Citadel AGT, which allows any app to automatically detect data corresponding to one's
current location and load it into the app, shows that there is a key need to be able to describe
the area covered by a given dataset (instead of attaching it to a central point) in order to enable
applications to get the most accurate data at different scales depending on their completeness.
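One simple way to describe the area covered by a dataset is a bounding box computed from its points. The Python sketch below is an assumption made for illustration, not the approach used by the Citadel AGT; the function names are hypothetical, the sample points other than the first are invented, and datasets that cross the 180° meridian are not handled.

```python
def bounding_box(points):
    """Return (min_lat, min_lon, max_lat, max_lon) for (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return min(lats), min(lons), max(lats), max(lons)


def covers(box, lat, lon):
    """Tell whether a location falls inside the bounding box."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# First point: the European Parliament example from the text.
# The other two points are invented for illustration.
pois = [(50.838908, 4.373942), (50.8467, 4.3525), (50.8449, 4.3499)]
box = bounding_box(pois)

print(box)
print(covers(box, 50.84, 4.36), covers(box, 48.85, 2.35))
```

An app could use such a box to decide whether a dataset is relevant to the user's current location, rather than relying on the distance to a single central point.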
68
They are rather designed for Linked Data, using extensive URI and namespaces instead of clear text and URL, which
are more user-friendly for developers and data editors that lack the surrounding infrastructure to easily produce
these structured and linked data files.
• Set a central open interoperable standard for geographic coordinates based on WGS84,
with a single, fixed axis order and coordinate formatting
• Provide adequate and open conversion tools to enable data publishers to publish their
data using a shared, unique coordinate system, so that information can be exchanged outside
the GIS community
• Establish an administrative ontology of European boundaries, at different scales and with a
historical perspective, in order to enable local data naming and discovery using both
geographical coverage and administrative entities
• Where possible, cities should geocode their POIs in latitude and longitude fields.
69
http://ec.europa.eu/digital-agenda/sites/digital-agenda/files/ministerial-declaration-on-egovernment-malmo.pdf
70
Standard Eurobarometer 81, http://ec.europa.eu/public_opinion/index_en.htm.
71
http://www.corve.be/docs/english/Citadel%20Statement.pdf
momentum, a specific outreach initiative has enlisted over 100 local authorities in over 60
countries worldwide sharing the Citadel vision.
Citadel has focused on inclusive local governance of Open Data strategies as a means of identifying
the key legal and procedural enablers that city administrations are able to set in place for
diffused uptake in their territories, including awareness building, public events and hackathons,
etc. Broader engagement of citizens and businesses in debates on issues such as privacy and
security is fundamental in order to provide effective bottom-up input to national and EU legal
frameworks. Finally, the need has emerged for the provision of common Open Data platforms
and tools as an enabling public service open to all, governed by the principles of an Open Data
Commons.
Thanks to the experience gained and the lessons learned in the Citadel project, the key
elements for achieving the Malmoe Declaration's important objectives are in place and the way
forward is clear, on the eve of the target date of 2015. Now is the time for local authorities to
join forces by declaring these common principles, committing to common action, and
challenging others to play their part.
• Reflect, both individually and collectively, on emergent issues of personal privacy and
identity, recognizing the key role for citizen engagement in designing new societal
frameworks of entitlement and citizenship.
• Demand openness and transparency from governments and businesses at all levels, as
the prerequisite for gaining the trust required to work together in addressing the key
problems society faces today.