Information Polity 19 (2014) 5–16

DOI 10.3233/IP-140328
IOS Press

Big data, open government and e-government: Issues, policies and recommendations
John Carlo Bertot a,∗, Ursula Gorham a, Paul T. Jaeger a, Lindsay C. Sarin a and Heeyoon Choi b
a Information Policy & Access Center, University of Maryland, College Park, MD, USA
b Korea Institute for Science & Technology Information, Daejeon, Korea

Abstract. The transformative promises and potential of Big and Open Data are substantial for e-government services, openness
and transparency, governments, and the interaction between governments, citizens, and the business sector. From “smart” gov-
ernment to transformational government, Big and Open Data can foster collaboration; create real-time solutions to challenges in
agriculture, health, transportation, and more; promote greater openness; and usher in a new era of policy- and decision-making.
There are, however, a range of policy challenges to address regarding Big and Open Data, including access and dissemina-
tion; digital asset management, archiving and preservation; privacy; and security. After presenting a discussion of the open
data policies that serve as a foundation for Big Data initiatives, this paper examines the ways in which the current information
policy framework fails to address a number of these policy challenges. It then offers recommendations intended to serve as a
beginning point for a revised policy framework to address significant issues raised by the U.S. government’s engagement in Big
Data efforts.

Keywords: Open data, big data

1. Introduction

When Barack Obama was a candidate running for the presidency the first time in 2008, his campaign
devoted an unprecedented amount of focus to issues related to information and technology. The Obama
campaign not only used technology – particularly social media – in new ways to raise money, target
and contact voters, and get out the vote but also devoted a considerable amount of attention to the ways
they would use technology in governance and the policies they would support [15]. Immediately upon
taking office, Obama issued two executive orders requiring government agencies to err on the side of
openness when considering Freedom of Information Act (FOIA) requests for government records and
opening presidential records to the public. More recently, the Obama Administration has pushed open-
ness through the Open Government Partnership [21] and by requiring the release of machine readable
datasets [20]. The overarching technology focus of the Obama administration has been on the use of
technology to attempt to increase government transparency, or at least increase access to the volume
of government information available [2,12]. This follows an overall trend in recent years toward using
e-government for greater access to government records and increased focus on proactive release [9].


∗Corresponding author: John Carlo Bertot, 4105 Hornbake Building, South Wing, College of Information Studies, University
of Maryland, College Park, MD 20742, USA. Tel.: +1 301 405 3267; Fax: +1 301 314 9145; E-mail: jbertot@umd.edu.

1570-1255/14/$27.50 © 2014 – IOS Press and the authors. All rights reserved
6 J.C. Bertot et al. / Big data, open government and e-government: Issues, policies and recommendations

Open Data policies and programs are at the center of the Obama administration’s efforts to promote
access, openness, and transparency [3,4,13,23]. Open Data is based on the idea that certain kinds of data
should exist beyond the limits of copyright, patents, censorship, or other parameters often placed around
data. Data is disseminated openly so that it is freely available to use, republish, and transform into new
products. In a government context, Open Data creates opportunities for individuals, private sector orga-
nizations, and non-profits to find new insights in and create new products and services from the data [2,
3]. An Open Data set can be of any size; however, many of the key Open Data initiatives championed by
the Obama administration have involved Big Data – datasets that are extremely large and/or complex,
offering the possibilities of identifying previously impossible levels of insights, granularity of analysis,
and relationships between elements in the dataset. Big Data sets have become possible due to recent
increases in storage and processing capacity, as well as increases in the number of devices collecting and
sharing data. Big Data require three key infrastructure ingredients: 1) a platform for organizing, storing,
and making data accessible; 2) computing technology and power that can process large-scale datasets;
and 3) data formats that are structured and usable. As they are typically so large that they exceed the
capabilities of personal computing, Big Data sets generally are stored on large numbers of servers. More-
over, Big Data span a range of data types such as text, numeric, image, video, or combinations thereof,
and they can cross multiple data platforms such as social media networks, blog files, sensors, location
data from smart phones, digitized documents, and photograph and video archives.
Big Data are being widely used by governments to identify and analyze problems, as well as to make
data publicly available. Tools such as www.data.gov have been developed to provide direct access to
enormous amounts of unrefined government data with the hope that the visitors to the site will find new
uses for the data and that these new uses for the data can create previously unavailable insights into
government activities and larger societal issues. Data.gov is but one example of an open government
initiative using Big Data (referred to as “Big and Open Data” throughout this paper). The Open Data
philosophy is foundational for Big Data to succeed, as it ensures publicly accessible datasets through
managed processes. Big and Open Data initiatives have the potential to lead to new scientific and re-
search insights, create economic development, inform decision and policy making, and generate new
policies that benefit the publics served by governments. The policy choices related to Big and Open
Data will have long term implications for innovation and research using large-scale datasets, govern-
ment openness and transparency, and many other contexts. Such policy decisions will involve balancing
the questions of access, privacy, security, and digital asset management, archiving, and preservation,
among others.
This paper reviews current government interaction and involvement with Big and Open Data in the
United States, examines the extent to which the current information policy framework addresses issues
raised by Big and Open Data, and sets forth a number of key recommendations that lay the groundwork
for the development of a government model for Big and Open Data.

2. Big and open data initiatives in the Obama administration

The notion of Big Data in the United States, particularly government data, is not new. Whether in print
or electronic form, the U.S. government has collected and released a wide range of data, publications,
and content in the name of transparency and openness. Indeed, a core principle of the founding of the
U.S. is access to and dissemination of information about government [14]. Over the years, government
information and data have evolved, and so too have the methodologies and approaches to collecting and
disseminating government data. Some milestones in the U.S. include:
– The use of punch cards and an early version of computing technology to tabulate the 1890 census
data (https://www.census.gov/history/www/through_the_decades/overview/1890.html).
– The launching of Social Security as part of the Social Security Act in 1935, which required a large-
scale data collection effort to collect data from 26 million workers and 3 million employers. IBM
received the contract to undertake this initiative (http://www.ssa.gov/history/briefhistory3.html).
– The coining of the term “Big Data” by Cox and Ellsworth [8], both NASA researchers, in reference
to large scale datasets regarding simulations of airflow around aircraft – datasets that were large and
difficult to analyze and process due to computing technology limitations.
Thus, while one can trace the evolution of Big Data for over 100 years in the U.S., the Obama admin-
istration’s commitment to open government policies and programs, at a time when the overall approach,
scale, and confluence of technologies governing Big Data has been changing, gave rise to the Big and
Open Data initiatives that are the subject of this paper.
On his first day in office, President Obama committed his administration to an “unprecedented level of
openness in Government” by adopting three guiding principles of openness [19]: 1) Transparency: Agen-
cies should treat information as a national asset and empower the public with the information needed
to hold the government accountable; 2) Participation: Agencies should inform and improve govern-
ment decision-making by tapping into the citizenry’s collective expertise through proactive engagement;
and 3) Collaboration: Agencies should cooperate among themselves and with nonprofits, businesses,
academia, and the public to better accomplish the work of the government.
The Administration operationalized these principles with the release of its ambitious Open Gov-
ernment Directive (OGD) [17]. The OGD sought to deliver greater openness, transparency, and ac-
countability – but more significantly to provide a mechanism through which to promote institutional
transformation of which Big and Open Data are a key component. The OGD was incorporated into
the international scale Open Government Partnership (OGP), which seeks to encourage accountabil-
ity, transparency, and transformation of governments through, among other factors, Big Data initiatives
(http://www.opengovpartnership.org/).
McDermott [17] and Evans and Campos [11] both provide an overview of the OGD, with the lat-
ter noting that the lack of specific guidance has hindered federal agencies’ implementation efforts. The
OGD also did not provide any framework for assessing open government initiatives, a shortcoming that
a coalition led by OpentheGovernment.org sought to rectify through the development of a measurement
framework and an assessment tool [5]. The implementation of data.gov, in particular, has been the sub-
ject of discussion. The OGD tasked government agencies with publishing online in an open format at
least three “high-value” datasets and registering those datasets via data.gov. Initially, the data.gov ini-
tiative was simply a repository for datasets and saw low uptake and use of released data. A 2011 study
concluded that, while federal agencies generally followed the letter of the OGD, data.gov had become
“the playground for a tiny group of agencies” [24]. To promote and foster data use, data.gov sought
to create and grow data communities around key topical areas where there was an interest and need.
Examples include:
– Education (http://www.data.gov/communities/education), a community built around national edu-
cation datasets from various agencies. Using visualizations, classroom instructional modules, and
datasets, the community is designed to assess the state of education on all levels.
– Health (http://www.data.gov/communities/health), a community that is a one-stop resource for the
growing ecosystem of innovators who are turning data into new applications, services, and insights
that can help improve health.
There are similar communities in the areas of business, cities, energy, law, manufacturing, oceans, and
safety, among others (see http://www.data.gov/communities/ for more details).
Building upon the notion that Big Data can be used by public and private entities to address social and
scientific problems, the Obama Administration announced its “Big Data Research and Development Initiative”
in March 2012 (http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf).
In connection with this initiative, the National Science Foundation (NSF), the National Institutes of
Health (NIH), the Department of Defense, the Department of Energy, and the U.S.
Geological Survey are seeding investments in Big Data initiatives. By way of example, NIH announced
that the data produced by the international 1000 Genomes Project (200 terabytes) was made freely avail-
able on the Amazon Web Services (AWS) cloud, with researchers only required to pay for the computing
services that they use. As noted by Lane [16] and Braveman [6], funding Big Data initiatives offers the
ability to seek relationship linkages in scientific grand challenges by bringing together often-disparate
disciplines that might not otherwise collaborate.
The Obama Administration, viewing data.gov and other Big and Open Data initiatives as a vehicle
for economic growth, job creation, innovation, and efficiency, remains focused on refining and opera-
tionalizing the open data infrastructure first proposed in 2009. To increase the accessibility of publicly
available datasets, President Obama signed Executive Order 13642, “Making Open and Machine Read-
able the New Default for Government Information”, in May 2013. Pursuant to this Order, “[g]overnment
information shall be managed as an asset throughout its life cycle to promote interoperability and open-
ness, and, wherever possible and legally permissible, to ensure that data are released to the public in
ways that make the data easy to find, accessible, and usable” [20]. The extent to which the current policy
framework can integrate this “new default” is the subject of Section 4.

3. Methodology

The authors engaged in exploratory research that included: 1) Policy analysis of existing U.S. in-
formation policies, laws, and initiatives that govern the availability and use of Big and Open Data by
government agencies; 2) A literature review of Big and Open Data initiatives in the U.S.; and 3) Interviews
with civil society groups and data activists in the U.S. The goals of the policy analysis
were to ascertain the current policies, laws, and relevant government initiatives that have an impact on
government-generated Big Data and open government; assess the policy structure to identify issues asso-
ciated with government’s Big and Open Data policies; and assess the intersection of Big and Open Data
policies. The literature review provided an overview of Big and Open Data initiatives in U.S. government
agencies.
Finally, the interviews, though selective and limited, set out to explore how civil society groups were
using government-provided Big and Open Data. Further, the interviews offered the study team the op-
portunity to explore potential policy gaps (e.g., privacy, accessibility, preservation) as identified by in-
terviewees. In all, interviews were conducted with five different open government representatives in
communities that focus on environmental, health, budget, and overall access to government information
and transparency. The interviews, which occurred between December 2012 and the end of January 2013,
were informed by the initial policy analysis and literature review findings by the authors.
The guiding research questions were: 1) How are U.S. federal government agencies defining Big and
Open Data? 2) What approaches to Big and Open Data are government agencies taking, particularly in
terms of data release, availability, use, and collaborations? 3) To what extent does the U.S. information
policy environment support, contend with, promote, or hinder Big and Open Data initiatives? 4) What
policy initiatives, changes, and/or guidance are necessary to overcome the challenges and attain the
promise of Big and Open Data? This is a growing space for research, practice, and policy development
and, as an exploratory study, this effort serves as a limited and initial review that will require additional
study.

Table 1
Selected information policies by objective

Policy objectives related to big data    Selected relevant policy instruments

Governing and governance                 E-government Act of 2002
                                         OMB Circular A-130 (Management of Federal Information Resources)
                                         Paperwork Reduction Act
                                         Various Copyright (Title 17 USC) and Patent and Trademark (Title 35 USC) legislation

Access and dissemination                 Americans with Disabilities Act
                                         Executive Order 13166 – Improving Access to Services for Persons with Limited English Proficiency
                                         Individuals with Disabilities Education Act
                                         Section 504 of the Rehabilitation Act
                                         Section 508 of the Rehabilitation Act
                                         Telecommunications Act of 1996
                                         Depository Library Act of 1962
                                         Federal Depository Library Program (Title 44 USC)
                                         Government Printing Office Electronic Information Access Enhancement Act of 1993

Privacy, security and accuracy           Children’s Online Privacy Protection Act (COPPA)
                                         Federal Information Security Management Act (FISMA)
                                         Information Quality Act
                                         OMB Memo M-03-22 (Guidance for Implementing the Provisions of the E-government Act of 2002)
                                         OMB Memo M-04-04 (E-Authentication Guidance for Federal Agencies)
                                         OMB Memo M-05-04 (Policies for Federal Agency Websites)

The exploratory nature and U.S. focus of the study limit its generalizability. A primary goal of the
study was to assess comprehensively the U.S. policy context governing Big and Open Data, in large part
due to the lack of attention of policymakers to determine the extent to which current data initiatives –
regardless of their potential value and/or innovation – conform to existing laws that govern the collection,
access, dissemination, and preservation of government data. As discussed below, the policy evaluation
uncovered a number of issues, gaps, and needs in terms of how the U.S. government manages its data
initiatives. The interviews allowed for selected deeper exploration of the results of the policy analysis.

4. Big and open data and the current information policy framework

A key issue regarding open government in general, and Big Data in particular, is the set of information
and data policies that govern the management, use, reuse, and accessibility of government information
and data (see Table 1). The U.S. has a complex and evolving set of information policies (laws, regula-
tions, and memoranda) that govern the information lifecycle – from the creation of information through
its dissemination to its disposition and archiving. Though not static, this policy framework lags behind
technological advancement. This disjuncture raises the question of whether the existing policy frame-
work in the U.S. adequately addresses the issues raised by Big and Open Data. The remainder of this
paper highlights potential gaps in the current information policy framework and offers recommendations
for addressing these gaps.

4.1. Governing and governance

The rise of e-government in the early part of this century created a need for policies that address the
role of agencies in creating, managing, disseminating and preserving digital government information.
Much of the guidance encapsulated in information management documents issued by the U.S. Office of
Management and Budget (OMB) establishes principles that:
– Agencies are required to disseminate information to the public in a timely, equitable, efficient, and
appropriate manner.
– Agencies are required to establish and maintain Information Dissemination Product Inventories.
– Agencies must consider disparities of access and how those without Internet access will have access
to important disseminations.
– Agencies should develop alternative strategies to distribute information.
– When using electronic media, the regulations that govern proper management and archiving of
records still apply.
– Agencies need to evaluate and determine the most appropriate methods to capture and retain records
on both government servers and technologies hosted on non-Federal hosts.
These policies provide broad principles and guidance for agencies, but fail to address the use of Big
and Open Data, as nearly all pre-date the development and use of Big Data technologies. When con-
sidering these principles in light of Big Data technologies, a range of issues surface, such as the need
for alternative dissemination strategies for access to and dissemination of government information and
services and the need to re-evaluate records management, archiving, and preservation.
The E-government Act of 2002 also established several related principles of e-government that inform
the creation and use of Big and Open Data by government agencies, including requirements to develop
priorities and schedules for making government information available and accessible to the public, to
post inventories on agency websites, to comply with requirements of Section 508 of the Rehabilitation
Act in all online activities, and to implement and maintain an Information Dissemination Management
System.

4.2. Access and dissemination

For over 150 years, the Government Printing Office (GPO) has served as the lead and coordinating
agency in conjunction with the Federal Depository Library Program (FDLP) – a network of nearly 150
full, partial, and regional Depositories. This collaborative network has served as the primary means for
providing community access to government information. The Government Printing Office Electronic
Information Access Enhancement Act of 1993 updated the statutes governing the depository library
program to pave the way for access to and dissemination of digital government information, initially
through GPOAccess, and now through FDSYS (http://www.gpo.gov/fdsys/).
To increase access to government information and services and to successfully facilitate engagement
and collaboration, members of the public must be able to access and use Big and Open Data technologies.
Several policy instruments are directly related to access and dissemination, including:
– Executive Order 13166 (Improving Access to Services for Persons with Limited English Profi-
ciency) requires that agencies provide appropriate access to persons with limited English profi-
ciency, encompassing all “federally conducted programs and activities.” This policy objective is
meant to address gaps in e-government usage among people who predominantly speak a language
other than English.
– The Individuals with Disabilities Education Act requires equal access to all electronic materials
used in public education. The Americans with Disabilities Act provides broad prohibitions on the
exclusion of persons with disabilities from government services and benefits, including communi-
cation with the government. Section 504 of the Rehabilitation Act creates broad standards of equal
access to government activities and information for individuals with disabilities, and establishes
general rights to accessible information and communication technologies.
– Section 508 requires that electronic and information technologies purchased, maintained, or used
by the federal government meet certain accessibility standards designed to make online information
and services fully available to people with disabilities.
The extent to which the requirements set forth in these policy instruments apply in the context of
access to and dissemination of Big and Open Data remains an open question.

4.3. Privacy, security and accuracy

As government websites became two-way communities – opening the possibility of viruses and other
attack agents being inserted into the government environment, as well as the possibility of unintended
release of information – the policy framework evolved to reflect this development. OMB Memo M-05-04
(Policies for Federal Agency Websites), for example, requires that agencies provide adequate security
controls to ensure information is resistant to tampering, to preserve accuracy, to maintain confidentiality
as necessary, and to ensure that the information or service is available as intended by the Agency and as
expected by users.
The presence of related policies, however, does not guarantee the existence of solutions to the myriad
issues raised by Big and Open Data initiatives. Concerns regarding personally identifiable information,
the security of government data and information, and the accuracy of publicly available data have all
been raised in connection with these initiatives. The quality, reliability, and authority of Big and Open
Data are key issues for governments, the research and scientific communities, and the non-governmental and private
sectors. Data of poor quality, that are not certified and/or verified, or are collected using faulty methods
can lead to incorrect findings – which can undermine significantly a range of decision and policy making
processes. Existing data policies governing data.gov that seek to address these issues include:
– Placing the burden on the government agencies collecting and releasing the data to ensure data
accuracy, timeliness, and overall quality (as per the Information Quality Act (2001)).
– Requiring agencies to maintain version control so as to ensure clear labeling of the datasets.
– Requiring agencies to ensure that no data with national security implications are released through
data.gov.
– Requiring agencies to ensure that confidentiality and privacy guidelines are adhered to regarding
released data.

4.4. Usage, storage and preservation

There is a distinction between making Big and Open Data available and accessible, and fostering its
use. Moreover, there is a distinction between selective community data use (that is, only scientists within
a particular domain) and broader interdisciplinary use that cuts across domains and more typical research
communities. When coupled with emerging technologies such as social media, it is possible to create
broad-based communities that foster collaboration and engagement, co-production, and crowdsourcing
solutions and innovations [4,7,10,18,22,25].
A key aspect of the initial U.S. Open Government National Action Plan [20] is to essentially open
source the data.gov platform and make it available for replication in countries around the world. As a
public-facing platform, this can serve as a tool that fosters collaboration, stores datasets, engages com-
munities, and offers opportunities for participation. In addition, data can be stored and made available
in multiple formats (e.g., CSV, XML, Excel) via these platforms. Each format has implications, and
can limit and/or promote the use of the data. But if the goal is for broad public access and use, then
commonly used data formats are essential.
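As a hedged illustration of how format choice shapes reuse, the sketch below writes the same small table to CSV and to XML using only Python's standard library, then reads each back. The records, field names, and function names are invented for illustration and are not drawn from data.gov.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records; not real data.gov content.
RECORDS = [
    {"agency": "Agency A", "year": "2012", "datasets": "3"},
    {"agency": "Agency B", "year": "2012", "datasets": "5"},
]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Read CSV text back into a list of dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_xml(records):
    """Serialize records to XML text, one <record> element per row."""
    root = ET.Element("records")
    for row in records:
        element = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(element, field).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read XML text produced by to_xml back into a list of dictionaries."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in record} for record in root]
```

Both formats carry the same content here, but the CSV version opens directly in common spreadsheet tools, whereas the XML version requires the user to know the element structure before the data can be read.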
Simultaneously, however, there is a need for data repositories for large-scale scientific and research
datasets. One factor in building data communities is the growing need to integrate and manage data
from multiple sources and sectors. Already under development are a range of sensor-based technologies
such as smart vehicles, buildings, and homes – plus the increasingly ubiquitous smartphones. These
technologies enable a constant flow of geo-located data regarding traffic, energy consumption, water
use, and more [26]. But significantly, these flows of data need to intersect across governments, private
sector corporations, utility companies, devices (e.g., cars, smartphones, home sensors, building sensors)
and individuals for them to be truly useful and inform the development of communities and nations.
Thus, there is a need to create, adopt, and adhere to formal data management standards and practices
across entities so as to ensure data compatibility, naming conventions, and organizational schemes. In
addition, there is a need for well-defined data documentation and codebooks so as to ensure informed
use of the datasets by researchers.
“Mashups,” in which users take data from one website and combine it with data from another, are but one
example of how Big and Open Data complicates the information policy environment. OMB Memo M-
05-04 requires agencies’ public websites, to the extent practicable and necessary to achieve intended
purposes, to provide all data in an open, industry standard format that permits users to aggregate, disag-
gregate, or otherwise manipulate and analyze the data to meet their needs. “Mashups” have the potential
to inform researchers, governments, policymakers, and the public, but often there is no formal process
through which the combined data are authorized and verified. As stated on the data.gov site, “Once
the data have been downloaded from the agency’s site, the government cannot vouch for their quality
and timeliness. Furthermore, the US Government cannot vouch for any analyses conducted with data
retrieved from Data.gov” (http://www.data.gov/data-policy). While this disclaimer serves to limit the
liability of the data.gov initiative, the issue of secondary data use has yet to be adequately addressed.
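A minimal sketch of a mashup, assuming two invented datasets that share a county code, shows one way a party combining data could at least record where each merged record came from and which records failed to match. The dataset contents, field names, and the `mash_up` function are hypothetical and do not describe any data.gov process.

```python
# Hypothetical datasets from two different publishers, keyed by county code.
AIR_QUALITY = [
    {"county": "001", "pm25": 9.1},
    {"county": "003", "pm25": 12.4},
]
HEALTH = [
    {"county": "001", "asthma_rate": 8.2},
]


def mash_up(left, right, key, left_name, right_name):
    """Join two lists of dicts on `key`.

    Returns (combined, unmatched): merged records annotated with the names
    of both sources, and the key values from `left` that had no partner in
    `right`.
    """
    right_by_key = {row[key]: row for row in right}
    combined, unmatched = [], []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            # Keep track of records that could not be combined.
            unmatched.append(row[key])
            continue
        merged = dict(row)
        merged.update({k: v for k, v in match.items() if k != key})
        # Record provenance so later users can trace each value to a source.
        merged["sources"] = [left_name, right_name]
        combined.append(merged)
    return combined, unmatched


combined, unmatched = mash_up(
    AIR_QUALITY, HEALTH, "county", "air_quality", "health"
)
```

Recording provenance and unmatched keys does not verify the combined data, but it gives secondary users a starting point for checking them against the original sources.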
The curation of Big and Open Data also remains a significant issue. Digital curation “involves
maintaining, preserving and adding value to digital research data throughout its lifecycle,” and “cu-
rated data in trusted digital repositories may be shared among the wider. . . research community”
(http://www.dcc.ac.uk/digital-curation/what-digital-curation). Importantly, digital curation focuses on
managing digital resources throughout a lifecycle, such as conceptual issues regarding digital assets, the
creation of digital assets, access and use issues, and appraisal and selection practices. There is a need to
engage in active data management strategies for Big Data along the entire lifecycle, particularly as new
digital data assets continue to grow.
And, finally, digital “open spaces”, like data.gov communities, create a situation in which there de-
creasingly exists a permanent and final “document,” upon which nearly all records management and
archiving efforts are built [4]. By using third party applications and software that reside on non-
governmental information systems, or in a continual state of modification and adaptation, data own-
ership, records schedules, and archiving present a significant challenge.

5. Recommendations and conclusion

The issues raised by Big and Open Data, and the inability of the current policy framework to ade-
quately address them, evidence a need for the development of a Big and Open Data governance model.
The governance model should consider the following.

5.1. Privacy

Big and Open Data can contain a range of personally identifiable data at the individual, household,
vehicle, or other levels. Privacy laws and policies can contradict the opportunities in Big and Open Data,
but such data simultaneously can violate the privacy rights of individuals or communities.

5.2. Data reuse

Data are often collected by government agencies or other entities (e.g., utility companies,
telecommunication carriers) in connection with the receipt of social or other services. Individual
government agencies and corporations often have acceptable use and privacy policies that govern data
collection and use. However, as Big and Open Data increasingly combine datasets from across sectors,
governments, and households to create new insights and inform decision- and policy-making, the question
arises of which policies govern access to and preservation of these newly formed datasets, which are
neither fully public nor fully private in terms of their content. Individuals need to be given clear
guidelines (including opt-in/opt-out abilities) as to which data use and reuse policies govern so that
they may make informed decisions regarding their data.

5.3. Data accuracy

As new datasets are created through combining disparate data from different agencies, researchers,
scientists, private sector companies (e.g., telecommunications companies, vehicle manufacturers, utility
companies), and citizen groups, there is a need to develop and ensure data quality standards. Data
collected for a single purpose may not be fully compatible with other datasets, and this can lead to
errors and a range of false findings. There is a need both to ensure data quality and to develop a
verification system that validates reported findings. The disclaimer on data.gov
(http://www.data.gov/privacy-policy) places this burden on 1) the agencies releasing the data, and
2) those downloading and using the data. This is an inadequate response to data use that can have a
significant impact on social, policy, and science programs.
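
The compatibility concern described above can be illustrated with a minimal sketch. It assumes that each
dataset ships with a simple field-level description; the names FieldSpec and find_conflicts, and the
example fields, are hypothetical and are not drawn from data.gov or any existing standard. The sketch
flags only fields that share a name but differ in type or unit, which is one narrow class of the errors
that can arise when datasets collected for different purposes are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one field in a dataset."""
    name: str
    dtype: str      # e.g. "numeric" or "text"
    unit: str = ""  # empty when the field has no unit


def find_conflicts(left, right):
    """Return readable conflicts for fields that share a name in both datasets."""
    right_by_name = {spec.name: spec for spec in right}
    conflicts = []
    for spec in left:
        other = right_by_name.get(spec.name)
        if other is None:
            continue  # field exists on one side only; nothing to compare
        if spec.dtype != other.dtype:
            conflicts.append(f"{spec.name}: type {spec.dtype} vs {other.dtype}")
        if spec.unit != other.unit:
            conflicts.append(f"{spec.name}: unit {spec.unit!r} vs {other.unit!r}")
    return conflicts


# Illustrative (invented) field descriptions from two sources.
utility = [FieldSpec("household_id", "text"),
           FieldSpec("energy_use", "numeric", "kWh")]
agency = [FieldSpec("household_id", "numeric"),
          FieldSpec("energy_use", "numeric", "MWh")]

conflicts = find_conflicts(utility, agency)
for line in conflicts:
    print(line)
```

A check of this kind would not validate reported findings; it would only surface mismatches that should
be resolved before datasets are merged.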

5.4. Archiving and preservation

A number of archiving and preservation policy issues surround Big and Open Data, including the
large-scale nature of digital datasets, the embedding of analysis and findings within certain
technologies and techniques, and the raw data files. One side of the coin is records management
and archival policies, requirements, and practices of government agencies, collaborative partnerships,
and archival agencies. The other side of the coin is the long-term preservation and moving of datasets
forward over time as data and information technologies change. Also, there is a need to consider the
archiving and long-term preservation of research datasets created at non-governmental institutions such
as universities and research centers funded by government research agencies. In addition, Big and Open
Data can often be embedded in specialized technologies, models, or proprietary systems (e.g.,
forecasting models, specialty software). The raw data themselves may not lend themselves to similar
findings without the embedded technologies and analysis platforms. The issue becomes one of deciding
whether to preserve the data, the technologies that used the data to generate research findings, or
both. Preservation strategies should include overall dataset management so as to ensure the
availability of smaller datasets that can become part of Big and Open Data efforts.

5.5. Data curation

One of the major goals of Big and Open Data initiatives is to engage communities to combine multiple
large-scale datasets to create new knowledge. Each of these permutations of the data is a new dataset that
requires documentation, management, and curation. Moreover, Big Data are not necessarily born as Big
Data but rather emerge through the accumulation, modification, incorporation, and manipulation of many
smaller datasets. It is important that, in addition to the curation of these datasets, smaller communities
develop the capacity to engage in curation efforts so as to ensure maximum benefit from the data as
aggregation occurs.

5.6. Support of libraries

Big and Open Data presents many new potential responsibilities for libraries as patrons seek help ac-
cessing and using these data resources and assistance in creating and curating their own Big Data sets,
particularly as governments use Big and Open Data initiatives and rely on libraries to ensure access to
and the ability to use e-government resources [1]. Librarians are relatively well-positioned in terms of
existing skills to handle the implications of Big and Open Data, as most of the long-term professional
skills that have been cultivated by librarians – aggregating, cataloging, preserving, managing and cu-
rating information, as well as teaching digital literacy – are relevant to the challenges of Big and Open
Data. However, if libraries absorb the responsibilities of Big and Open Data management without also
receiving additional support, these responsibilities will become yet another stress on already strained
library budgets and workloads. Doing so would also be another example of governments offloading their
responsibilities for information access onto libraries in the form of an unfunded mandate. Ultimately,
the impact on libraries will depend heavily on the policy decisions made regarding Big and Open Data,
which remains very much a nascent area of discourse.

5.7. Development of sustainable data platforms and architecture

There is a need for a robust technology infrastructure for organizing, curating, storing, and making
datasets accessible to the research and scientific communities, the private and other sectors, and the
public. These platforms need to provide both physical (technology) and intellectual (organizational)
access to Big and Open Data, and need to integrate seamlessly with a range of technologies, analysis
techniques, and information architectures. The infrastructure must be able to support public-facing,
generic platforms such as data.gov, as well as specialized platforms for very large-scale datasets in
particular sectors (e.g., health, environment).

5.8. Development of data standards

Big and Open Data requires interoperability not only at the technology level but also at the data level,
through adherence to metadata standards. Different domains follow different metadata standards, such as
ISO 19115 (the international standard for geospatial metadata); the Dublin Core metadata element set for
non-geospatial data resources (http://dublincore.org/documents/usageguide/elements.shtml); the emerging
Data Documentation Initiative (DDI) for social and behavioral science (http://www.ddialliance.org/);
and others such as Z39.87 (MIX – Metadata for Images in XML; http://www.loc.gov/standards/mix/)
for digital images and the Darwin Core (http://rs.tdwg.org/dwc/index.htm) for biodiversity data. Those
creating, generating, and disseminating Big and Open Data datasets need to consider appropriate data
standards and formats to ensure collaboration and data reuse. In addition, there is a need for
documentation standards for public release files that describe the organization of the dataset, data
elements, data type (e.g., numeric, text), and other descriptive information regarding dataset contents.
The limitations of the data should also be acknowledged and made apparent.
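
As one illustration of what documentation for a public release file might look like, the following
sketch pairs dataset-level metadata, keyed by Dublin Core element names, with a description of each data
element and a statement of limitations. The structure of the record, the function documentation_gaps,
and the example dataset are assumptions made for illustration only; they do not reproduce any data.gov
requirement or formal schema.

```python
# The fifteen elements of the Dublin Core metadata element set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical minimum description for each data element in a release file.
REQUIRED_ELEMENT_KEYS = ("name", "data_type", "description")


def documentation_gaps(doc):
    """List what a release-file documentation record is missing."""
    gaps = []
    unknown = set(doc.get("metadata", {})) - DUBLIN_CORE_ELEMENTS
    gaps += [f"unknown metadata element: {key}" for key in sorted(unknown)]
    for index, element in enumerate(doc.get("data_elements", [])):
        for key in REQUIRED_ELEMENT_KEYS:
            if not element.get(key):
                gaps.append(f"data element {index}: missing {key}")
    if not doc.get("limitations"):
        gaps.append("limitations not stated")
    return gaps


# Invented example record; the dataset and agency do not exist.
release_doc = {
    "metadata": {
        "title": "Example transit ridership counts",
        "creator": "Example agency",
        "date": "2014-01-01",
        "format": "text/csv",
    },
    "data_elements": [
        {"name": "station_id", "data_type": "text",
         "description": "Station identifier"},
        {"name": "boardings", "data_type": "numeric",
         "description": "Daily boardings"},
    ],
    "limitations": "Counts exclude days with sensor outages.",
}

print(documentation_gaps(release_doc))
```

A domain-specific standard such as ISO 19115 or DDI would call for different and richer elements; the
point of the sketch is only that documentation completeness can be checked mechanically once a
documentation standard has been agreed upon.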

5.9. Encouragement of data sharing policies across sectors

As Big and Open Data increasingly involves the passing of data in real time between systems, gov-
ernments, and sectors, there is a need for a robust data sharing and interoperability framework. Big and
Open Data initiatives that utilize collaborative analysis techniques require a seamless integration of data
collection and reporting systems. As noted above, however, it may be necessary to revise information and
data policies to reflect this integrative data context.
Though not comprehensive, the above recommendations offer a beginning point for a revised policy
framework to address significant issues raised by the U.S. government’s engagement in Big and Open
Data initiatives. Simply put, a number of policies require updating at a minimum, and several require
a complete overhaul, to account for data initiatives and their implications. On the
one hand, there is a need for the U.S. government to explore and leverage its investments in data to spur
innovation and transformation of government. On the other hand, there is a need to consider existing
laws and policies that govern government data – and the extent to which current and future initiatives are
in violation of those policies. As we enter an era of unparalleled data-driven opportunity, it is important
to consider engaging in data practices that adhere to principles of privacy, access, and dissemination that
serve as a foundation of U.S. democracy while simultaneously engaging in effective data management
processes. Building upon these recommendations, future research should develop and explore a Big and
Open Data governance model.

References

[1] J.C. Bertot, U. Gorham, P.T. Jaeger and L.C. Sarin, Big data, libraries, and the information policies of the Obama
administration, in: Library and Book Trade Almanac, D. Bogart, ed., Chicago: ALA Editions, in press.
[2] J.C. Bertot, P.T. Jaeger and J.M. Grimes, Using ICTs to create a culture of transparency? E-government and social media
as openness and anti-corruption tools for societies, Government Information Quarterly 27(3) (2010), 264–271.
[3] J.C. Bertot, P.T. Jaeger and J.M. Grimes, Promoting transparency and accountability through ICTs, social media, and
collaborative e-government, Transforming Government: People, Process and Policy 6(1) (2012), 78–91.
[4] J.C. Bertot, P.T. Jaeger, S. Munson and T. Glaisyer, Engaging the public in open government: The policy and government
application of social media technology for government transparency, IEEE Computer 43(11) (2010), 53–59.
[5] J.C. Bertot, P. McDermott and T. Smith, Measurement of open government: Metrics and process, in: Proceedings of the
45th Hawaii International Conference on System Sciences (HICSS), 2012, pp. 2491–2499.
[6] N.S. Braveman, Guiding investments in research. Using data to develop science funding programs and policies, Research
Trends: Special Issue on Big Data 30 (2012), 9–10.
[7] A. Chang and P.K. Kannan, Leveraging Web 2.0 in government. Washington DC: IBM Center for The Business of
Government, 2008.
[8] M. Cox and D. Ellsworth, Application-controlled demand paging for out-of-core visualization, in: Proceedings of the
8th conference on Visualization ’97 (VIS ’97), Roni Yagel and Hans Hagen, eds, Los Alamitos, CA: IEEE Computer
Society Press: 235-ff, 1997.
[9] D. Cuillier and S.J. Piotrowski, Internet information-seeking and its relation to support for access to government records,
Government Information Quarterly 26(3) (2009), 441–449.
[10] M. Drapeau and L. Wells, Social software and national security: An initial net assessment. Center for Technology and
National Security Policy, National Defense University, 2009. Available at: http://www.ndu.edu/ctnsp/Def_Tech/DTP61.
[11] A.M. Evans and A. Campos, Open government initiatives: Challenges of citizen participation, Journal of Policy Analysis
and Management 32(1) (2013), 172–185.
[12] P.T. Jaeger and J.C. Bertot, Transparency and technological change: Ensuring equal and sustained public access to gov-
ernment information, Government Information Quarterly 27(4) (2010), 371–376.
[13] P.T. Jaeger, J.C. Bertot and K. Shilton, Information policy and social media: Framing government-citizen Web 2.0 in-
teractions, in: Web 2.0 Technologies and Democratic Governance: Political, Policy and Management Implications, C.G.
Reddick and S.K. Aikins, eds, London: Springer, 2012, pp. 11–25.
[14] P.T. Jaeger, J.C. Bertot and J.A. Shuler, The Federal Depository Library Program (FDLP), Academic Libraries, and
Access to Government Information, The Journal of Academic Librarianship 36(6) (2010), 469–478. http://dx.doi.org/
10.1016/j.acalib.2010.08.002.
[15] P.T. Jaeger, S. Paquette and S.N. Simmons, Information policy in national political campaigns: A comparison of the
2008 campaigns for President of the United States and Prime Minister of Canada, Journal of Information Technology &
Politics 7(1) (2010), 1–16.
[16] J. Lane, Science metrics and the black box of science policy, Research Trends: Special Issue on Big Data 30 (2012), 7–8.
[17] P. McDermott, Building open government, Government Information Quarterly 27(4) (2010), 401–413.
[18] B.E. Noveck, Wiki-government, Democracy: A Journal of Ideas (Dec. 2007), 2008.
[19] B.H. Obama, Transparency and open government. Memorandum for the Heads of Executive Departments and Agencies.
Washington, DC: Office of the Executive. Available at: http://www.whitehouse.gov/the-press-office/transparency-and-
open-government, (2009, January 21).
[20] B.H. Obama, Executive Order 13642: Making Open and Machine Readable the New Default for Government Informa-
tion. Washington, DC: Office of the Executive, (2013, May 9). Available at: http://www.gpo.gov/fdsys/pkg/FR-2013-05-
14/pdf/2013-11533.pdf.
[21] Office of Science and Technology Policy. (2011). The Open Government Partnership: National action plan for the United
States of America. Washington, DC: Office of Science and Technology Policy. Available at: http://www.whitehouse.gov/
sites/default/files/us_national_action_plan_final_2.pdf.
[22] D. Osimo, Web 2.0 in government: Why and how? Washington DC: Institute for Prospective Technological Studies,
2008.
[23] S. Paquette, P.T. Jaeger and S.C. Wilson, Identifying the risks associated with governmental use of cloud computing,
Government Information Quarterly 27(3) (2010), 245–253.
[24] A. Peled, When transparency and collaboration collide: The USA open data program, Journal of the American Society
for Information Science and Technology 62(11) (2011), 2085–2094.
[25] C. Snyder, (2009, March 25). Government agencies make friends with new media. Wired. Available at: http://blog.wired.
com/business/2009/03/government-agen.html.
[26] The Economist. (2012, Oct. 27). Special report on technology and geography: A sense of place, The Economist
405(8808), 1–22.