Вы находитесь на странице: 1из 19

A big data urban growth simulation at a national scale: Conguring

the GIS and neural network based Land Transformation Model to run
in a High Performance Computing (HPC) environment
Bryan C. Pijanowski
a,
*
, Amin Tayyebi
a, b
, Jarrod Doucette
a
, Burak K. Pekin
a, c
,
David Braun
d, e
, James Plourde
a, f
a
Department of Forestry and Natural Resources, Purdue University, 195 Marsteller Street, West Lafayette, IN 47907, USA
b
Department of Entomology, University of Wisconsin, Madison, WI 53706, USA
c
Institute for Conservation Research, San Diego Zoo Global, 15600 San Pasqual Valley Road, Escondido, CA 92027, USA
d
Rosen Center for Advanced Computing, Information Technology Division, Purdue University, West Lafayette, IN 47907, USA
e
Thavron Solutions, Kokomo, IN 46906, USA
f
Worldwide Construction and Foresy Division, John Deere, 1515 5th Avenue, Moine, IL, 61265, USA
a r t i c l e i n f o
Article history:
Received 5 April 2013
Received in revised form
18 September 2013
Accepted 23 September 2013
Available online 7 November 2013
Keywords:
Land use land cover change
Big data simulation
Land Transformation Model
High Performance Computing
Extensible Markup Language
Python environment
Visual Studio 10 (C#)
Continental scale
a b s t r a c t
The Land Transformation Model (LTM) is a Land Use Land Cover Change (LUCC) model which was
originally developed to simulate local scale LUCC patterns. The model uses a commercial windows-based
GIS program to process and manage spatial data and an articial neural network (ANN) programwithin a
series of batch routines to learn about spatial patterns in data. In this paper, we provide an overview of a
redesigned LTM capable of running at continental scales and at a ne (30m) resolution using a new
architecture that employs a windows-based High Performance Computing (HPC) cluster. This paper
provides an overview of the new architecture which we discuss within the context of modeling LUCC
that requires: (1) using an HPC to run a modied version of our LTM; (2) managing large datasets in
terms of size and quantity of les; (3) integration of tools that are executed using different scripting
languages; and (4) a large number of steps necessitating several aspects of job management.
2013 Elsevier Ltd. All rights reserved.
1. Introduction
The Land Transformation Model was developed over fteen
years ago (Pijanowski et al., 1997, 2000, 2002a,b) to simulate spatial
patterns of land use land cover change (LUCC) over time. The model
uses geographic information systems (GIS) to process and manage
spatial data layers and articial neural network (ANN) tools to learn
about patterns in input (i.e., drivers) and output (e.g., historical land
use change) data. The model has been used to forecast LUCC pat-
terns in a variety of places around the world, such as the Midwest
USA (Pijanowski et al., 2005), central Europe (Pijanowski et al.,
2006), East Africa (Olson et al., 2008; Washington-Ottombre
et al., 2010; Pijanowski et al., 2011) and Asia (Pijanowski et al.,
2009). Forecasts are often linked to climate (Moore et al., 2010,
2011), hydrologic (Tang et al., 2005a,b; Yang et al., 2010) or bio-
logical (Wiley et al., 2010) models to examine how whateif LUCC
scenarios impact the environment (e.g. Ray et al., 2011) and/or
economics (Skole et al., 2002). The LTM has even been engineered
to run backwards (Ray and Pijanowski, 2010) in order to examine
environmental impacts of historical LUCC or the effects of land use
legacies on slow environmental processes, such as groundwater
transport through watersheds (Wayland et al., 2002; Pijanowski
et al., 2007; Ray et al., 2012). The LTM has been recently extended
to simulate and predict urban boundary change (Pijanowski et al.,
2009; Tayyebi, and Perry, 2013) which can be used by urban
planners and managers interested in the control of urban growth.
Modeling, especially in a spatially explicit way, allows for con-
ducting experiments that quantify the importance of various LUCC
drivers, contributing to a better understanding of key LUCC pro-
cesses (Veldkamp and Lambin, 2001; Burton et al., 2008; Pontius
* Corresponding author. Tel.: 1 765 496 2215.
E-mail address: bpijanow@purdue.edu (B.C. Pijanowski).
Contents lists available at ScienceDirect
Environmental Modelling & Software
j ournal homepage: www. el sevi er. com/ l ocat e/ envsof t
1364-8152/$ e see front matter 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.envsoft.2013.09.015
Environmental Modelling & Software 51 (2014) 250e268
and Petrova, 2010; Anselme et al., 2010; Prez-Vega et al., 2012).
Large-scale LUCC models are needed to understand regional con-
tinental to global scale problems like climate change (Chapman,
1998; Kilsby et al., 2007; Merritt et al., 2003), human impacts to
ecosystem services (MEA, 2005), alterations to carbon sequestra-
tion (Post and Kwon, 2000), and dynamics of biogeochemical
cycling (Boutt et al., 2001; Pijanowski et al., 2002b; Wayland et al.,
2002; Turner et al., 2003; GLP, 2005; Fitzpatrick et al., 2007;
Anselme et al., 2010; Carpani et al., 2012). One of the characteris-
tics of all LTM applications to date is that the size of the simulation
has been small enough to run on a single advanced workstation.
However, as models originally designed for local to regional sim-
ulations are needed at continental scales or larger, a redesign of
single workstation models, such as the LTM, becomes necessary.
Indeed, recent calls by the scientic and policy community for the
development and use of large-scale models (e.g., Earth Systems
Science community, e.g., Randerson et al., 2009; Xue et al., 2010;
Lagabrielle et al., 2010) underscores the importance of focusing
attention on large-scale environmental modeling.
Increasing the size of any computer simulation, like the LTM,
has several challenges (Herold et al., 2003, 2005; Lei et al., 2005;
Dietzel and Clarke, 2007; Clarke et al., 2007; Adeloye et al., 2012).
First is the need to manage large datasets that are used as inputs
and are output by the model (Yang, 2011; Loepfe et al., 2011). The
national-scale application of the LTM that we present here sim-
ulates LUCC for the lower 48 states of the USA at 30m resolution.
This represents a simulation domain of 1.54 10
5
by 9.75 10
4
cells (i.e., over 15 billion cells). Additionally, as many as 10 spatial
drivers are used per simulation and up to 10 forecast maps (a 50-
year projection with 5-year time steps) are created. Forecasts may
also involve multiple forecast scenarios with multiple time steps;
a recent LTM application (Ray et al., 2011) compared 36 different
LTM forecast scenarios for a single regional watershed for 2000
through 2070 at ve-year time steps. Thus, the number of cells
within each simulation can be very large and can easily exceed
one trillion. A second challenge presented by modeling large re-
gions is the need to create and manage a large number of les in a
variety of formats. For the LTM, this requires managing input,
programs and tools, and output les for GIS and neural network
software. For the national-scale LTM described below, we used the
GIS to split the simulation into over 20,000 census-designated
places (e.g., towns, villages and cities), which were then stored
in folders in a hierarchical structure. Thus, standards for le
naming and use in automated routines are necessary to properly
manage numerous les. Third, since we are using a variety of
tools, such as ESRIs ArcGIS Desktop and Stuttgart Neural Network
Simulator (SNNS), each with their own scripting language, a
higher-level architecture that automates the control of multiple
programs is needed with such models. Fourth, since many exe-
cutions occur during the simulation, knowing when failure occurs
is necessary and thus the status and progress of the simulation
needs to be tracked. Finally, given that the LTM contains
numerous programs and scripts, a way to manage the processing
of jobs becomes necessary. All of these challenges are inherent in
what some scientic communities call the big data problem
(Lynch, 2008; Hey, 2009; Jacobs, 2009; LaValle et al., 2011). These
challenges require solutions that are different from simulations
that are executed on single workstations.
In this paper, we describe how we have congured a single
workstation version of the LTM to run in a Windows-based High
Performance Computing (HPC) environment for a version of the
Land Transformation Model we call the LTM-HPC. We summarize
the important architectural features of this version of the model
providing both ow diagrams of the processing steps, maps of
data layers used in the simulation, as well as pseudo-code that
illustrates how les and routines are handled. This paper will
assist others who are interested in (1) using articial neural
networks to learn about patterns in spatial data where data are
large and/or (2) using HPC tools to recongure an environmental
model composed of a series of programs not linked to a graphical
user interface.
We organize the remainder of this paper as follows. Section 2
provides an overview of the original LTM, introducing basic
modeling terms, summarizing important features of high perfor-
mance compute (HPC) environment, and the architecture of the
current LTM as it is congured for an HPC. Section 3 describes a
specic application of the LTM-HPC run at a national scale for urban
change for the conterminous USA. The paper concludes by discus-
sing the potential of the LTM-HPC for simulating ne resolution
urban growth patterns at large regional scales as well as the use-
fulness of such projections.
2. Brief background
2.1. Overview of the Land Transformation Model (LTM) and
Articial Neural Networks
The LTM (Pijanowski et al., 2000, 2002a, 2009) simulates land
use/cover change based on socio-economic and bio-physical factors
using an Articial Neural Network (ANN) and a raster GIS modeling
environment. Its previous, as well as the current architecture,
summarized here, is based on scripts and a sequential series of
executable programs that provide considerable exibility in
running the model. There are no graphical user interfaces for the
model. At the highest level of organization (Fig. 1), the LTMcontains
six major components: (1) a data preparation set of routines and
procedures many of which are conducted in a GIS; (2) a series of
steps, called pattern recognition, that allow an articial neural
network to learn about patterns in input (drivers of land use
change) and output (historical change or no change in land use)
data which are then applied to an independent set of data and
output values are estimated; and (3) a sequence of C# and GIS
based programs for model calibration; (4) an independent
assessment of model performance or model validation also writ-
ten in C# and GIS; (5) routines used those for creating future
scenarios of land use, and (6) model products and applications
conducted within a GIS.
In the LTM, we use a multi-layer perceptron (MLP) ANN within
the Stuttgart Neural Network Simulator (SNNS) software to
approximate an unknown relation between input (e.g., drivers of
change) and output (e.g., locations of change and no change).
Typical inputs include distance to roads, slope and distance to
Fig. 1. Main components of the Land Transformation Model.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 251
previous urban (Fig. 2). Outputs are binary values of change (1) and
no change (0) in observed land use maps. Input values are fed
through a hidden layer with the number of nodes equal to that of
inputs (see Pijanowski et al., 2002a; Mas et al., 2004). The ANNuses
learning rules to determine the weights, values for bias and acti-
vation function to t input and output values (Fig. 2) of a dataset.
Delta rules are used to adjust all of these values across successive
passes of the data; each pass is called a cycle. A mean square error is
calculated for each cycle from a back propagation algorithm
(Bishop, 1995; Dlamini, 2008), values for weights, bias and activa-
tion function are then adjusted, and the training stopped after a
global minimum MSE is reached. The process of cycling through is
called training. In the LTM, we use a small randomly selected
subsample (between 5 and 10%) of the data to train. Applying the
weights, values for bias and the activation functions froma training
run to another dataset that contain inputs only, in order to estimate
output, is referring to as testing. We conduct a double phase testing
with the LTM (Tayyebi et al., 2012) at large scales (e.g., contermi-
nous of USA). The rst phase of testing is to use the weights, bias
and activation values saved from the training of the subsample and
apply the values to the entire dataset. A set of goodness of t sta-
tistics are generated between the predicted and observed maps; we
also refer to this testing phase as model calibration. Model cali-
bration also involves a hold one out procedure where each input
data layer is held back from the same testing dataset and goodness
of t of the reduced input models is compared against the full
complement model (see Pijanowski et al., 2002a). Thus, for the LTM
simulation below, we used ve input maps to predict one map of
urban change. We held one out at a time, to produce ve input map
models with the same urban change map. These reduced input
models are compared with the full complement model of six input
maps.
We follow the recommendation of Pontius et al. (2004) and
Pontius and Spencer (2005) of validating the model e our second
phase of testing e which is done with a different dataset than what
is used for the rst phase of testing. This independent dataset can
be another land use map that was derived from a different source
(i.e., test of generalization) or another year (i.e., test of predictive
ability). It is typical practice (Bishop, 1995) to use different data for
training and testing, which is done here.
Forecasting is accomplished using a quantity model developed
using per capita land use growth rates and a population growth
estimate model (cf. Pijanowski, et al., 2002a; Tayyebi et al., 2012).
The quantity model can be applied across the entire simulation
domain with one quantity estimate per time step or the quantity
model can be applied to smaller spatial units across the simulation
domain, as done here.
2.2. High Performance Computing (HPC)
High Performance Computing (HPC) integrates computer ar-
chitecture design principles, operating system software, heteroge-
neous hardware components, programs, algorithms and
specialized computational approaches to address the handling of
tasks not possible or practical with a single computer workstation
(Foster and Kesselman, 1997; Foster et al., 2002). A self-contained
HPC (i.e., a group of computers) is often referred to as a high per-
formance compute cluster (HPCC) (cf. Cheung and Reeves, 1992;
Buyya, 1999; Reinefeld and Lindenstruth, 2001). A main feature of
HPCs is the integration of hardware and software systems that are
congured to parse large processing jobs into smaller parallel tasks.
Hardware resources can be managed at the level of cores (a single
processing unit capable of performing work), sockets (a group of
cores that have direct access to memory) and nodes (individual
servers or computers that contain one or more sockets). The HPCC
employed here is specically congured to control the execution of
several batch les, executable programs and scripts for thousands
of input and output data les. An HPCC is managed by an admin-
istrator with hardware and software services accessible to many
users. HPCCs are systems smaller than supercomputers although
the term HPC and supercomputer are often used interchangeably.
We controlled all our LTM-HPC programs on the HPCC using
Windows HPC 2008 Server R2 job manager, which has features
common to all job managers. The server job manager contains: (1) a
job scheduler service that is responsible for queuing jobs and
tasks, allocating resources, dispatching the tasks to the compute
nodes, and monitoring the status of the job, tasks, and nodes; (2) a
job description le, congured as an Extensible Markup Language
(XML) le listing both job or task specications; (3) a job, which is a
resource request that is submitted to the job scheduler service that
Estimated error from observed
data (back propagation errors)
Assign weights, bias and activation
function to estimate output
A pass forward and back is
called a cycle or epoch
Slope
Distance to stream
Distance to urban
Distance to primary road
Distance to secondary road
Presence = 1 or absence = 0
of a land use transition
Output Nodes Hidden Nodes Input Nodes
Fig. 2. Structure of an articial neural network.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 252
assigns hardware resources to all tasks; and (4) a task, which is a
command (i.e., a program or script) with path names for input and
output les and software resources assigned for each task. Many
jobs and all tasks are run in parallel across multiple processors. The
HPC job manager is the primary interface for submitting LTM-HPC
jobs to a cluster; it uses a graphical user interface. Jobs are also
submitted from a remote desktop using a client utility in Microsoft
Windows HPC Pack 2008.
Fig. 3 shows sample lines from an XML job description le used
below to create forecast maps by state. Note that the highest level
contains job parameters; parameters are passed to the HPC Server
for project name, user name, job type, types and level of hardware
resources for the job, etc. Tasks are listed after as dependencies to
the higher-level job; tasks here contain several parameters (e.g.,
how the hardware resources are used) and commands (e.g., name
of the Python script to execute and the parameters for that script,
such as the name of the input and output le names).
3. Architecture of the LTM and LTM-HPC
3.1. Main components
Several modications were made to the LTM to make the LTM-
HPC run at larger spatial scales (i.e., larger datasets) and with ne
resolution. Below, we describe the structure of the components
that comprise the current version of the LTM (hereafter as the
single workstation LTM) and the features that were necessary to
recongure it for an HPCC. There are several different kinds of
programming environments that comprise the single worksta-
tion LTM. The rst are command-line interpreter instructions
congured as batch les for use in the Windows operating sys-
tem; these are named using the *.BAT extension. Batch les
control most of the processing of data for Stuttgart Neural
Network Simulator (SNNS). A second type of programming
environment that comprises the LTM are compiled programs
written to accept environment variables as inputs. Programs are
written in C or C# programming language as a standalone *.EXE
le to be executed at the command line. The environment vari-
ables for these programs are often the location and name of input
and output les. Complied programs are used to transpose data
structures and to calculate very specic values during model
calibration. The third kind of program environment is the script
environment written to execute application-specic tools.
Application-specic scripts that we use here are ArcGIS Python
(version 2.6 or higher) scripts which call certain features and
commands of ArcGIS and Spatial Analyst. A fourth type of soft-
ware environment is the XML jobs le; these are used by the
Windows 2008 Server R2 job manager of the LTM-HPC to execute
and organize the batch routines, compiled programs and scripts
in the proper order and with the necessary environment vari-
ables. This fourth kind of software environment, the XML jobs
le, is only present in the LTM-HPC.
Fig. 4 shows the sequence of batch routines, programs and
scripts that comprise the LTM currently organized into the six main
model components: data preparation, pattern recognition, cali-
bration, validation, forecasting and application. Here, we provide an
overview of the key features of the LTM and LTM-HPC emphasizing
howthese features enable us to simulate land use cover change at a
national scale; those batch routines and programs that have been
modied for running in the HPC environment and congured using
XML job les are contained in the red boxes in Fig. 4.
3.2. Data preparation
Data preparation for training and testing runs in the LTM and
LTM-HPC is conducted using a GIS and spatial databases (Fig. 4,
Fig. 3. XML job le illustrating the syntax for job parameters, task parameters and task commands.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 253
Fig. 4. Tool and data view of the LTM-HPC (see Legend for a description of model components and their meaning). (For interpretation of the references to color in this gure legend, the reader is referred to the web version of this
article.)
B
.
C
.
P
i
j
a
n
o
w
s
k
i
e
t
a
l
.
/
E
n
v
i
r
o
n
m
e
n
t
a
l
M
o
d
e
l
l
i
n
g
&
S
o
f
t
w
a
r
e
5
1
(
2
0
1
4
)
2
5
0
e
2
6
8
2
5
4
item #1); as this is done once for each simulation, this LTM
component is not automated. A C# program called createpat.exe
(Fig. 4, item #2) is used to convert spatial data to neural net les
called a pattern le (Fig. 4, item #3) given an *.PAT extension; data
are transposed into the ANN structure. Data necessary to process
les for the training run for neural net simulation are model inputs,
two land use maps separated by approximately 10 years or more,
and a map of locations that need to be excluded fromthe neural net
simulation. Vector shape les (e.g., roads) and raster les (e.g.,
digital elevation models, land use/cover maps) are loaded into
ArcGIS and ESRIs Spatial Analyst is used to calculate values, per
pixel in the simulation domain, that are used as inputs to the neural
net. A raster le is selected (e.g., base land use map) to set ESRI
Spatial Analyst Environment properties, such as cell size, number of
row and columns, for all data processing, to ensure that all inputs
have standard dimensions. A separate le, referred to as the
exclusionary zone map (Fig. 5, item #1), is created using a GIS.
Exclusionary maps contain locations where a land use transition
cannot occur in the future. For a model congured to simulate ur-
ban, for example, areas that are in protected areas (e.g., public
parks), open water, or are already urban, are coded with a 4 in the
exclusionary zone map. This exclusionary map is used in several
steps of the LTM: for excluding data that is converted for use in
pattern recognition, model calibration and model forecasting. The
coding of locations with a 4 becomes more obvious below under
the presentation of model calibration. Inputs (Fig. 5, item #2) are
created by applying spatial transition rules outlined in Pijanowski
et al. (2000). A frequent input map is distance to roads; for our
LTM-HPC application example below, Spatial Analysts Euclidean
Distance Tool is used to calculate the distance each pixel is fromthe
nearest road. All GIS data for use in the LTM are written out as an
ASCII at le (Fig. 5A).
Two land use maps are used to determine the locations of
observed change (Fig. 5, #3) and these are necessary for the
training runs. The program createpat.exe (Fig. 5B) stores a value of
0 if no change in a land use class occurred and a 1 if change was
observed (Fig. 5, #5). The testing run does not use land use maps
for input and the output values are estimated by the neural net in
the phase of the model. The program createpat uses the same
input and exclusionary maps to create a pattern le for testing
(Fig. 5, item #3).
A key component of the LTM is converting data from a GIS
compatible format to a neural network format called a pattern le
(Fig. 5C). Conversion of les from raster maps to data for use by the
neural network requires both transposing the database structure
and standardizing all values (Fig. 5, #6). The maximum value that
occurs in the input maps for training is also stored in the input le
and this is used to standardize all values from the input maps
because the neural network can only use values between 0.0 and
1.0 (Fig. 5C). Createpat.exe also uses the exclusionary map (Fig. 4,
#1) in ASCII format to exclude all locations that are not convertible
to the land use class being simulated (e.g., wildlife parks should not
convert to urban). For training runs, createpat.exe also selects
subsamples of the databases (by location); the percentage of the
data to be selected is specied in the input le. Finally, crea-
tepat.exe also checks the headers of all maps to ensure that they are
of the same dimensions.
3.3. Pattern recognition
SNNS has several choices for training; the program that per-
forms training and testing is called batchman.exe (Fig. 4, item #4).
As this process uses a subset of data and cannot be parallelized
easily, we conducted training on a single workstation. Batchma-
n.exe allows for several options which are employed in the LTM.
These include a shufe option which randomly orders the data
presented to the neural network during each pass (i.e., cycle) (cf.
Shellito and Pijanowski, 2003; Peralta et al., 2010), the values for
initial weights (cf. Denoeux and Lengell, 1993), the name of the
pattern les for input and output, the lename containing the
network values, and a set of start and stop conditions (e.g., a stop
condition can be set if a MSE or a certain number of cycles is
reached). We control the specic batchman.exe execution pa-
rameters using a DOS batch le called train.bat (Fig. 4, item #5).
Training is followed over the training cycles with MSE (Fig. 4, item
#6) and les (called ltm.net, Fig. 4, item#7) with weights, bias and
activation function values saved every N number of cycles. An MSE
equal to 0.0 is a condition that output of ANN matches the data
perfectly (Bishop, 2005); Pijanowski et al. (2005, 2011) has shown
that LTM stabilizes after less than 100,000 cycles in most cases.
Pseudo-code for the TRAIN.BAT is:
loadNet(ltm.net)
loadPattern(train.pat)
setInitFunc (Randomize_Weights, 1.0, 1.0)
setShufe (TRUE)
initNet()
trainNet()
while MSE > 0.0 and CYCLES <500,000 do
if CYCLES mod 100 0 then
print (CYCLES,, ,MSE)
endif
if CYCLES 100 then
saveNet (100.net)
endif
We used the SNNS batchman.exe program to create a suitability
map (i.e., a map of probability of each cell undergoing urban
change) used for forecasting and calibration; to do this, data have to
be converted from SNNS format to a GIS compatible format (Fig. 4,
item #11). The test.PAT les (Fig. 4, item #13) are converted to
probability maps by applying the saved ltm.net le (le with the
weights, bias and activation values; Fig. 4, item #8) produced from
the training run using batchman.exe (Fig. 4, item #8). Output from
this execution is called a RES (or result) le (Fig. 4, item#9). The RES
le contains estimates of output created by the neural network. The
RES le is then transposed back to an ASCII map (Fig. 4, item #10)
using a C# program. All values from the *.RES les are between 0.0
and 1.0; convert_ascii.exe also stores values in the ASCII suitability
(Fig. 4, item #11) maps as integer values between 0 and 100,000 by
multiplying the *.RES le values by 100,000 so that the raster le in
ArcGIS is not oating point (oating point les in ArcGIS require a
less efcient storage format and thus large oating point les
through ArcGIS 10.0 are unstable).
We also train on data models where we hold one input out at
a time (Fig. 4, item #12; Pijanowski et al., 2002a; Tayyebi et al.,
2010); for example, in one set, distance to roads is held out and
compared to having all inputs in the training. Thus, if we start with
a ve input variable neural network model, we hold one out at a
time and create calibration time step maps for each and save error
les over training cycles for each of the reduced input variable
models.
3.4. Calibration
For model calibration (see Bennett et al., 2013 for an extensive
review of the topic, our approach follows their recommendations),
we consider three different sets of metrics to judge the goodness of
t of the neural network model. The rst is mean square error
(MSE), which is plotted over training cycles, to ensure that the
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 255
Fig. 5. Data processing steps for converting data from a GIS format to a pattern le format for use in SNNS.
B
.
C
.
P
i
j
a
n
o
w
s
k
i
e
t
a
l
.
/
E
n
v
i
r
o
n
m
e
n
t
a
l
M
o
d
e
l
l
i
n
g
&
S
o
f
t
w
a
r
e
5
1
(
2
0
1
4
)
2
5
0
e
2
6
8
2
5
6
neural network settles at a global minimum value. MSE is calcu-
lated as the difference between the estimate produced by the
neural network (range 0.0e1.0) and the observed value of land use
change (0 or 1). MSE values are saved very 100 cycles and training is
generally followed out to about 100,000 cycles. The second set of
goodness of t metrics is those created from a calibration map. A
calibration map is also constructed within the GIS using three maps
coded specially for assessment of model goodness of t. A map of
observed change between the two historical maps (Fig. 4, item#16)
is created such that observed change 0 and no change is 1. A
map that predicts the same land use changes over the same amount
of time (Fig. 4, #15) is coded so that predicted change is 2, and no
predicted change is 0. These two maps are then summed along
with the exclusionary zone map that is coded 0 location can
change and 4 location that needs to be excluded. The resultant
calibration map (Fig. 4, #17) generates values 0 through 4 with
correct predictions of 0 correctly predicted no change and
3 correctly predicted change. Values of 2 and 3 represent
different errors (omission and commission or false positive and
false negative). The proportion of each type of error and correctly
predicted location are used to calculate: (1) the proportion of
correctly predicted change locations to the number of observed
change cells, also called the percent correct metric (proportion of
correctly predicted land use changes to the number of observed
land use changes) or PCM (Pijanowski et al., 2002a); (2) sensitivity
(the proportion of false positives) and specicity (the proportion of
false negatives) and (3) scaleable PCM values across different
window sizes.
Fig. 6 shows how scaleable PCM values are calculated using a
01234-coded calibration map across different window sizes. The
rst step is to calculate the total number of true positives (cells
coded as 3s) in the calibration map (Fig. 6A). For a given windowof
say, (e.g. 5 cells by 5 cells), a pair of false positives (cells code as 2s)
and false negative (cells coded as 1s) are considered together as a
correct prediction at that scale and window; the number of 3s is
incremented by one for every pair of false positive and false
negative cells. The windowis then moved one position to the right
(Fig. 6B) and pairs of 1s and 2s are again added to the total number
of 3s for that calibration map such that any 1s or 2s already
counted are not considered. This moving N N window is passed
across the entire simulation area and the nal number of 3s
recorded (Fig. 6C). The window size is then incremented by 2 (i.e.,
the next window size after a 5 5 would be a 7 7) and, after all
of the windows are considered in the map, the process is repeated
Fig. 6. Steps in the calculation of PCM across a moving scaleable window. Part 6A calculates the total number of true positives (coded as 3s). The window is then moved one position
to the right (Part 6B) and pairs of 1s and 2s are again added to the total number of 3s. This moving window is passed across the entire area and the nal number of 3s recorded (Part
6C). The window size is then incremented by 2 and the process is repeated. Part 6D gives PCM across scaleable window sizes.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 257
(note that the number of 3s is reset to the number of 3s is the
entire calibration map) and the number of 3s saved for that
window size. Window sizes that we often plot are between 3 and
101; Fig. 6D gives an example PCM across scaleable window sizes.
Note in this plot that the PCM begins to exceed 50% around a
window size of 9 9 which for this simulation conducted at
100x 100m means that PCM reaches 50% at 900m 900m. The
scaleable window plots are also made for each reduced input
model as well in order to determine the behavior of the training of
the neural network against the goodness of t of the calibration
maps by input.
The nal step for calibration is the selection of the network le
(Fig. 4, items #16e19) with inputs that best represent land use
change and an assessment of how well the model predicts across
different spatial scales. The network le with the weights, bias,
activation values are saved for the model with the inputs consid-
ered the best for the model application. If the model does not
perform adequately (Fig. 4, item #19), the user may consider other
input drivers or dropping drivers, which reduce model goodness of
t. However, if the drivers selected provide a positive contribution
to the goodness of t and the overall model is deemed adequate,
then this network le is saved and used in the next step, model
validation.
3.5. Validation
We follow the recommended procedures of Pontius et al.
(2004), and Pontius and Spencer (2005) to validate our model.
Briey, we use an independent data set across time to conduct an
historical forecast to compare a simulated map (Fig. 4, #15) with an
observed historical land use map that was not used to build the
ANN model (Fig. 4, #20). For example, below (Section 4.6), we
describe how we use a 2006 land use map that was not used to
build the model to compare to a simulated map. Validation metrics
(Fig. 4, #21) include the same as that used for calibration: namely,
PCM of the entire map or spatial unit, sensitivity, specicity, PCM
across window sizes, and error of quantity. It should be noted that
because we x the quantity of the land use class that changes be-
tween time 1 and time 2 for calibration, we do so for validation as
well (e.g., between time 2 and time 3, the number of cells that
changed in the observed maps are used to x the quantity of cells to
change in the simulation that forecasts time 3).
3.6. Forecasting
We designed the LTM-HPC so that the quantity model (Fig. 4,
#24) of the forecasting component can be executed for any spatial
unit category, like government units, watersheds or ecoregions, or
any spatial unit scale, such as states, counties or places. The
quantity model is developed ofine using Excel and algorithms that
relate a principle index driver (PID, see Pijanowski et al., 2002a)
that scales the amount of land use change (e.g., urban or crops) per
person. In the application described below, we execute the model at
several spatial unit scales e cities, states, and the lower 48 states.
Using a combination of unique unit IDs (e.g., federal information
processing systems (FIPS) codes are used for government unit IDs),
a le and directory-naming system, XML les, and python scripts,
the HPC was used to manage jobs and tasks organized by the
unique unit IDs.
We next use a program, written in C#, to convert probability
values to binary change values (0 are cells without change and 1 are
locations of change in prediction map) using input from the
quantity change model (Fig. 4, #24). The quantity change model
produces a table of the number of cells to grow for each time step
for each spatial unit froma CSV le. Rows in the CSV le contain the
unique unit IDS and the number of cells to transition for each time
step. The program reads the probability map for the spatial unit
(i.e., a particular city) being simulated, counts the number of cells
for each probability value and then sorts the values and counts by
rank. The original order is maintained using an index for each re-
cord. The probability values with high rank are then converted to
urban (code 1) until the numbers of newurban cells for each unit is
satised while other cells (code 0) remain without change. A
separate GIS map (Fig. 4, #25) may be created that would apply
additional exclusionary rules to create an alternative scenario.
Output from the model (Fig. 4, item #26) is used for planning or
natural resource management (Skole et al., 2002; Olson et al., 2008)
(Fig. 4, item #27), as input to other environmental models (e.g., Ray
et al., 2012; Wiley et al., 2010; Mishra et al., 2010; or Yang et al.,
2010) (Fig. 4, item #28) or the production of multimedia products
that can be ported to the internet (Fig. 4, item #29).
3.7. HPC job conguration
We developed a coding schema for the purposes of running the
simulation across multiple locations. We used a standard
numbering system from the Federal Information Processing Sys-
tems (FIPS) that is associated with states, counties and places. FIPS
is a hierarchical numbering system that assigns states a two-digit
code and a county in those states a three-digit code. A specic
county is thus given a ve-digit integer value (e.g., 18157 for Tip-
pecanoe County, Indiana) and places are given a seven-digit code;
two digits for the state and ve digits for the place (e.g., 1882862 for
the city of West Lafayette, Indiana).
Conguring HPC jobs and constructing the associated XML les
can be approached in different ways. The rst is to develop one job
and one XML le per model simulation component (e.g., mosaick-
ing individual census place spatial maps into a national map). For
our LTM-HPC application where we would need to mosaic over
20,000 census places, a job failure for any of the places would result
in the one large job stopping and then addressing the need to
resume the execution at the point of failure. A second approach,
used here, is to group tasks into numerous jobs where the number
of jobs and associated XML les is still manageable. A failure of one
census place would require less re-execution and trouble shooting
of that job. We often grouped the execution of census place tasks by
state, using the FIPS designator for both to assign names for input
and output les.
Five different jobs are part of the LTM-HPC (Fig. 7); those for
clipping a large le into smaller subsets, another for mosaicking
smaller les into one large le, one for controlling the calibration
programs, another job for creating forecast maps, and a fth for
controlling data transposing between ASCII at les and SNNS
pattern les. XML les are used by the HPC job manager to subdi-
vide the job into tasks; for example, our national simulation
described below at county and places levels is organized by state
and thus the job contains 48 tasks, one for each state. Fig. 7 is a
sample Windows jobs manager interface for mosaicking over
20,000 places. Each top line Fig. 7 (item#1) represents an XML for a
region (state) with the status (item #2). Core resources are shown
(Fig. 7, item #3). A tab (Fig. 7, item #4) displays the status of each
task (Fig. 7, item #5) within a job. We used a python script to create
each of the xml les although any programming or scripting lan-
guage can be used.
We then used an ArcGIS python script to mosaic the ASCII
maps; an XML le that lists le and path names was used as input
to the python script. Mosaicking and clipping are conducted in
ArcGIS using python scripts, polygon_clip.py and poly-
gon_mosaic.py. Both ArcGIS python scripts read the digital spatial
unit codes from a variable in the shape le attribute table and
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 258
names les based on the unit code. The resultant mosaicked
suitability map produced from training and data transposing
constitutes a map of the entire study domain. Creating such a
suitability map of the entire simulation domain allows us to (1)
import the ASCII le into ArcGIS in order to inspect and visualize
the suitability map; (2) allow the researcher to use different
subsetting and mosaicking spatial units (as we did below) and (3)
allow the researcher to forecast at different spatial units (we also
illustrate this below as well).
4. Execution of LTM-HPC
4.1. Hardware and software description
We executed the LTM-HPC on three computer systems (Fig. 8).
One computer, a high-end workstation, was used to process inputs
for the modeling using GIS. A windows cluster was used to
congure the LTM-HPC and all of the processing of about a dozen
steps occurred on this computer system. A third computer system
stored all of the data for the simulations. Specic conguration of
each computer system follows.
Data preparation was performed on a high-end, Windows 7
Enterprise 64-bit computer workstation equipped with 24 GB of
RAM, a 256 GB solid state hard drive, a 2 TB local hard drive, and
ArcGIS 10.0 with Spatial Analyst extension. Specic procedures
used to create each of the data layers for input to the LTM can be
found elsewhere (Pijanowski et al., 1997; Tayyebi et al., 2012).
Briey, data were processed for the entire contiguous United States
at 30m resolution, and distance to key features like roads, and
streams, were processed using the Euclidean Distance tool in Arc-
GIS setting all output to double precision integer given the large
size of each dataset; we limited the distance to 250 km. Once the
data were processed on the workstation, les were moved to the
storage server.
The hardware platform on which the parallelization was carried
out was a cluster of HPC consisting of ve nodes containing a total
of 20 cores. Windows Server HPC Edition 2008 was installed on the
HPCC. Each node was powered by a pair of dual core AMD Opteron
285 processors and 8 GB of RAM. Each machine had two 1 GB/s
network adapters with one used for cluster communication and the
other for external cluster communication. Each node had 74 GB of
hard drive space that was used for the operating system and soft-
ware, but was not used for modeling. The HPC cluster used for our
national LTM application consisted of one server (i.e. head node)
that controls other servers (i.e. compute nodes), which read and
write data from a data server. A cluster is the top-level unit, which
Fig. 7. Data structure, programs and les associated with training by the neural network. Item #1 represents an XML for a region (state) with the status (item #2). Core resources are
shown in item #3. Item #4 displays the status of each task (item #5) within a job.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 259
is composed of nodes, or single physical or logical computers with
one or more cores that include one or more processors. All
modeling data was read and written to a storage machine located in
another building and transferred across an intranet with a
maximum of 1 Gigabit bandwidth.
The data storage server was composed of 24 two terabyte
7200 RPM drives in a RAID 6 conguration. This server also had
Windows 2008 Server R2 installed. Spot checks of resource moni-
toring showed that the HPC was not limited by network or disk
access and typically ran in bursts of 100% CPU utilization. ArcGIS
10.0 with the Spatial Analyst extension was installed on all servers.
Based on the results of the le number per folder and the use of
unique unit IDs as part of the le and directory-naming scheme, we
used a hierarchical directory structure as shown in Fig. 9. The upper
branches of the directory separate les into input and output di-
rectories, and subfolders store data by type (*.ASC or *.PAT les),
location, unit scale (national, state) and for forecasts, years and
scenarios.
Fig. 9. Directory structure for the LTM-HPC simulation.
Fig. 8. Computer systems involved in the LTM-HPC national simulations.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 260
4.2. Preliminary tests
The primary limitation in le size comes from SNNS. This limit
was reached in the probability map creation phase in several
western U.S. counties when the *.RES le, which contains the values
for all of the drivers (e.g. distance to urban, etc.), crashed. To
overcome this issue, we divided the country into grids that pro-
duced les that SNNS was capable of handling for the steps up to
and including pattern le creation, which is done on a pixel-by-
pixel basis and is not spatially dependent. For organization and
performance reasons les were grouped into folders by state. As the
SNNS only uses the probability values in the projection phase, we
were able to project at the county level.
Early tests with mosaicking the entire country at once were
unsuccessful and led to mosaicking by state. The number of states
and years of projection for each state made populating the tool
elds in ArcGIS 10.0 Desktop a time intensive process. We used
python scripts to overcome this issue, and the HPC to process
multiple years and multiple states at the same time. Although it is
possible to run one mosaic operation for each core, we found that
running 24 operations on a machine led to corrupted mosaics. We
attribute this to the large le sizes and limited scratch space
(approximately 200 GB), and to overcome this problem, we limited
the number of operations per server by specifying each task to 6
cores for most states, and 12 cores for very large states such as CA
and TX.
4.3. Data preparation for national simulation
We used ArcGIS 10.0 and Spatial Analyst to prepare ve inputs
for use in training and testing of the neural network. Details of the
data preparation can be found elsewhere (Tayyebi et al., 2012)
although a brief description of processing and the les that were
created follow. We used the US Census 2000 road network line
work to create two road shape les: highways and main arterials.
We used ArcGIS 10.0 Spatial Analyst to calculate the distance that
each pixel was away from the nearest road. Other inputs included:
distance to previous urban (circa 1990), distance to rivers and
streams, distance to primary roads (highways), distance to sec-
ondary roads (roads), and slope.
Preparing data for neural net training required the following
steps. Land use data from approximately 1990 and 2000 were
collected from 18 different municipalities and 3 states. These data
were derived from aerial photography by local government and
were thus deemed to be of high quality. Original data were vector
and they were converted to raster using the simulation dimensions
described above. Data from states were used to select regions in
rural areas using a random site selection procedure (described in
Tayyebi et al., 2012).
Maps of public lands were obtained from ESRI Data Pack 2011
(ESRI, 2011). Public land shape les were merged with locations of
urban and open water in 1990 (using data from the USGS national
land cover database) and used to create the exclusionary layer for
the simulation. Areas that were not located within the training area
were set to no data in ArcGIS. Data from the US census bureau for
places is distributed as point location data. We used the point lo-
cations (the centroid of a town, city or village) to construct Thiessen
polygons representing the area closest to a particular urban center
(Fig. 10). Each place was labeled with the FIPS designated census
place value.
We executed the national LTM-HPC at three different spatial
scales and using two different kinds of spatial units (Tayyebi et al.,
2012): government and xed-size tiles. The three scales for our
government unit simulations were national, county and places
(cities, villages and towns).
All input maps were created at a national scale at 30m cell
resolution. For training, data were subset using ArcGIS on the local
computer workstation and pattern les created for training and
rst phase testing (i.e., calibration). We also used the LTM-clip.py
Python script to create subsamples for second phase testing. In-
puts and the exclusionary maps were clipped by census place and
then written out as *.ASC les. The createpat le was executed per
census place to convert the les from *.ASC to *.PAT.
4.4. Pattern recognition simulations for national model
We presented a training le with 284,477 cases (i.e., records or
locations) to the neural network using a feedforward, back propa-
gation algorithm. We followed the MSE during training saving this
value every 100 cycles. We found that the minimum MSE stabilized
globally at 49,500 cycles. The SNNS network le (*.NET le) was
produced every 100 cycles, so that we could analyze the training
later, but the network le for 49,500 cycles was saved and used to
estimate output (i.e., potential for a land use change to occur at each
location) for testing.
Testing occurred at the scale of tiles. The LTM-clip.py script was
used to create testing pattern les for each of the 634 tiles. The
ltm49500.nNET le was applied to each tile *.PAT le to create an
*.RES le for each tile. *.RES les contain estimates of the potential
for each location to change land use (values 0.0 to 1.0; where closer
Fig. 10. Spatial units involved in the LTM-HPC national simulation.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 261
to 1.0 means higher chance of changing). The *.RES les are con-
verted to *.ASC les using a C program called convert2ascii.exe. The
*.ASC probability maps for all tiles were mosaicked to a national
raster le using an ArcGIS python script. All original values, which
range from 0.0 to 1.0, are multiplied by 100,000 by convert2ascii so
that they may be stored as double precision integer.
We used three-digit codes as unique numbers for naming tile
les and tracking them as tasks within states on HPC (634 grids
in conterminous of USA). Each tile contained a maximum of
4000 rows and 4000 columns of 30m pixels. We were able to do
this because the steps leading up to prediction work on a per
pixel basis and thus the processing unit did not affect the output
value.
4.5. Calibration of the national simulation
We trained on six neural network versions of the model: one
that contained ve input variables and ve that contained four
input variables each where we dropped out one input variable from
the full input model. We saved the MSE at each 100 cycles through
100,000 cycles and then calculated the percent difference of MSE
from the full input variable model (Fig. 11). Note that all of the
variables have a positive contribution to model goodness of t
during training; distance to highways provides the neural network
with the most information necessary for it to t input and output
data. This plot also illustrates how the neural network behaves;
between 0 cycles and approximately cycle 23,000 the neural
network makes large adjustments in weights, and values for acti-
vation function and biases. At one point, around 7000 cycles, the
model does better (i.e., percentage difference in MSE is negative)
without distance to streams as an input to the training data.
Eventually, all drop one out models stabilize near 50,000, which is
where the full ve-variable model also stabilizes. At this number of
training cycles, distance to highway contributes about 2% of the
goodness of t; distance to urban about 1.5%, slope about 1.2%, and
distance to road and distance to streams each about 0.7%. We
conclude from this drop one out calibration that (1) all ve
variables contribute in a positive way toward the goodness of t
and (2) that 49,500 cycles provide enough learning of the full ve-
variable model to use for validation.
The second step of calibration is to examine howwell the model
produces spatial maps of change compared to the observed data
(e.g., Fig. 5A). We use the locations of observed change from the
training map that are outside the training locations to create a
01234-coded calibration map. The XML_Clip_BASE HPC jobs le
was modied to receive the 01234-coded calibration map and
general statistics (e.g., percentage of each value) are created for the
entire simulation domain and for smaller subunits (e.g., spatial
units).
4.6. Validation of the national model
We used the 2006 NLCD urban map and a 2006 forecast map
from the LTM to create a 01234-coded validation map that was
assessed for goodness of t in several ways, two of which are
presented here. The rst goodness of t metric examined how
well the model predicted the correct number of urban cells per
simulation tile. This analysis was not computationally rigorous so
this assessment was performed on the single workstation. We
used ArcGIS 10.0 TabluateArea command in Spatial Analyst to
calculate the amount of area for each of the codes: 0, 1, 2 and 3.
The percentage of the amount of urban correctly predicted was
then mapped (Fig. 12A). Note that the model predicted the cor-
rect amount of urban cells in most simulation tiles. Only a few,
along coastal areas, contained errors in quantity of urban greater
than 5%.
The second goodness of t assessment highlights the use of the
HPC to calculate a computationally rigorous calculation that char-
acterizes location error. The XML_Clip_BASE jobs le was modied
to receive the 01234-coded validation map at the spatial unit of
tiles. The XML_Scaleable jobs le was used to execute the scaleable
window routine, for each tile, from a 3 3 window size through
101 101 windowsize. The percent correct metric was saved at the
10 10 window size (i.e., 3 km by 3 km) and PCM values merged
Fig. 11. Drop one out percent difference MSE from full driver model.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 262
with the shape le for tiles. Note (Fig. 12B) that the model goodness
of t is best east of the Mississippi River, along the west coast and in
certain areas of the central United States where there are large
metropolitan cities (e.g., Denver). Improvement of the model thus
needs to concentrate on rural areas of the central and western
portions of the United States. Similar maps are often constructed
for different window sizes to determine if scale of prediction
changes spatially.
4.7. Forecasting
Forecasting requires merging the suitability map and the
quantity model. We used several XML jobs les to construct the
forecast maps at the national scale. We developed our quantity
model (Tayyebi et al., 2012) that contained the number of urban
cells to grow for each polygon for 10-year time steps from 2010 to
2060. We considered each state as a job and including all the
Fig. 12. Validation metrics of (A) quantity errors and (B) model goodness of t (PCM) for scaleable window size of 3 3 km.
B.C. Pijanowski et al. / Environmental Modelling & Software 51 (2014) 250e268 263
polygons within the state as different tasks to create forecast maps
of each polygon. We embedded the path of prediction code and
number of urban cells to grow for each polygon within
XML_Pred_BASE job le. We ran XML_Pred_BASE job le for each
state on HPC to convert the probability map to forecast map for
each polygon. Then, we ran the Mosaic_Python script on the HPC
using XML_Pred_Mosaic_BASE to mosaic the prediction pieces at
the polygon level to create forecast maps at state level. Similarly, we
ran Mosaic_Python script on HPC using XML_Pred_Mosaic_Na-
tional to mosaic prediction pieces at state level to create a national
forecast map. HPC also enabled us to export error messages in error
les so that if any of tasks fail in a job using standard out and
standard error les to have records of program did during execu-
tion. We also embedded the path of standard out and standard
error les in the tasks of the XML jobs le.
We generated decadal maps of land use from2010 through 2050
from this simulation. Maps of new urban (red) superimposed on
2006 land use/cover from the USGS National Land Cover Maps for
eight regions are shown in Fig. 13. Note that the model produces
different spatial patterns of urbanization depending on the loca-
tion; urbanization in the Los AngeleseSan Diego region are more
clumped, likely to due to topographic limitations of the area in the
large metropolitan area. Dispersed urbanization is characteristic of
at areas like Florida, Atlanta and the Northeast.
5. Discussion
We presented an overview of the conversion of a single-workstation
land change model to operate on a high performance computing
cluster. The Land Transformation Model was originally developed
to simulate small areas (Pijanowski et al., 2000, 2002b), such as
watersheds. However, there is a need for larger-scale land change
models, especially those that can be coupled to large-scale process
models, such as climate (cf. Olson et al., 2008; Pijanowski et al.,
2011) and dynamic hydrologic models (Yang et al., 2010; Mishra
et al., 2010). We have argued that, to accomplish the goal of
increasing the size of the simulation, several challenges had to be
overcome. These included (1) processing of large databases; (2) the
management of large numbers of files; (3) the need for a high-level
architecture that integrates model components; (4) error checking;
and (5) the management of multiple job executions. Here we briefly
discuss how we addressed these challenges as well as lessons learned
in porting the original LTM to an HPC environment.
5.1. Challenges of executing large-scale models
We found that the large datasets used for input and output were
difficult to manage successfully within ArcGIS 10.0. The files had to
be managed as smaller subsets, either as states or regions (i.e.,
multiple states), and in the case of Texas, we had to manage the
state as separate counties. Programs written in C# had to read and
write lines of data at a time rather than read large files into a large
array. This was necessary despite the large amount of memory
available on the HPC.
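The pattern is simple to illustrate; the short Python sketch below shows the same streaming idea (the production code described above is written in C#): an ESRI ASCII grid is transformed one row at a time so that memory use is independent of file size. Function and file names are illustrative.

def stream_ascii_grid(in_path, out_path, transform):
    # Rewrite an ESRI ASCII grid row by row; only one raster row is held in memory at a time.
    with open(in_path) as src, open(out_path, "w") as dst:
        for _ in range(6):                 # copy the six header lines (ncols, nrows, xll, yll, cellsize, NODATA)
            dst.write(src.readline())
        for line in src:
            values = line.split()
            dst.write(" ".join(transform(v) for v in values) + "\n")

# e.g., threshold a probability grid into a 0/1 urban map while preserving NODATA cells
stream_ascii_grid("probability.asc", "urban.asc",
                  lambda v: v if v == "-9999" else ("1" if float(v) > 0.5 else "0"))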
The large number of files was managed using a standard file-naming
coding system and a hierarchical arrangement of folders on
our storage server. The coding system also helped us to construct
the XML file content used by the job manager in Windows
Server 2008 R2.
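A hypothetical example of such a naming convention is sketched below; the specific codes and folder layout used in the project are not reproduced here, so the field order and names are purely illustrative.

import os

def make_path(root, step, state, year, ext):
    # e.g. make_path(r"\\storage\ltm", "pred", "IN", 2050, "asc")
    #      -> \\storage\ltm\pred\IN\IN_pred_2050.asc
    return os.path.join(root, step, state, "%s_%s_%d.%s" % (state, step, year, ext))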
The high-level architecture was designed after the steps that have
been outlined by prominent land change modeling scientists
(Pontius et al., 2004, 2008). These include steps for (1) data
sampling from input files; (2) training; (3) calibration; (4)
validation; and (5) application. Job files were constructed to
interface each of these modeling steps. In fact, we quickly
discovered that the most logical directory structure mirrored the
high-level architecture of the model.
We found that jobs or tasks can fail because of one of the
following errors: (1) One or more tasks in the job have failed, which
indicates that one or more tasks could not be run or did not complete
successfully; we specified standard output and standard error
files in the job description to determine which executable files failed
during execution. (2) A node assigned to the job or task could not be
contacted. Jobs or tasks that fail because a node falls out of
contact are automatically retried a certain number of times, but will
eventually fail if the problem continues. (3) The run time for a job or
task expired. The job scheduler service cancels jobs or tasks
that reach the end of their run time. (4) A file location required by
the job or task could not be accessed. A frequent cause of task
failure is inaccessibility of required file locations, including the
standard input, output, and error files and the working directory
location.
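Because every task records its own standard out and standard error paths, failures of the first type can be located after a run with a simple scan of the log folder. The sketch below is an illustrative helper (the folder layout and naming are assumptions) rather than part of the LTM-HPC code base.

import glob, os

def failed_tasks(log_dir):
    # A non-empty .err file suggests the corresponding task wrote to standard error and should be re-run.
    return [os.path.basename(f)
            for f in glob.glob(os.path.join(log_dir, "*.err"))
            if os.path.getsize(f) > 0]

print(failed_tasks(r"\\storage\ltm\logs"))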
5.2. Lessons learned from converting the LTM to an HPC
The limited number of probability maps created in our simulation
meant that only a simple folder structure was needed, which made
it easy to mosaic them manually. However, the prediction output
was stored by state and by year, which made mosaicking a time-consuming
and error-prone process; in some cases, we needed to mosaic a few
areas manually because the job manager would crash. The HPC was
employed to speed up the mosaicking process, but this was not a
fail-safe process. A short Python script that ran the ArcGIS mosaic
raster tool was the heart of the process. The 9000 network files of
the LTM-HPC generated from the training run were applied to each
pattern file derived from boxes that contained all of the cells in the
USA except those within the exclusionary zone. Finally, states were
manually mosaicked to create the national probability map for the USA
(Fig. 13).
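The core of that script is little more than a call to the ArcGIS mosaic tool; the sketch below shows one plausible form of it (paths, the output name and parameter choices are illustrative, and ArcGIS 10.x with arcpy is assumed).

import arcpy, glob

state_rasters = glob.glob(r"D:\ltm\pred\2050\*_2050.tif")   # hypothetical per-state forecast rasters
arcpy.MosaicToNewRaster_management(
    state_rasters,                  # input rasters
    r"D:\ltm\pred\2050",            # output location
    "national_2050.tif",            # output raster name
    pixel_type="8_BIT_UNSIGNED",
    cellsize=30,
    number_of_bands=1,
    mosaic_method="MAXIMUM")        # keep urban (1) where neighbouring state tiles overlap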
A Windows HPC cluster was used to decrease the time required
to process the data by running the model on multiple spatial units
simultaneously. The time required to run the LTM can be thought of
as the time it would take to run the LTM-HPC serially. When running
the LTM-HPC, the time required relative to the LTM is approximately
halved for every doubling of cores; departures from this ideal
scaling are caused by variation in file size across spatial units.
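As an illustrative (not measured) example, a workload that would take roughly 400 h serially would be expected to finish in roughly 25 h on 16 cores under this near-linear scaling, with the largest states typically finishing last because of the file-size imbalance.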
The HPC also provides additional benefits to researchers who are
interested in running large-scale models. These include a reduction
in the need for human control of various steps, which thereby reduces
the chance of human error. It also allows researchers to execute the
model in a variety of configurations (e.g., here we were able to run
the model using different spatial units to test issues related to
scale), allowing researchers to run ensembles.
We also found that developing and executing the model across
three computer systems (data storage, data processing and coding,
and simulation) worked well. Delegating tasks to each of these
helped to manage workflow and optimize the purpose of each
computer system.
5.3. Needs for land change model forecasts at large extents and fine
resolution
Models that must simulate large areas at fine resolutions and
produce output with multiple time steps require the handling
and management of big data. Environmental simulations have
traditionally focused on small spatial extents at fine resolutions to
produce the required output. However, environmental problems
often occur at large extents, and simulations at coarse resolution
over large extents, or alternatively at fine resolution over small
extents, may hinder the
ability to assess impacts at the necessary scale. Land change models
are often used to assess how human use of the land may impact
ecosystem health. It is well known that land use/cover change
impacts ecosystem processes at a variety of spatial scales (Reid
et al., 2010; GLP, 2005; Lambin and Geist, 2006). Some of the
most frequently cited ecosystem impacts include how land use
change at large extents affects the total amount of carbon
sequestered in aboveground plants and soils in a region (e.g., Dixon
et al., 1994; Post and Kwon, 2000; Cox et al., 2000; Vleeshouwers
and Verhagen, 2002; Guo and Gifford, 2002), how patterns and
amounts of certain land covers (e.g., forests, urban) affect invasive
species spread and distributions (e.g., Sharov et al., 1999; Fei et al.,
2008), how land surface properties feedback to the atmosphere
through alterations of water and energy fluxes (e.g., Dale, 1997;
Fig. 13. LTM 2050 urban change forecasts for different regions. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Pielke, 2005; Bonan, 2008; Pijanowski et al., 2011), how certain
land uses, such as urban and agriculture, increase nutrients and
pollutants to surface and ground water bodies (Pijanowski et al.,
2002b; Tang et al., 2005a,b) and how land use patterns affect
biodiversity of terrestrial (Pekin and Pijanowski, 2012) and aquatic
ecosystems, such as freshwater fish (Wiley et al., 2010).
In all cases, greater urban cover decreases ecosystem health (cf. Pickett
et al., 1997; Reid et al., 2010; Grimm and Redman, 2004; Kaye
et al., 2006).
Assessment of land use change impacts has often occurred by
coupling land change models to other environmental models. For
example, the LTM has been coupled to the Regional Atmospheric
Modeling System (RAMS) to assess how land use change might
impact precipitation patterns at subcontinental scales in East Africa
(Moore et al., 2010), to the Variable Infiltration Capacity (VIC)
model in the Great Lakes basin (Yang, 2011), and to the Long-Term
Hydrologic Impact Assessment (L-THIA) model to assess how land use
change might impact overland flow patterns in large regional
watersheds and how nutrient fluxes and pollutants from urban change
would impact stream ecosystem health in large watersheds (Tang
et al., 2005a,b). The next step in our development will be to
couple the output of this model to a variety of environmental
impact models that are spatially explicit. We intend to conduct that
work in the HPC environment using the principles that we outline
above.
The LTM-HPC model presented here can also be modified to
address other land change transitions. For example, the LTM-HPC
can be configured to simulate multiple transitions at a
time; this might include the loss of urban along with urban gain
(which is simulated here), or the numerous land transitions
common to many areas of the world, namely the loss of
natural lands like forests to agriculture, the shift of agriculture
to urban, the loss of forests to urban, and the transition of
recently disturbed areas (e.g., shrubland) to more mature
natural lands like forests. To accomplish multiple transitions, a
variety of rules need to be explored further to determine how
they would be applied to the model. It is also quite possible that
such a large area may be heterogeneous; several transition rules
may need to be applied in the same simulation, with rules assigned
to areas based on another, higher-level rule. The LTM-HPC could
also be configured to simulate subclasses of land use following
Dietzel and Clarke (2006). For example, within the urban class,
parking lots in the United States cover large extents in total but are
individually small features (Davis et al., 2010a,b); such an application
could require the LTM-HPC because fine resolutions would be
needed. Likewise, simulating crop cover types annually at a
national scale (cf. Plourde et al., 2013) could produce a considerable
amount of temporal information. At a global scale, we
have found that subclasses of land use/cover influence species
diversity patterns, especially for vertebrates that are rare or
threatened (Pekin and Pijanowski, 2012), and so global scale
simulations are likely to need models like the LTM-HPC.
The LTM-HPC could also support national or regional scale
environmental programmatic assessments that are becoming more
common, supported by national government agencies. These
include the 2013 United States National Climate Assessment Pro-
gram (USGCRP, 2013), National Ecological Observation Network
(NEON) supported in the United States by the National Science
Foundation (Schimel et al., 2007; Kampe et al., 2010), and the Great
Lakes Restoration Initiative which seeks to develop State of the Lake
Ecosystem metrics of ecosystem services (SOLEC; Bertram et al.,
2003; WHCEC, 2010). In Europe, an EU15 set of land use forecasts
has been used extensively to study the impacts of land use and
climate change on the continent's ecosystem services (cf.
Rounsevell et al., 2006).
5.4. Calibration and validation of big data simulations
We presented a preliminary assessment of the model goodness
of fit for the LTM-HPC simulations (e.g. Fig. 12). A rigorous
assessment would require more effort placed on (1) fine resolution
accuracy; (2) a quantification of the variability of fine resolution
accuracy across the entire simulation; (3) errors associated with
forecasting (i.e., temporal measures of model goodness of fit); (4)
the relative cost of an error (i.e., whether an error of location is
important to the application); and (5) measures of data input quality.
We were able to show that, at 3 km scales, the error of location
varied considerably across the simulation domain. Quantity errors
were greater in the eastern portion of the United States. Patterns of
location error differed from those of quantity; location errors were
lower in the east (Fig. 12). Location of errors could be
important too if they affect the policies or outcomes of environ-
mental assessment. If policies are being explored to determine the
impact of land use change in stream riparian zones, model location
accuracy needs to be good along streams. If environmental impacts
are being assessed, then covariates such as soil, which tend to be
spatially heterogeneous, need to be taken into consideration.
Current model goodness of fit metrics have not been designed to
consider large, big data simulations such as the one presented here;
thus more research in this area is needed to make a full assessment
of how well a model like this performs.
6. Conclusions
This paper presents the application of the LTM-HPC at multiple
scales using quantity drivers (a fine-scale urban land use change
model applied across the conterminous USA) and introduces a
new version of the LTM with substantially augmented functionality.
We described a parallel implementation of the data and modeling
process on a cluster of multi-core processors using the HPC as a
data-parallel programming framework. We focused on efficiently
handling the challenges raised by the nature of large datasets and
showed how they can be addressed effectively within the computational
framework by optimizing the computation to adapt to the
nature of the data. We significantly enhanced the training and
testing runs of the LTM, and enabled application of the model at
regional to continental scales. Future research will also be able to
use the new information generated by the LTM-HPC to address
questions related to how urban patterns relate to the process of
urban land use change. Because we were able to preserve the high
resolution of the land use data (30 m resolution), the LTM-HPC
provides the capability of visualizing alternative future scenarios at
a detailed scale, which helps to engage urban planners in the
scenario development process. We believe this project represents an
important advancement in computational modeling of urban
growth patterns. In terms of simulation modeling, we have presented
several new advancements in the LTM model's performance
and capabilities. More importantly, however, this project represents
a successful broad-scale modeling framework that has direct
applications to land use management.
Finally, we found that the LTM-HPC has some significant
advantages over the single-workstation version of the LTM. These
include:
(1) Automated data preparation: data can now be clipped and
converted to ASCII format automatically at the state, county
or any other division using a unique identifier for the unit in
the Python environment (a minimal sketch of this step is
given after this list);
(2) Better memory usage: the C# source code for the model has
been changed so that calculations performed by the
LTM-HPC are completely independent of the size of
the ASCII files, because each line of data is read separately
rather than loading an entire file into memory;
(3) Ability to conduct simultaneous analyses: the LTM was not
designed to be used for different regions at the same time;
the LTM-HPC now uses a unique code for each region in
XML format and can repeat all the processes simultaneously
for different regions;
(4) Increased processing speed: the previous version of the LTM
had many disconnected steps (Pijanowski et al., 2002a) which
were carried out sequentially using different DOS-level
commands. All XML files are now uploaded into the HPC
environment and all modeling steps are automatically
processed.
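As referenced in point (1), the sketch below illustrates how the per-unit clipping and ASCII conversion can be scripted with arcpy; the feature-class field name, paths and tool parameters are illustrative assumptions, and ArcGIS 10.x is assumed.

import arcpy, os

def clip_to_ascii(driver_raster, unit_fc, unit_id, out_dir):
    # Select one spatial unit (state or county) by its identifier and clip the driver raster to it.
    layer = arcpy.MakeFeatureLayer_management(unit_fc, "unit_%s" % unit_id,
                                              "UNIT_ID = %s" % unit_id)   # UNIT_ID is a hypothetical field
    clipped = os.path.join(out_dir, "%s_clip.tif" % unit_id)
    arcpy.Clip_management(driver_raster, "#", clipped, layer, "#", "ClippingGeometry")
    # Export the clipped raster to the ASCII format consumed by the neural network code.
    arcpy.RasterToASCII_conversion(clipped, os.path.join(out_dir, "%s.asc" % unit_id))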
References
Adeloye, A.J., Rustum, R., Kariyama, I.D., 2012. Neural computing modeling of the
reference crop evapotranspiration. Environ. Model. Softw. 29, 61e73.
Anselme, B., Bousquet, F., Lyet, A., Etienne, M., Fady, B., Le Page, C., 2010. Modeling of
spatial dynamics and biodiversity conservation on Lure mountain (France).
Environ. Model. Softw. 25 (11), 1385e1398.
Bennett, N.D., Croke, B.F.W., Guariso, G., Guillaume, J.H.A., Hamilton, S.H.,
Jakeman, A.J., Marsili-Libelli, S., Newham, L.T.H., Norton, J.P., Perrin, C.,
Pierce, S.A., Robson, B., Seppelt, R., Voinov, A.A., Fath, B.D., Andreassian, V., 2013.
Characterising performance of environmental models. Environ. Model. Softw. 40, 1e20.
Bertram, P., Stadler-Salt, N., Horvatin, P., Shear, H., 2003. Bi-national assessment of
the Great Lakes: SOLEC partnerships. Environ. Monit. Assess. 81 (1e3), 27e33.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University
Press, Oxford.
Bishop, C.M., 2005. Neural Networks for Pattern Recognition. Oxford University
Press, ISBN 0-19-853864-2.
Bonan, G.B., 2008. Forests and climate change: forcings, feedbacks, and the climate
benets of forests. Science 320 (5882), 1444e1449.
Boutt, D.F., Hyndman, D.W., Pijanowski, B.C., Long, D.T., 2001. Identifying potential
land use-derived solute sources to stream baseow using ground water models
and GIS. Ground Water 39 (1), 24e34.
Burton, A., Kilsby, C., Fowler, H., Cowpertwait, P., O'Connell, P., 2008. RainSim: a
spatial-temporal stochastic rainfall modeling system. Environ. Model. Softw. 23
(12), 1356e1369.
Buyya, R. (Ed.), 1999. High Performance Cluster Computing: Architectures and
Systems, vol. 1. Prentice Hall, Englewood Cliffs, NJ.
Carpani, M., Bergez, J.E., Monod, H., 2012. Sensitivity analysis of a hierarchical
qualitative model for sustainability assessment of cropping systems. Environ.
Model. Softw. 27e28, 15e22.
Chapman, T., 1998. Stochastic modelling of daily rainfall: the impact of adjoining
wet days on the distribution of rainfall amounts. Environ. Model. Softw. 13 (3e
4), 317e324.
Cheung, A.L., Reeves, Anthony P., 1992. High performance computing on a cluster of
workstations. HPDC 1992, 152e160.
Clarke, K.C., Gazulis, N., Dietzel, C., Goldstein, N.C., 2007. A decade of
SLEUTHing: lessons learned from applications of a cellular automaton
land use change model. In: Classics from IJGIS: Twenty Years of the
International Journal of Geographical Information Systems and Science,
pp. 413e425.
Cox, P.M., Betts, R.A., Jones, C.D., Spall, S.A., Totterdell, I.J., 2000. Acceleration of
global warming due to carbon-cycle feedbacks in a coupled climate model.
Nature 408 (6809), 184e187.
Dale, V.H., 1997. The relationship between land-use change and climate change.
Ecol. Appl. 7 (3), 753e769.
Davis, A.Y., Pijanowski, B.C., Robinson, K.D., Kidwell, P.B., 2010a. Estimating parking lot
footprints in the Upper Great Lakes region of the USA. Landsc. Urban Plan. 96 (2),
68e77.
Davis, A.Y., Pijanowski, B.C., Robinson, K., Engel, B., 2010b. The environmental and
economic costs of sprawling parking lots in the United States. Land Use Policy
27 (2), 255e261.
Denoeux, T., Lengellé, R., 1993. Initializing back propagation networks with prototypes.
Neural Netw. 6, 351e363.
Dietzel, C., Clarke, K., 2006. The effect of disaggregating land use categories in
cellular automata during model calibration and forecasting. Comput. Environ.
Urban Syst. 30 (1), 78e101.
Dietzel, C., Clarke, K.C., 2007. Toward optimal calibration of the SLEUTH land use
change model. Trans. GIS 11 (1), 29e45.
Dixon, R.K., Winjum, J.K., Andrasko, K.J., Lee, J.J., Schroeder, P.E., 1994. Integrated
land-use systems: assessment of promising agroforest and alternative land-use
practices to enhance carbon conservation and sequestration. Clim. Change 27
(1), 71e92.
Dlamini, W., 2008. A Bayesian belief network analysis of factors influencing wildfire
occurrence in Swaziland. Environ. Model. Softw. 25 (2), 199e208.
ESRI, 2011. ArcGIS 10. Software.
Fei, S., Kong, N., Stinger, J., Bowker, D., 2008. In: Ravinder, K., Jose, S., Singh, H.,
Batish, D. (Eds.), Invasion Pattern of Exotic Plants in Forest Ecosystems. Invasive
Plants and Forest Ecosystems. CRC Press, Boca Raton, FL, pp. 59e70.
Fitzpatrick, M., Long, D., Pijanowski, B., 2007. Biogeochemical fingerprints of land
use in a regional watershed. Appl. Biogeochem. 22, 1825e1840.
Foster, I., Kesselman, C., 1997. Globus: a metacomputing infrastructure toolkit. Int. J.
Supercomput. Appl. 11 (2), 115e128.
Foster, D.R., Hall, B., Barry, S., Clayden, S., Parshall, T., 2002. Cultural, environmental,
and historical controls of vegetation patterns and the modern conservation
setting on the island of Martha's Vineyard, U.S.A. J. Biogeogr. 29, 1381e1400.
GLP, 2005. Science Plan and Implementation Strategy. IGBP Report No. 53/IHDP
Report No. 19. IGBP Secretariat, Stockholm, 64 pp.
Grimm, N.B., Redman, C.L., 2004. Approaches to the study of urban ecosystems: the
case of Central ArizonadPhoenix. Urban Ecosyst. 7 (3), 199e213.
Guo, L.B., Gifford, R.M., 2002. Soil carbon stocks and land use change: a meta
analysis. Glob. Change Biol. 8 (4), 345e360.
Herold, M., Goldstein, N.C., Clarke, K.C., 2003. The spatiotemporal form of urban
growth: measurement, analysis and modeling. Remote Sens. Environ. 86 (3),
286e302.
Herold, M., Couclelis, H., Clarke, K.C., 2005. The role of spatial metrics in the analysis
and modeling of urban land use change. Comput. Environ. Urban Syst. 29 (4),
369e399.
Hey, A.J., 2009. The Fourth Paradigm: Data-intensive Scientific Discovery.
Jacobs, A., 2009. The pathologies of big data. Commun. ACM 52 (8), 36e44.
Kampe, T.U., Johnson, B.R., Kuester, M., Keller, M., 2010. NEON: the first continental-
scale ecological observatory with airborne remote sensing of vegetation canopy
biochemistry and structure. J. Appl. Remote Sens. 4 (1), 043510e043510.
Kaye, J.P., Groffman, P.M., Grimm, N.B., Baker, L.A., Pouyat, R.V., 2006. A distinct
urban biogeochemistry? Trends Ecol. Evol. 21 (4), 192e199.
Kilsby, C., Jones, P., Burton, A., Ford, A., Fowler, H., Harpham, C., James, P., Smith, A.,
Wilby, R., 2007. A daily weather generator for use in climate change studies.
Environ. Model. Softw. 22 (12), 1705e1719.
Lagabrielle, E., Botta, A., Daré, W., David, D., Aubert, S., Fabricius, C., 2010. Modeling
with stakeholders to integrate biodiversity into land-use planning, lessons
learned in Réunion Island (Western Indian Ocean). Environ. Model. Softw. 25
(11), 1413e1427.
Lambin, E.F., Geist, H.J. (Eds.), 2006. Land Use and Land Cover Change: Local Pro-
cesses and Global Impacts. Springer.
LaValle, S., Lesser, E., Shockley, R., Hopkins, M.S., Kruschwitz, N., 2011. Big data,
analytics and the path from insights to value. MIT Sloan Manag. Rev. 52 (2),
21e32.
Lei, Z., Pijanowski, B.C., Alexandridis, K.T., Olson, J., 2005. Distributed modeling
architecture of a multi-agent-based behavioral economic landscape (MABEL)
model. Simulation 81 (7), 503e515.
Loepfe, L., Martínez-Vilalta, J., Piñol, J., 2011. An integrative model of human-
influenced fire regimes and landscape dynamics. Environ. Model. Softw. 26 (8),
1028e1040.
Lynch, C., 2008. Big data: how do your data grow? Nature 455 (7209), 28e29.
Mas, J.F., Puig, H., Palacio, J.L., Sosa, A.A., 2004. Modeling deforestation using GIS and
artificial neural networks. Environ. Model. Softw. 19 (5), 461e471.
MEA, Millennium Ecosystem Assessment, 2005. Ecosystems and Human Well-
being: Current State and Trends. Island Press, Washington, DC.
Merritt, W.S., Letcher, R.A., Jakeman, A.J., 2003. A review of erosion and sediment
transport models. Environ. Model. Softw. 18 (8e9), 761e799.
Mishra, V., Cherkauer, K., Niyogi, D., Ming, L., Pijanowski, B., Ray, D., Bowling, L.,
2010. Regional scale assessment of land use/land cover and climatic changes on
surface hydrologic processes. Int. J. Climatol. 30, 2025e2044.
Moore, N., Torbick, N., Lofgren, B., Wang, J., Pijanowski, B., Andresen, J., Kim, D.,
Olson, J., 2010. Adapting MODIS-derived LAI and fractional cover into the
Regional Atmospheric Modeling System (RAMS) in East Africa. Int. J. Climatol.
30 (3), 1954e1969.
Moore, N., Alagarswamy, G., Pijanowski, B., Thornton, P., Lofgren, B., Olson, J.,
Andresen, J., Yanda, P., Qi, J., 2011. East African food security as influenced by
future climate change and land use change at local to regional scales. Clim.
Change. http://dx.doi.org/10.1007/s10584-011-0116-7.
Olson, J., Alagarswamy, G., Andresen, J., Campbell, D., Davis, A., Ge, J., Huebner, M.,
Lofgren, B., Lusch, D., Moore, N., Pijanowski, B., Qi, J., Thornton, P., Torbick, N.,
Wang, J., 2008. Integrating diverse methods to understand climate-land in-
teractions in east Africa. GeoForum 39 (2), 898e911.
Pekin, B.K., Pijanowski, B.C., 2012. Global land use intensity and the endangerment
status of mammal species. Divers. Distrib. 18 (9), 909e918.
Peralta, J., Li, X., Gutierrez, G., Sanchis, A., 2010 July. Time series forecasting by
evolving artificial neural networks using genetic algorithms and differential
evolution. Neural Netw. e IJCNN, 1e8.
Pérez-Vega, A., Mas, J.F., Ligmann, A., 2012. Comparing two approaches to land use/
cover change modeling and their implications for the assessment of biodiver-
sity loss in a deciduous tropical forest. Environ. Model. Softw. 29, 11e23.
Pickett, S.T., Burch Jr., W.R., Dalton, S.E., Foresman, T.W., Grove, J.M., Rowntree, R.,
1997. A conceptual framework for the study of human ecosystems in urban
areas. Urban Ecosyst. 1 (4), 185e199.
Pielke, R.A., 2005. Land use and climate change. Science 310 (5754), 1625e1626.
Pijanowski, B.C., Long, D.T., Gage, S.H., Cooper, W.E., 1997, June. A Land Trans-
formation Model: Conceptual Elements, Spatial Object Class Hierarchies, GIS
Command Syntax and an Application for Michigan's Saginaw Bay Watershed. In
Submitted to the Land Use Modeling Workshop. USGS EROS Data Center.
Pijanowski, B.C., Gage, S.H., Long, D.T., 2000. A land transformation model: inte-
grating policy, socioeconomic and environmental drivers using a geographic
information system. In: Sanderson, J., Harris, L. (Eds.), Landscape Ecology: a Top
Down Approach. CRC Press, Lewis Publisher, Boca-Raton.
Pijanowski, B.C., Daniel, G., Brown, Shellito, Bradley A., Manik, Gaurav A., 2002a.
Using neural networks and GIS to forecast land use changes: a land trans-
formation model. Comput. Environ. Urban Syst. 26, 553e575.
Pijanowski, B.C., Shellito, B., Pithadia, S., Alexandridis, K., 2002b. Forecasting and
assessing the impact of urban sprawl in coastal watersheds along eastern Lake
Michigan. Lakes Reserv. Res. Manag. 7, 271e285.
Pijanowski, B.C., Pithadia, S., Shellito, B.A., Alexandridis, K., 2005. Calibrating a
neural network based urban change model for two metropolitan areas of Upper
Midwest of the United States. Int. J. Geogr. Inf. Sci. 19, 197e215.
Pijanowski, B.C., Alexandridis, K., Mueller, D., 2006. Modeling urbanization patterns
in two diverse regions of the world. J. Land Use Sci. (1), 83e108.
Pijanowski, B., Ray, D.K., Kendall, A.D., Duckles, J.M., Hyndman, D.W., 2007. Using
back-cast land-use change and groundwater travel time models to generate
land-use legacy maps for watershed management. Ecol. Soc. 12 (2), 25 [online]
URL: http://www.ecologyandsociety.org/vol12/iss2/art25/.
Pijanowski, B.C., Tayyebi, A., Delavar, M.R., Yazdanpanah, M.J., 2009. Urban expan-
sion simulation using geographic information systems and articial neural
networks. Int. J. Environ. Res. 3 (4), 493e502.
Pijanowski, B.C., Moore, N., Mauree, D., Niyogi, D., 2011. Evaluating error propaga-
tion in coupled landeatmosphere models. Earth Interact. 15, 1e25.
Plourde, J.D., Pijanowski, B.C., Pekin, B.K., 2013. Evidence for increased monoculture
cropping in the Central United States. Agric. Ecosyst. Environ. 165, 50e59.
Pontius, R.G., Petrova, S., 2010. Assessing a predictive model of land change using
uncertain data. Environ. Model. Softw. 25 (3), 299e309.
Pontius Jr., R.G., Spencer, J., 2005. Uncertainty in extrapolations of predictive land
change models. Environ. Plan. B 32, 211e230.
Pontius Jr., R.G., Huffaker, D., Denman, K., 2004. Useful techniques of validation for
spatially explicit land-change models. Ecol. Model. 179 (4), 445e461.
Pontius Jr., R.G., Boersma, W., Castella, J.-C., Clarke, K., de Nijs, T., Dietzel, C., Duan, Z.,
Fotsing, E., Goldstein, N., Kok, K., Koomen, E., Lippitt, C.D., McConnell, W., Mohd
Sood, A., Pijanowski, B., Pithadia, S., Sweeney, S., Trung, T.N., Veldkamp, A.T.,
Verburg, P.H., 2008. Comparing input, output, and validation maps for several
models of land change. Ann. Reg. Sci. 42 (1), 11e47.
Post, W.M., Kwon, K.C., 2000. Soil carbon sequestration and land-use change:
processes and potential. Glob. Change Biol. 6 (3), 317e327.
Randerson, J.T., Hoffman, F.M., Thornton, P.E., Mahowald, N.M., Lindsay, K., Lee, Y.-
H., Nevison, C.D., Doney, S.C., Bonan, G., Stöckli, R., Covey, C., Running, S.W.,
Fung, I.Y., Oct 2009. Systematic assessment of terrestrial biogeochemistry in
coupled climate-carbon models. Global Change Biol.. ISSN: 1365-2486 15 (10),
2462e2484. http://dx.doi.org/10.1111/j.1365-2486.2009.01912.x.
Ray, D.K., Pijanowski, B.C., 2010. A backcast land use change model to generate past
land use maps: application and validation at the Muskegon river watershed of
Michigan, USA. J. Land Use Sci. 5 (1), 1e29.
Ray, D.K., Duckles, J., Pijanowski, B.C., 2011. The impact of future land use scenarios on
runoff volumes intheMuskegonRiver Watershed. Environ. Manag. 46(3), 351e366.
Ray, D.K., Pijanowski, B.C., Kendall, A.D., Hyndman, D.W., 2012. Coupling land use
and groundwater models to map land use legacies: assessment of model un-
certainties relevant to land use planning. Appl. Geogr. 34 (2012), 356e370.
Reid, W.V., Chen, D., Goldfarb, L., Hackmann, H., Lee, Y.T., Mokhele, K., Whyte, A.,
2010. Earth system science for global sustainability: grand challenges. Science
(Washington) 330 (6006), 916e917.
Reinefeld, A., Lindenstruth, V., Sept 2001. How to build a high-performance compute
cluster for the grid. In: International Workshop on Metacomputing Systems and
Applications, MSA2001. IEEE Computer Society Press, pp. 221e227.
Rounsevell, M.D.A., Reginster, I., Araújo, M.B., Carter, T.R., Dendoncker, N., Ewert, F.,
Tuck, G., 2006. A coherent set of future land use change scenarios for Europe.
Agric. Ecosyst. Environ. 114 (1), 57e68.
Schimel, D., Hargrove, W., Hoffman, F., MacMahon, J., 2007. NEON: a hierarchically
designed national ecological network. Front. Ecol. Environ. 5 (2), 59e59.
Sharov, A.A., Pijanowski, B.C., Liebhold, A.M., Gage, S.H., 1999. What affects the rate
of gypsy moth (Lepidoptera: Lymantriidae) spread: winter temperature or
forest susceptibility? Agric. For. Entomol. 1 (1), 37e45.
Shellito, B.A., Pijanowski, B.C., 2003. Using neural nets to model the spatial distri-
bution of seasonal homes. Cartogr. Geogr. Inf. Sc. 30 (3), 281e290.
Skole, D., Batzli, S., Gage, S., Pijanowski, B., Chomentowski, W., Rustem, W., 2002.
Forecast Michigan: Tracking Change for Land Use Planning and Policy Mak-
ing. Informing the Debate: Urban Housing and Land Development. Institute
for Public Policy and Social Research, Michigan State University, East Lansing,
p. 30.
Tang, Z., Engel, B., Lim, K., Pijanowski, B., Harbor, J., 2005a. Minimizing the
impact of urbanization on long-term runoff. J. Water Resour. Assoc. 41 (6),
1347e1359.
Tang, Z., Engel, B.A., Pijanowski, B.C., Lim, K.J., 2005b. Forecasting land use
change and its environmental impact at a watershed scale. J. Environ.
Manag. 76, 35e45.
Tayyebi, A., Perry, P., 2013. Predicting the expansion of an urban boundary using
spatial logistic regression and hybrid raster-vector routines with remote sensing
and GIS. Int. J. Geogr. Inf. Sci.. http://dx.doi.org/10.1080/13658816.2013.845892.
Tayyebi, A., Delavar, M.R., Yazdanpanah, M.J., Pijanowski, B.C., Saeedi, S.,
Tayyebi, A.H., 2010. A Spatial Logistic Regression Model for Simulating Land
Use Patterns: a Case Study of the Shiraz Metropolitan Area of Iran. In:
Advances in Earth Observation of Global Change. Springer, Netherlands,
pp. 27e42.
Tayyebi, A., Pekin, B.K., Pijanowski, B.C., Plourde, J.D., Doucette, J.S., Braun, D., 2012.
Hierarchical modeling of urban growth across the conterminous USA: devel-
oping meso-scale quantity drivers for the Land Transformation Model. J. Land
Use Sci., 1e21 (Ahead-of-print).
Turner, B.L.I., Matson, P.A., McCarthy, J., 2003. Illustrating the coupled human-
environment system for vulnerability analysis: three case studies. Proc. Natl.
Acad. Sci. 100, 8080e8085.
USGCRP (United States Global Change Research Program), 2013. National Climate
Assessment Draft Report. http://ncadac.globalchange.gov/ (accessed 05.04.13.).
Veldkamp, A., Lambin, E.F., 2001. Editorial: predicting land-use change. Agric.
Ecosyst. Environ. 85, 1e6.
Vleeshouwers, L.M., Verhagen, A., 2002. Carbon emission and sequestration by
agricultural land use: a model study for Europe. Glob. Change Biol. 8 (6),
519e530.
Wayland, K., Long, D., Hyndman, D., Pijanowski, B., Haack, S., 2002. Modeling the
impact of historical land uses on surface water quality using ground water ow
and solute transport models. Lakes Reserv. 7, 189e199.
WHCEC (White House Council on Environmental Quality), 2010. Great Lakes Resto-
ration Initiative Action Plan. Greatlakesrestoration.us/pdfsglri_actionaplan.pdf
(last accessed 05.04.13.).
Wiley, M., Hyndman, D., Pijanowski, B., Kendall, A., Riseng, C., Rutherford, E.,
Cheng, S., Carlson, M., Richards, R., Seelbach, R., Koches, J., 2010. A multi-
modeling approach to evaluate the impacts of global change on river ecosys-
tems. Hydrobiologia 657, 243e262.
Xue, Y., Alves, O., Balmaseda, M.A., Ferry, N., Good, S., Ishikawa, I., Lee, T.,
McPhaden, M.J., Peterson, K.A., Rienecker, M., 2010. Ocean state estimation for
global ocean monitoring: ENSO and beyond ENSO. In: Hall, J., Harrison, D.E.,
Stammer, D. (Eds.), Proceedings of OceanObs09: Sustained Ocean Observations
and Information for Society, vol. 2. ESA Publication WPP-306. Venice, Italy, 21e
25 September 2009.
Yang, J., 2011. Convergence and uncertainty analyses in Monte-Carlo based sensi-
tivity analysis. Environ. Model. Softw. 26 (4), 444e457.
Yang, G., Bowling, L., Cherkauer, K., Pijanowski, B., 2010. Hydrologic response of
watersheds to urbanization in the White River basin, Indiana. J. Hydrometeorol.
11, 122e138.
Washington-Ottombre, C., Pijanowski, B., Campbell, D., Olson, J., Maitima, J.,
Musili, A., Mwangi, A., 2010. Using a role-playing game to inform the devel-
opment of land-use models for the study of a complex socio-ecological system.
Ag. Syst. 103 (3), 117e126.