
Justin Dee

Implementing TopModel in Nova

B.Sc. Computer Science
31st March 2013


I certify that the material contained in this dissertation is my own work and does not contain unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all associated documentation. Regarding the electronically submitted version of this work, I consent to it being stored electronically and copied for assessment purposes, including the Department's use of plagiarism detection systems to check the integrity of assessed work. I agree to my dissertation being placed in the public domain, with my name explicitly included as the author of the work.

Date: 2013-05-31
Signed:




Abstract
Environmental computational science uses a variety of platforms to create models upon which predictions about future behaviour are based. Modellers and scientists suffer from the lack of an easy, free application with which to create complex models without requiring prior knowledge of programming languages, which inhibits creativity and hinders efficiency. Nova is a platform created by Prof. R. Salter of Oberlin College to fill this niche. In this project, Nova is analysed with regard to its fitness for purpose in this role, by way of an implementation of the well-known hydrological model, TopModel. From this process, it becomes clear that while Nova is a strong candidate when choosing a platform for a new computational model, there are ways in which it currently cannot replace existing methods.

Table of Contents
Chapter 1  Introduction and Objectives
Chapter 2  Background on Nova
    Modelling Applications
    How does Nova work?
    The Lotka-Volterra Model
    Beyond Lotka-Volterra: other components and functionality
Chapter 3  Background on TopModel
Chapter 4  Implementing TopModel
    Challenge 1: Understanding the source code
    Challenge 2: How to begin tackling such a large problem
    Challenge 3: Inputs and outputs
    Challenge 4: Flows
    Challenge 5: Inputting data sets
    Challenge 6: Obtaining meaningful output
    Reflections on my method
Chapter 5  Conclusion
    Was Nova itself an inherently poor choice for the implementation of TopModel?
    Does Nova meet the criteria laid out above?
    Future Developments
Bibliography
Acknowledgements


Chapter 1 Introduction and Objectives


One of the most important tasks in the environmental sciences is modelling the environment. Through models, assumptions can be tested and predictions made. The environmental sciences rely heavily on these predictions, assumptions, and models, because the environment is a complex and multi-faceted system; often, the only way to predict what effect some change will have, many months or years down the line, is to construct a model. Take rainfall, for example: when it rains, water enters the ground, soaks through the earth, and makes its way to the seas via streams and rivers. One way flooding occurs is when too much of that water reaches a river at the same time; but how much water is that, exactly? How does this flood danger point change when it rains more, or less? What if it rains only sparsely, but over a long period of time? All of these are questions that, while they can be answered in principle through measurement or thought experiment, need models to be answered accurately for a given set of parameters. A model of rainfall dynamics can answer these questions, and provide local councils, flood protection companies, and insurance firms, for example, with the data they need to plan effectively for the future.

This type of research, known as Computational Science, is a massive field that spans many disciplines, and how models are written or generated is an important part of the discussion surrounding them. Currently, in the environmental sciences, many models are implemented in programming languages like C, R, Java, Matlab, or Fortran. The reasons for this are several:

- Models are complicated. They often involve multiple iterations of calculations, and programming languages provide the tools to perform these calculations with multiple variables.
- New models often reuse parts of existing models. Because older models were written in lower-level programming languages like Fortran, new models which wish to refer to them must either be written in the same language, or in a language into which the older code can be imported or converted.
- Because much of the theoretical environmental science research is done in academia, rather than by corporations, the price of software must be considered. All of these programming languages are freely available, a significant advantage compared to costly alternatives (covered in more detail below), and a big draw when funding is limited (for instance, for speculative research projects).
- Models are run on server machines. The complex multidimensional calculations needed to run models (for instance, Monte Carlo simulations, where models with random factors are run many times to build up a picture of the result) often demand heavy-duty computers if they are to finish in a reasonable length of time, and these servers often only support programs written in these languages. Recently, much work on running such models has focussed on the strength of parallel cloud computing, using many smaller machines to emulate one large one in order to speed up and optimise highly parallel computations like the Monte Carlo method. Being written in a form that allows this parallelisation is a big advantage for any model.

Nevertheless, such a reliance on traditional programming languages generates problems: not all environmental scientists are programmers, or come from a Computer Science background.
Having to learn a programming language (or several) could be considered a hidden cost to the discipline, and one that delays and hinders projects. If scientists want to create a new model, they can do so in whatever language they feel most comfortable with (or whichever best fits the computational needs of the model); but if they want to change an existing one, they need to be familiar with the language in which it is written. This highlights another problem: models written in programming languages are not intuitive. It is hard to look at a block of code and understand intuitively what it represents in terms of the model. Generous commenting can make a computational model easy to read in practice, but it may still require reading through the whole program to understand the role any one section plays, or to gain an understanding of the structure of the model. The solution to these problems is higher-level languages: domain-specific tools that provide graphical or module-based interfaces in which models can be created, changed, viewed, and executed. Graphical modelling tools should be easy to use with little to no knowledge of formal programming languages, and should facilitate the use of computational models without the user having to worry about the method by which they are implemented. One such graphical modelling platform is Nova, created by Professor Richard M. Salter of Oberlin College as a scientific and educational tool for computational science.


Its main feature is a graphical user interface, where components are dropped onto a canvas and properties added, formulating a model without the need to write any code at all beyond the simple mathematical expressions that comprise the functional parts of the model. The objective of my project is to analyse the Nova platform with regard to its fitness for purpose, taking into account the following primary criteria:

- Usability. In terms of basic operations, how easy is Nova to use? The goal of any such higher-level application should be to increase efficiency, and if it is harder to do things in Nova than otherwise, then it won't be an improvement at all.
- Functionality. Can you use Nova to do everything that is needed from such a platform? Computational models can be quite complex, and Nova will need to be able to duplicate every capability a traditional language offers in order to be a viable replacement.
- Simplicity. How much less complicated is it to use Nova instead of any other option? If Nova takes a long time and a lot of effort to learn, then there is no advantage to learning it over any of the programming languages already used for computational science.

Additionally, I will evaluate whether Nova fulfils some other important criteria, taken from the reasons I listed above for why programming languages are currently the norm:

- Cost. How cheap is Nova, compared to the alternatives?
- Speed. Can Nova run simulations in a reasonable length of time?
- Cloud computing. Is it possible to adapt Nova to run in the cloud and take advantage of the benefits it offers?

Nova's modelling functionality is primarily aimed at the natural sciences, so, in order to analyse its fitness for the intended purpose, I shall convert several well-known computational models from these fields into Nova. The first shall be the Lotka-Volterra model, serving as a simple model with few terms. This model (otherwise known as the Predator-Prey model) is a biological population model, but has broader applications. Additionally, it is used in several of the online Nova tutorials (created by Dr. Andy Lyons), and thus will serve as my introduction to Nova and its tools. The second model I will implement is TopModel, a hydrological model describing the way rainfall is absorbed by the ground. This is a much more complex model, and will serve to test the limits of Nova and its capabilities. It is a real, current model in practical use today, something I feel is important in providing relevant analysis.

In Chapter 2, Background on Nova, I will explain a little of the history of Nova and its main competitors, and discuss how it differs from them and why it is important. I will also describe Nova, and the components and features I will use, through the example of the Lotka-Volterra model. In Chapter 3, Background on TopModel, I will describe TopModel and how it works. In Chapter 4, Implementing TopModel, I will describe the process by which I attempted to implement TopModel in Nova, and the challenges I faced in doing so. In Chapter 5, Conclusion, I will analyse any problems I had, evaluate Nova with respect to the criteria listed above, discuss the methods I used for the implementation and what effect they had on the project's success, and consider potential improvements, either to those methods or to the Nova platform in general.


Chapter 2 Background on Nova


Modelling Applications
In Richard Salter's mini-biography on the NovaModeller website, he explains that Nova was originally developed as a teaching tool because of the limitations of existing solutions. He elaborates on this in Nova: A modern platform for system dynamics, spatial, and agent-based modeling, explaining that the existing good solutions either support only one specific view of a model, or are too broad, with little domain-specific support for dynamic models. The two main competitors in this field are both commercial products, Vensim and STELLA:

- Vensim, produced by Ventana Systems, described itself as "simulation software for improving the performance of real systems used for developing, analyzing, and packaging dynamic feedback models."
- STELLA, by ISEE Systems, aims itself at the education sector as well as the commercial; from their website: "STELLA offers a practical way to dynamically visualize and communicate how complex systems and ideas really work. Whether they are first-time or experienced modelers, teachers, students, and researchers use STELLA to explore and answer endless questions."

Both of these seem to offer functionality similar to Nova's (primarily, a stock-and-flow graphical interface) in very polished packages, but, as seen above, they are intended primarily for use in the government and commercial sectors. STELLA does advertise itself to teachers and other educators, but the practicality of the situation is that, at the time of writing, a single-person STELLA license costs $1899, and the professional version of Vensim $1195. Admittedly, Vensim will provide a free educational evaluation license, but without the full functionality of the larger versions (such as Monte Carlo simulation). In this way, Nova does have a niche to fill: it is a domain-specific tool for educators and scientists, without the associated overhead of license costs.

How does Nova work?


Nova is a Java application, which uses an internal scripting language, NovaScript, based on JavaScript. It features two primary canvases, one for displaying the model components, such as stocks and flows, which will be described in further detail below, and one for displaying results, usually in the form of tables or graphs. Below is a screenshot of a blank project in Nova, with a key to the various UI elements overlaid:

- Program window: contains global definitions and code to be used with the whole model. Above it are the menus for adding components; above those, the clock controls, which adjust the model's running time and granularity; and above those, the execution commands, to run or stop the model.
- Console: for manual output or (usually) error messages.
- Submodel list: contains references to submodels, used to create more complex interactions. In this instance, "Untitled" is just the name of the blank project shown here.
- Component canvas: the primary display canvas. Terms, flows, and stocks are dropped here to create the model, along with other components.
- Output canvas: tables and graphs appear here when they are created.

Figure 2.1

The Lotka-Volterra Model


The Lotka-Volterra model of population density dependence was first proposed by Alfred J. Lotka in 1910. It takes the form of two equations describing the growth rates of two populations, conventionally called Predator and Prey, although the model is also applicable to other situations. The growth rate of each population (dx/dt and dy/dt) depends on the size of both populations (x and y) and on a number of model parameters (here α, β, γ, δ) that define how the populations interact: β and δ relate specifically to the mechanics of predation, while α and γ define how fast each population would grow or decline in the absence of the other.
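Restated in their conventional form, with x the prey population and y the predator population, the two equations are:

    dx/dt = αx - βxy
    dy/dt = δxy - γy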

These equations naturally create a periodic solution, in which the populations vary in harmonic motion: as the prey population begins growing, so does the predator population, and as the predators grow more populous, the prey begin to decline again. This decline gradually causes the predators to decline, which reduces the pressure on the prey population, which begins growing again. Figure 2.2 is an output from Nova showing this pattern.

Implementing the Lotka-Volterra Model in Nova


I chose the Lotka-Volterra model to implement first because I felt that, as a simple model (only two equations and two variables), it would serve as an excellent introduction to the components of Nova, both for me, through the online tutorials (www.novamodeller.com), and for this project report. The first, and primary, component of Nova, around which most of the others revolve, is the Stock. This represents a value: a variable, a quantity which is expected to change through the evolution of the system. In this model (hereafter called the LV model), Stocks are used to represent the two population quantities. Figure 2.3 shows some Stocks in Nova.
Figure 2.2

Figure 2.3

These green squares are dropped onto the drawing canvas from a menu. Right-clicking on one brings up a dialogue where the initial value can be set and two important characteristics defined; there are checkboxes for discrete and non-negative. The latter does exactly what it says: it prevents the stock from having a negative value. In this case that makes perfect sense, so it is left checked; in other models, say financial ones, it would be useful to allow a stock to go negative. Checking discrete means the value is treated as a sequence rather than a stock: its value is defined at every time step, but not in between (the evaluation of the flow is treated as a next value rather than a change in value; more on flows below). This might seem to make sense for this model: you can't, after all, have a fraction of an eagle or a fox. However, the model is units-independent: a value of 1 prey doesn't necessarily equate to a single fieldmouse, and, as mentioned above, the model can also apply to situations which are more intuitively continuous than discrete in nature, like economics. Additionally, the equations we have for LV are in the form dx/dt = f(x), which indicates a continuous function, rather than x(t) = f(x(t-1)), the form one would need for a discrete function. So this checkbox is left unchecked. The next important component in Nova is the Term.

Figure 2.4

These components are used to store values (or expressions based on other parts of the model) for use in calculations. In this example, they are just going to store the parameters, PreyBirthCoefficient and PredDeathCoefficient, that define how the populations grow or decline in the absence of the other. As with Stocks, right-clicking accesses their properties window, where the expression defining their value is entered. Lastly, we have the most important basic component of all: the Flow.

Justin Dee

Figure 2.5

Flows are how Stocks change. The cloud symbols can be dragged independently of the hourglass, which represents the flow itself. When dropped onto a stock, that side of the flow is connected to it. In the flow's right-click window, the expression detailing how much the stock should change is input; additionally, there is a radio-button choice between uniflow and biflow, as seen in Figure 2.6. Uniflow creates the blue arrows seen in 2.5, whereas biflow adds an additional arrow to the left, indicating that movement is allowed both ways through the flow. For this example, I am going to track births and deaths with separate flows, so only uniflows are needed. If I were to use only one flow, I would set it to biflow, so that the value of the stock would correctly increase when the population change rate is positive, and decrease when negative.

In this pane, the expression for evaluating the magnitude of the flow is entered, as can be seen here. Since this will be the flow for determining how many prey are born, that part of the equation is entered here.

Figure 2.6
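Concretely, the four flow expressions in this model are simple JavaScript arithmetic over the names of other components. A sketch, using the coefficient names introduced below alongside Figure 2.7:

    PreyBirthCoeff * Prey              // PreyBirth:  αx
    PreyDeathCoeff * Prey * Predator   // PreyDeath:  βxy
    PredBirthCoeff * PreyDeath         // PredBirth:  δ times the predation rate
    PredDeathCoeff * Predator          // PredDeath:  γy

Each expression is entered on its own flow's properties pane, exactly as the PreyBirth expression is entered in Figure 2.6.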

Both sides of a flow need not be connected to stocks; in this example, we don't need to track the populations' dead members, nor do the new ones need to come from any pre-existing pool, so we can leave the clouds on one end of each flow hanging. Figure 2.7 shows the LV model after connecting the flows correctly. The red arrows indicate where a formula relies on a certain value; for instance, the birth rate of the predators depends on the death rate of the prey (in this model, prey die only when predated by a predator, and predators need to actually predate prey in order to breed, so their birth rate is dependent on their predation rate). I've also added the additional terms that are needed: PreyBirthCoefficient and PredatorDeathCoefficient represent α and γ from the original equations, but terms are also needed to represent β and δ, the predation rate coefficient and the predation-to-predator-birth conversion coefficient, although I've named them to be consistent with the scheme I already had here:

Justin Dee

Figure 2.7

So, simply by looking at this diagram, even without knowing the formulae behind each flow, it can be seen intuitively that PreyBirth depends on Prey and PreyBirthCoeff; PreyDeath depends on Prey, PreyDeathCoeff, and Predator; PredBirth is based on PreyDeath and PredBirthCoeff; and so on. The clouds are the open beginnings and ends of the population flows, so it is easy to see that births flow into the stock from nowhere, and deaths flow out, into nowhere. The only thing still missing is a way to examine this data, and for that a graph is ideal. We'd like to compare the growth of both populations over time, and while we could use two graphs for that, in Nova we can actually plot them both on the same graph:

Figure 2.8

The little square on the component canvas represents the graph there; the line to the two stocks is not drawn by the user. Rather, in the properties window of that component, all the possible graphable values are listed, and here I have chosen both prey and predator from that list. We could also graph the values of the flows, or the terms (although that would be pointless here, as they are constant). Also in that properties window are various options concerning the scale of the graph, the graph type, and so forth. The actual graph window is created automatically in the results canvas, where it can be moved around. As the model has not actually been run yet, the graph is blank. Tables, another Nova component, work very similarly to graphs, but present the data as raw tabulated results instead. As shown in Figure 2.1, above, and Figure 2.9, below, the controls for running the model are above the canvases:

Justin Dee

Figure 2.9

The Start and End input boxes define how long the model should run for, and Dt defines the granularity. Method can be changed between a couple of different integration methods, presumably useful for some models. To run the model, first it must be captured, with the Capture button. This converts all of the components on the canvas into NovaScript form (which can be viewed via the Lambda view). NovaScript is the internal script representation of the model, as seen in Figure 2.10, which shows the part of the NovaScript containing the model definitions, including the values for the terms I entered and all the flow equations. There is more in that window, but much of the remaining NovaScript relates to the display of the components (their positions on the canvas, for example), which I feel is unnecessary to showcase and explain, as it is not important for the model's functionality. NovaScript is actually a domain-specific form of JavaScript, which allows the use of JavaScript maths functions, more complex formulae including if statements (particularly their inline variant, for instance an expression like Prey > 0 ? PreyBirthCoeff * Prey : 0), any JavaScript library, and so forth. Having this underlying tool will prove very useful later on, when more complex goals arise. After the model has been captured into NovaScript, the next step is to Load it, either from the NovaScript window or the main one. This loads and compiles the NovaScript code, and is important for catching any syntax errors one may have entered into the formulae. Finally, the model must be executed, either via the Exec button, which runs the whole model from start to end, or via the Init and Step buttons, the former of which begins the model without running it, the latter of which advances the model one time step. The Stop button can be used to cease calculation mid-simulation, very useful if one has misjudged how long a simulation will take to complete and set an unnecessarily long simulation time. The results of this LV model execution can be seen in Figure 2.9 above; the two lines on the graph, red and blue, show the development of the populations over time. Although at first glance it may seem that they achieve a similar maximum population, note that Nova plots each on its own vertical scale, so as to retain the highest-resolution view of each series. It is clear to see, though, the periodic motion expected from the equations we started with, and the harmonic relationship between the two populations.

Beyond Lotka-Volterra: other components and functionality


This model is relatively simple, and so uses little of Nova's extended functionality. Some of that I will explore in Chapter 4, as I attempt to implement a much more complex model, but I will briefly outline some further components here:

- Commands are components that solely contain code to be executed each time step.
- Sliders and Spinners are much like Terms which can hold only values, not expressions, with the addition that their representation on the component canvas includes a graphical slider or spinner which can be used to adjust the value without entering the properties pane.
- Labels are just that: labels, used to annotate diagrams.

Figure 2.10

- Chips and pins are components used in the context of submodels, something that seems central to Nova's power. Pins act as either outputs or inputs, and if a model has at least one output, it can be used as a submodel within a larger supermodel, as a chip: a black-box model that can be hooked up via its pins to components in the supermodel.
- Agent containers are beyond the scope of this project, but in essence allow models in which individual submodels (agents) act with individual scope relative to other submodels; for instance, simulating an actual population of animals with some element of randomised activity.
- Batch processing: each term can be configured as a batch element, which allows its value to iterate through a range of values over a simulation.
- Code chips (confusingly named; they are unrelated to chips and pins) contain raw code to be executed.
- Integration with the statistical programming language R.


Chapter 3 Background on TopModel


My primary source on TopModel is A dynamic TOPMODEL (K. Beven & J. Freer, Hydrological Processes, 2001), although it is somewhat over my head as a non-environmental-scientist; I shall attempt to describe TopModel here as best I can. All diagrams are courtesy of Prof. Gordon S. Blair and Dr. Yehia El-Khatib (A Cloud-based Virtual Observatory for Environmental Science, OpenWater Symposium, April 2011, and Building a Cloud Infrastructure for a Virtual Environmental Observatory, American Geophysical Union (AGU) Fall Meeting, December 2012). TopModel describes the way that rainfall falls onto an area of terrain and, based on the surface of that terrain, including its inclination and the saturation of the ground, how that rain either permeates to the water table, or flows through the ground, or over it, to the channel or basin.
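Although the figures below describe the process qualitatively, the quantity at the heart of TopModel (introduced in Beven & Kirkby's original paper) is worth noting: the topographic wetness index of each point in the catchment,

    TWI = ln(a / tan β)

where a is the upslope area draining through the point per unit contour length, and tan β is the local slope. Points with similar index values are assumed to respond alike hydrologically, which is what allows the terrain to be grouped into the index classes encountered in Chapter 4.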

Figure 3.1

At the beginning of a simulation, the ground is dry, the water table low.

Figure 3.2

In Figure 3.2, the rain begins to fall, permeating the so-called root zone. This begins to saturate the soil.


Figure 3.3

Figure 3.4

Figures 3.3 and 3.4 show how the water permeates through to the water table, causing it to rise and, in the shallower terrain, rise above the surface.

Figure 3.5

Figure 3.6

Figures 3.5 and 3.6 show the overland flow: streamlets form, but then cease as the saturated ground begins to transfer water through to the channel and the water table recedes.

Figure 3.7

Figure 3.8

Eventually, as the rain stops, the root zone begins to dry out and the saturation returns to normal (Figures 3.7 and 3.8). TopModel was first published in 1979 by Beven & Kirkby (A physically based, variable contributing area model of basin hydrology); it has had a profound impact on hydrology, particularly on the way rainfall and runoff are modelled and studied, and it is still in use today.


Chapter 4 Implementing TopModel


Challenge 1: Understanding the source code
I began this project without a great deal of understanding of how TopModel works, how it is structured, or what its various elements are; not being an actual environmental scientist, I could not approach this aspect of the project as a real user would. What I did have access to, however, was the source code of a C implementation of TopModel (available at https://source.ggy.bris.ac.uk/wiki/Topmodel). My hypothesis was that a translation from C to Nova, while not emulating the actual implementation process a target scientist would use, would nevertheless expose me to many of the same challenges and tests of the system as they would face. I am, after all, attempting to implement a real hydrological model, the same goal, and so the end result shouldn't differ too greatly. I will take into account in my conclusions the fact that I approached this from a different initial standpoint than the target audience.

I began by reading through the source code of the main methods (in topmodel.c and core_topmodel.c). I noted that TopModel has a lot of parameters (an overview of them can be found at https://source.ggy.bris.ac.uk/wiki/Running_Topmodel). I also noted that a large portion of the calculation was done in a loop over an array of area index classes (nidxclass). Realising this, I decided that this was an ideal candidate for the submodel functionality of Nova: I could design a submodel containing everything inside that loop, and run many copies of that submodel to simulate the loop, with input pins to adjust the behaviour for each iteration. I also noted a fairly large function, get_f(), which seems simply to be a mathematical calculation (albeit one based on many inputs), and which could easily be turned into a submodel (or simply a code chip). Finally, I noted that on top of the large number of parameters, TopModel also requires a large dataset to operate on (the rainfall measurements, particularly). This I obtained, along with suggested values for the parameters, from one of my project supervisors, Dr. Yehia El-Khatib, who had worked with TopModel before.
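The overall shape of that main calculation, paraphrased below in JavaScript rather than the original C (only the names nidxclass and get_f come from the source; all values and function bodies are illustrative stubs, not TopModel's actual mathematics), is what suggested this mapping:

    // Paraphrase of the C control structure, not the actual source.
    var nidxclass = 3;                   // number of area index classes
    var rain = [1.0, 2.0, 0.5, 0.0];     // illustrative input series
    var state = [0, 0, 0];               // per-class storage

    function get_f(storage, rainfall) {  // stand-in for the real
        return storage + 0.5 * rainfall; // mathematical calculation
    }

    for (var t = 0; t < rain.length; t++) {
        // The per-class loop: each iteration is a candidate for one
        // instance of a Nova submodel, with pins supplying the
        // class-specific inputs.
        for (var i = 0; i < nidxclass; i++) {
            state[i] = get_f(state[i], rain[t]);
        }
    }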

Challenge 2: How to begin tackling such a large problem


At this point, I was somewhat intimidated by TopModel's complexity and could not see any way to begin an implementation, so I did some research into how to deal with models that have as much going on in them as TopModel does. A breakthrough in my understanding came when I read through an example model on the Nova website: a model of a drunken random walk (http://www.novamodeler.com/model-library/drunken-randomwalk/). Although simple, the way this model used stocks to represent not quantities but variables (in this case, the components of a velocity vector) helped me towards an understanding of how I might begin to tackle the problem. I had already created Terms to represent all the model's parameters, since that was a comparatively simple task. Understanding that I could represent all the variables used in the C methods with Stocks, I did just that, creating a whole screen's worth of Stocks and matching the variable names to those within the C code; note the use of a Nova label component to reference descriptions of all the parameters:

Figure 4.1

Justin Dee

14

For the most part, this was an easy task: the C code was well written, with all the important variables defined in a structure rather than scattered through the code.

Challenge 3: Inputs and outputs


I already had a large amount of data to be used as input, in the form of a .csv file with five columns, two of which, date and time, I knew could be ignored here: Nova is time-unit agnostic, so all I needed to do was make sure that the value of dt (in its Term) was consistent with what I defined one time step to be (in this case 15 minutes, since that was the granularity of the data I had). The other three columns were the important input data: Flow, Rain, and PET. Although I was not sure exactly what these were, I knew they would be important in the model, so I created terms to represent them (note: these could have been pins, since I was already planning for this module to become a submodel; handily, Nova includes a menu function to convert Terms to Pins). Already I was foreseeing a problem: how was I going to get data from a CSV file into Nova? I set that aside for the time being and focussed on the output; it looked (from output.c) as though there were five output variables measured, so I attached graph components to those:

Figure 4.2

Challenge 4: Flows
I knew that every variable was important to the calculation in some way; the next challenge was to pore through the code and discover how each variable interacted with the inputs, and with the other variables. I began in a methodical manner, reading through the C code top to bottom, focussing on each expression one by one and implementing it as a flow in the diagram. I quickly concluded that doing it in this manner only made sense using Nova's discrete mode: these stocks should all be sequences, since the C code is implemented as one grand loop per time step, in which a discrete operation on each variable occurs every iteration, rather than as expressions describing the relationship between time and change, as was demonstrated in the Lotka-Volterra model. So, as I went, I changed the type of each Stock to a sequence (which changes its colour to purple, as can be seen below). I also added a placeholder for the get_f() nested function, which I reasoned could be implemented later.
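The distinction between the two stock types can be sketched as two update rules (a simplification of what Nova does, not actual NovaScript):

    // Illustrative values only.
    var dt = 0.25, stock = 10, flowExpression = 2;

    // Continuous stock: the flow expression is a rate of change,
    // accumulated over each time step.
    stock = stock + flowExpression * dt;

    // Sequence (discrete stock): the flow expression gives the next
    // value outright; nothing is defined between time steps.
    stock = flowExpression;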


Figure 4.3

At this point, the model was beginning to take shape. I had all of the dependencies implemented for some of the output variables, such as fex. Here, I felt the time was right to begin testing the model, to see what sort of output it gave. I wasn't expecting anything near correct, as I had not yet finished the method I had started of implementing each expression one by one. Nevertheless, this is where I butted heads with the input problem again, and it had me well and truly stumped. I had a CSV file with over 3000 records, but there seemed to be no way to import it into Nova: no version of a Term that allowed one to reference a file, or even to hold multiple values. Based on what I knew at the time, the only solution I could come up with was to manually create 3000 terms for each input column, which would have taken an excessive amount of time and screen space; clearly not feasible. I felt at this stage that I had discovered a major flaw in Nova.

Challenge 5: Inputting data sets


After some thought, research, and meetings with my supervisors, I identified three possible paths of action:

- One: find some way of importing the CSV file through NovaScript, since I expected JavaScript to have IO capabilities.
- Two: programmatically convert the CSV file in some manner to a NovaScript array and paste it into the Lambda view of the model.
- Three: seek out the Plugin API for Nova, which is referenced in its documentation, and write a plugin that would allow me to read in a file and access it through a special component. The Nova application itself is written in Java, a language I would have been confident writing such a plugin in.

However, before going ahead with any of these, I emailed the developers (via http://www.novamodeler.com/contact/), detailing my problem. I was happy to receive a rapid response from Dr. Andy Lyons, who said he'd pass it on to their programming team. The next day, a new stable build of Nova was published (http://www.novamodeler.com/blog-release4/), whose release notes hinted at a new data type called a Run, which sounded like it could completely solve my problem. It was again not long until I heard from the Nova team, this time Professor Richard Salter, who detailed a general solution to my problem using the new functionality added in this release. It required splitting my CSV file into three separate files, but beyond that was not complicated:


Figure 4.4

As shown above, the Term rain_i that I had created as a placeholder was changed to refer instead to the index of raindata corresponding to the current time step. raindata itself was created in the program window, simply by declaring a variable and invoking the new newRunData() function, which does all of the background work of loading the values from the comma-separated file.
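In outline, the setup looked something like the following sketch. The newRunData() name comes from the new release, but its exact signature, the file name, and the name of the time-step counter are all assumptions for illustration:

    // In the program window: load one of the split CSV files into a
    // Run data object (file name and signature assumed).
    var raindata = newRunData("rain.csv");

    // Expression for the Term rain_i: index the loaded series by the
    // current time step (step counter name assumed).
    raindata[t]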

Challenge 6: Obtaining meaningful output


With my data input functioning, and most of the variable calculations connected up, I began to worry that my model refused to give any meaningful output: all output graphs consistently showed values of zero across the whole period. I spent a long time trying to hunt down bugs in my code, areas where I thought problems could lie, but increasingly I found that, with so many variables and interrelations, it was very hard to keep an overview of what everything was doing. This was compounded by my poor understanding of the model's structure at the beginning of the implementation.

Reflections on my method
Simply put, I do not think I approached the implementation of TopModel from the correct perspective. Partially, this is due to my coming at it from a coding angle rather than a modelling one: for instance, attempting to relay every single expression in the C code to an expression in Nova. Had I the time now, I'd start over with a fresh approach. It's clear, I think, that many of the variables represent a quantity of water in some way, and the expressions and flows indicate water changing state as the model develops. If I had a clearer view of exactly what was happening, I feel I could much more easily represent it with the system of Stocks and Flows Nova provides, rather than treating the stocks as variables to which the same expressions could be applied as in the original C. My hypothesis at the beginning of this chapter was that implementing from a code-first perspective would result in essentially the same model as an idea-first approach; I think the tangled mess I ended up with shows that hypothesis to be demonstrably false. On the other hand, I do believe the problem I had importing data into the model was a genuine one, backed up by the developers' need to release new functionality in order to accommodate it! So in some ways I did succeed in exposing Nova to some of the real problems it would face in a practical environmental science project.


Chapter 5 Conclusion
Was Nova itself an inherently poor choice for the implementation of TopModel?
Without a successful implementation of TopModel to demonstrate, the question must be posed: if Nova cannot be used to implement a well-known and important model, is it inherently a failure as a platform for modelling in general? The answer is no. I feel that my failure to get results from my TopModel implementation must still be due to an error in my code, or a section I implemented incorrectly somewhere. Had I the opportunity to begin afresh, or were I instead approaching it as someone who understood the underlying assumptions and structure of the model, I believe Nova would have provided a suitable environment for TopModel. As such, I feel that Nova can be regarded as a strong candidate when choosing a modelling platform.

Does Nova meet the criteria laid out above?


In Chapter 1, I outlined three primary criteria against which I would analyse Nova, and three secondary criteria which are important for the domain-specific use of Nova, namely educational and research use in the environmental sciences. I shall repeat them here: usability, functionality, and simplicity; then cost, speed, and adaptability to cloud networks. Through the process of implementing TopModel, I feel I can comment meaningfully on all of these criteria; in that respect, the project has been a great success. I will address the simpler secondary criteria first, then the more important primary conclusions.

Cost. Nova is free for educational and research uses. As shown in Chapter 2, this is significantly cheaper than its major graphical competitors, and no more expensive than more technically demanding solutions like C or R.

Speed. I generally found Nova to be fast and efficient. With the Lotka-Volterra model, it could render a hundred time steps fast enough that the slider functionality showed changes in the results almost live, and with TopModel it took only around ten seconds to process over three thousand records. Perhaps with more complex relationships this would slow to unfeasible times, but I do not think it is notably slower than a native-code solution.

Cloud access. While Nova has no native cloud functionality, it is itself a Java application, and one that parses a script to create a runnable module. I am sure that with the right Java middleware, a Nova interpreter could be created that takes advantage of parallel cloud processing.

Usability. On a basic level, Nova is very easy to use. The ability to drop components down and connect them to create simple relationships is intuitive, and the visual presentation of the model lends itself well to a modelling platform. Moreover, as I found while experimenting with the Lotka-Volterra model, it is a big advantage to be able to change parameters and instantly see the graphs update. Very little of the core functionality is obscured, and with only a selection of very simple components, complex systems, like the framework I implemented for TopModel, can be created quickly. The Stock and Flow representation of states is not unique to Nova (it also features in the primary competitors mentioned in Chapter 2), but I find it an excellent choice for a modelling platform like this. Additionally, the simple method of converting terms to pins in order to drop one model into another as a submodel is ingenious, and opens up many solutions that would otherwise be unfeasible on this platform; for example, my initial plan for the final stages of my TopModel implementation: had I been forced to copy and paste the whole submodel I was creating for every index class, I would not have looked at Nova at all.


Functionality. Here Nova had some problems, notably the inability to use a large data set as input values. The fact that the developers released a patch supporting this functionality soon after I contacted them about it must be considered a massive point in their favour, and I am very grateful for the work they did. It also demonstrates that Nova is a tool still under constant development, which means that any functionality still missing may well be forthcoming. In practical terms, it was not the lack of functionality which prevented me obtaining results from my model, but rather an incorrect approach to the creation of the model, which made maintaining it very hard.

Simplicity. Unfortunately, this is the big point on which, for me, Nova falls down. While simple operations and simple models were very easy to create, going beyond a certain point was semantically very hard. Much can be done with NovaScript to create interesting and exciting functionality within models, but in essence doing so still requires knowledge of a programming language (in this case, JavaScript). The solution to the data input problem I faced involved some JavaScript; doing much beyond basic maths inside expressions required knowledge of (or access to documentation for) JavaScript functions. Now, while these are all readily available, I am not sure this really comes across as an advantage over just using a pure language, given that the whole idea behind replacement platforms like Nova is to replace such general solutions. In conclusion, it did not take me very long to pick up the basics of Nova, but the grander scale of TopModel, and the grasp of the platform that a better chance at a successful implementation required, demanded, I think, a longer learning curve. Looking at some of the more complex models available for Nova, I can see that this extends even further: in essence, while Nova has a very simple front end (ideal for educational uses), a lot of the power that makes it a research-level tool comes from its powerful back end, which, unfortunately, I cannot see as being much easier than learning an existing domain-specific language like R.

That said, Nova still has some major advantages to offer. It is simpler to view models in Nova: even with the result I obtained with TopModel, I could, simply by looking at it, gain a more intuitive understanding of what was going on than from the C code. This was also true of the Lotka-Volterra model; two equations which might not mean very much to someone without an understanding of calculus are represented as two very simple flows of population, flowing in and flowing out, with clear arrows to indicate dependencies. Additionally, I feel it is a lot easier to change a Nova model, to take it apart and deconstruct it. The powerful submodels feature means chips can effectively act as black boxes, whose contents can be changed independently of the main model. Compare this to the C code for TopModel, where even individual methods like get_f() referred to globally stored variables, and swapping out any one section would have required extensive examination of the consequences throughout the whole code. These strengths lead me to recommend Nova as a great tool for implementing a model from scratch. My experience with TopModel has taught me a lot about understanding where you're coming from before you can do something with a model, and I would definitely NOT recommend using any graphical language for a conversion of procedural code.

Future Developments
Overall, I feel that Nova is a strong candidate for use within the computational sciences. However, I can only conclude that, as a tool still under development, too much of its functionality is hidden behind NovaScript, which, while strong, has no definite advantages over existing solutions apart from living inside the Nova framework. I was impressed to be presented with a solution to missing functionality so soon after discovering it, and can only hope that the Nova developers continue to bring that functionality forward into the intuitive graphical interface that is Nova's forte. I began this report by discussing the problem of environmental scientists having to learn procedural languages in order to use models in their work. I end it with the understanding that, if Nova continues to garner interest and support (as with, for example, a recent Google Tech Talk: http://www.novamodeler.com/pres/20130510_google-tech-talk/), then, while it may not entirely obviate the need for new environmental scientists eventually to learn a more formal language (primarily because so much existing work is in R, C, Matlab, and so on), Nova is a perfect intermediate tool for scientists wanting a primary platform for their own new projects.


Bibliography
http://www.novamodeler.com/, including:
- Nova: An Interactive Graphics-Scripting Platform for Education and Computational Research, W. Getz, Nova: A Google Tech Talk, Mountain View, CA, 10 May 2013.
- Nova: A modern platform for system dynamics, spatial, and agent-based modelling, Richard M. Salter, International Conference on Computational Science (ICCS), 2013.

topmodel R package documentation: http://cran.r-project.org/web/packages/topmodel/topmodel.pdf

A dynamic TOPMODEL, K. Beven & J. Freer, Hydrological Processes, 2001.

A physically based, variable contributing area model of basin hydrology, K. Beven & M. Kirkby, Hydrological Sciences Bulletin, 1979.

A Cloud-based Virtual Observatory for Environmental Science, Gordon S. Blair & Yehia El-Khatib, OpenWater Symposium, 19 April 2011.

Building a Cloud Infrastructure for a Virtual Environmental Observatory, Yehia El-Khatib et al., American Geophysical Union (AGU) Fall Meeting, December 2012.

Acknowledgements
Many thanks to my project supervisors, Prof. Gordon Blair and Dr. Yehia El-Khatib of Lancaster University, and also to Prof. Richard Salter and Dr. Andy Lyons for their assistance.

Working Documents can be found at http://www.lancaster.ac.uk/ug/deej1/fyp2/
