
Models and formalisms for Systems Biology

Mehran Sharghi February 2009

Introduction
Classic biological methodology is reductionist: to understand a biological system, its individual components are isolated and studied. The systems biology approach, by contrast, considers the whole system and tries to build an abstraction that models its behaviour using all available information. This is challenging because of the enormous complexity of biological systems. However, with the popularity of high-throughput biological assays and the consequent rapid growth of genomic, transcriptomic, proteomic, metabolomic, and interactomic data, a purely reductionist approach is neither appropriate nor feasible. Moreover, to fully understand a biological system it is necessary to consider interactions between components in a spatio-temporal framework.

Organisms and biological processes are far too complicated to be abstracted by a single model or a single method. Although it runs against the ultimate goal of systems biology, in practice a specific process or a small subsystem is usually modelled, such as a gene regulatory network, a metabolic network, or a set of protein-protein interactions. There is no single method for modelling a biological system: depending on the biological process and the aims of the model, different approaches have been used in different studies. Various mathematical concepts and formalisms have also been used to represent and analyse biological models. In this essay I briefly review some of the more widely used modelling approaches, together with the formalisms and techniques used to represent and analyse these models.

Models
Models and the modelling process have been categorised from different viewpoints. Schlitt et al. [1] used four levels (layers) to categorise biological models: parts lists, topology, control logic, and dynamic models. Each level adds detail to the previous one. A parts list is the set of model elements (the building blocks of the model) for a particular biological process or system. A topology model represents the structure of the model as a connection diagram. A control logic model describes the regulatory mechanisms and signals that activate or repress elements of the model. Finally, a dynamic model simulates the behaviour of the system. Any of these levels can be used to model the same biological concept; gene regulation, for example, can be modelled at the topology level or with a dynamic model.

Joyce et al. [2] looked at the modelling process from a different viewpoint but suggested similar modelling concepts. They treated the various kinds of omics data as the primary knowledge about a biological system and discussed several approaches to integrating these data and building a model. The first approach is scaffold identification, in which the cellular network (the structure of the model) is identified. The second is scaffold decomposition, which identifies related components or modules in a network in order to understand the overall network structure. The last is cellular modelling and analysis, which models the dynamics of the network.

Several biological processes have been specifically considered for developing models. These include gene regulatory networks, metabolic networks, protein-protein interactions, drug action and therapy, development, and signalling. These systems have been modelled at different levels, from structural to dynamic models. One of the most active research areas in systems biology has been gene regulatory networks.
A gene regulatory network (GRN) is a collection of biological molecules and substances, together with their interactions, that determines gene expression and the abundance of gene products in a cell [3]. Gene expression and transcription are regulated by proteins and metabolites, while proteins are themselves the products of genes. The external environment of a cell and products diffusing from other cells may also participate in a GRN. Modelling and reconstruction of GRNs has been a core objective of functional genomics and systems biology. Karlebach et al. [4] reviewed various modelling techniques used to abstract and reconstruct GRNs; some of the most popular methods are described later in this essay.

Biological processes are not independent, which suggests that integrating different processes would result in a more accurate model. For example, many elements of metabolic networks are regulated by gene products or environmental conditions, so integrating metabolic and regulatory networks would give a more powerful model for predicting the behaviour of the system. In an attempt to produce a comprehensive model of a biological system as a whole, Noble [5] adopted a multilevel integrated modelling philosophy. He suggested models at the sub-cellular, cellular, tissue, and organ levels, and worked specifically on modelling the heart at all these levels, linking the models together to develop a comprehensive model of the heart. Although integrated and multilevel models are more accurate and closer to the actual biological system, they require more computational resources, and this requirement may grow exponentially with the size of the model.

It is almost impossible to develop an accurate model from a set of data at the first attempt. The natural process is to improve model accuracy in a cycle of modifying the model and performing biological experiments to test it. This highlights the value of existing biological knowledge for producing more accurate models. Tanay et al. [6] used a large compendium of previously published results from heterogeneous experimental techniques to study the regulatory network of S. cerevisiae. Hanisch et al. [7] also used prior knowledge in studying regulatory and metabolic networks.

Formalisms
Several formalisms and mathematical concepts have been used to represent and analyse biological models at different levels. This section briefly reviews some of the most popular techniques and examples of application in modelling biological processes.

Graphs
Graphs are perhaps the most widely used technique for representing biological models. They represent the structure and dependencies in a model and provide a simple visual understanding of the underlying system. Graph analysis techniques can provide important information about various structural aspects of the system. Additionally, graphs can be used to identify potential modules and related components in the system. Barabási et al. [8] published an excellent review of modelling using graphs and of the related graph analysis.
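As a rough illustration of the kind of structural analysis mentioned above, the following Python sketch builds a small undirected interaction graph, counts the degree of each node, and groups nodes into connected components as a crude stand-in for module detection. The node names and edges are invented for the example and do not come from any data set.

```python
from collections import deque

# Hypothetical undirected interaction graph given as a list of node pairs.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F")]

def build_graph(edge_list):
    """Build an undirected adjacency map (node -> set of neighbours)."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def degrees(graph):
    """Number of neighbours of each node; high-degree nodes are candidate hubs."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def connected_components(graph):
    """Group nodes that are reachable from one another (breadth-first search)."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(component)
    return components

graph = build_graph(edges)
node_degrees = degrees(graph)
components = connected_components(graph)
```

Connected components are only the simplest possible notion of a module; real analyses typically use richer measures such as clustering coefficients or community detection.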

Boolean networks
A Boolean network is a directed graph together with a set of Boolean functions. Each node in the graph has a Boolean value of 0 or 1, and edges define the relationships between nodes. The state of the model is determined by the values of all nodes at a given time point. Each node has an associated Boolean function that defines its value at the next time point from its current value and the values of all nodes that point to it. The initial values of the nodes therefore determine the model state at the next and all future time steps. The chain of states that the model passes through over a sequence of time points is called a trajectory. An attractor is a single state (or a cycle of states) at the end of a trajectory from which no transition leads out; it usually corresponds to a steady state of the system.

Boolean networks have been used extensively to model gene regulatory networks. In such models, nodes represent the entities in a GRN, such as genes, proteins, and mRNA. Their values must be discretised to 0 or 1 on the basis of some property such as expression or abundance. If an entity is regulated by other entities, there is a directed link between the corresponding nodes. A set of Boolean functions (regulatory functions) defines the state of each node from its current state and the states of all its regulators.

Boolean networks are a simple and efficient way of modelling a GRN, but they involve some simplifications and limitations. Having only two states for each entity means that any intermediate information is lost. Another limitation is the discrete nature of the model, in which the state of the system is available only at discrete time points. A further important issue is the need for exact regulatory functions, which cannot always be determined from the available experimental data.
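The update scheme described above can be sketched in a few lines of Python. The three-gene network and its regulatory rules below are hypothetical and chosen only to keep the example small; the sketch assumes synchronous updates, in which all nodes are recomputed from the same current state.

```python
# Hypothetical three-gene network; each rule maps the current state to the
# node's next value (rules are illustrative, not taken from any real GRN).
rules = {
    "A": lambda s: s["C"],                 # C activates A
    "B": lambda s: s["A"] and not s["C"],  # A activates B, C represses B
    "C": lambda s: not s["B"],             # B represses C
}

def step(state):
    """Synchronous update: every node is recomputed from the same current state."""
    return {node: int(bool(rule(state))) for node, rule in rules.items()}

def trajectory_and_attractor(initial):
    """Follow the trajectory until a state repeats.

    Returns (transient states, attractor states), each as a list of
    (A, B, C) tuples in the order they are visited.
    """
    seen, path, state = {}, [], dict(initial)
    while True:
        key = tuple(state[n] for n in sorted(state))
        if key in seen:
            start = seen[key]
            return path[:start], path[start:]
        seen[key] = len(path)
        path.append(key)
        state = step(state)

transient, attractor = trajectory_and_attractor({"A": 0, "B": 0, "C": 0})
```

Starting from the all-zero state, this toy network passes through two transient states and then settles in a single-state attractor. A cyclic attractor would appear as an attractor list with more than one state.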

Random Boolean networks
Random Boolean networks, also referred to here as stochastic or probabilistic Boolean networks, extend Boolean networks and attempt to address one of their limitations [9]. When a Boolean network is used to model a biological process, there are often several possible Boolean functions for a node, each with a different probability. For example, in a GRN, expression of gene A might regulate expression of gene B with probability p1 and expression of gene C with probability p2. When the state of a node is calculated for the next time point, one of its possible functions is selected according to these probabilities. The result is a stochastic model in which a single initial state can lead to many trajectories with different probabilities.
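A minimal sketch of this selection step, assuming each node carries a list of candidate functions with selection probabilities, might look as follows in Python. The functions and probabilities are invented for illustration, with 0.7 and 0.4 playing the roles of p1 and p2.

```python
import random

# Each node has candidate Boolean functions with selection probabilities.
# Functions and probabilities are hypothetical.
candidates = {
    "A": [(1.0, lambda s: s["A"])],   # A keeps its value
    "B": [(0.7, lambda s: s["A"]),    # with p1 = 0.7, A activates B
          (0.3, lambda s: s["B"])],   # otherwise B is unchanged
    "C": [(0.4, lambda s: s["A"]),    # with p2 = 0.4, A activates C
          (0.6, lambda s: s["C"])],   # otherwise C is unchanged
}

def stochastic_step(state, rng):
    """Pick one function per node by probability, then update synchronously."""
    new_state = {}
    for node, options in candidates.items():
        weights = [p for p, _ in options]
        functions = [f for _, f in options]
        chosen = rng.choices(functions, weights=weights, k=1)[0]
        new_state[node] = int(bool(chosen(state)))
    return new_state

def simulate(initial, steps, seed):
    """Generate one random trajectory of the given length."""
    rng = random.Random(seed)
    path = [dict(initial)]
    for _ in range(steps):
        path.append(stochastic_step(path[-1], rng))
    return path

# Estimate how often B is on after one step from the state A=1, B=0, C=0.
rng = random.Random(0)
runs = 10_000
b_on = sum(stochastic_step({"A": 1, "B": 0, "C": 0}, rng)["B"] for _ in range(runs))
fraction_b_on = b_on / runs
path = simulate({"A": 1, "B": 0, "C": 0}, steps=5, seed=1)
```

Repeating the simulation with different seeds yields different trajectories from the same initial state, and the estimated fraction should lie close to the selection probability of the activating function.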

Bayesian networks
A Bayesian network is a directed acyclic graph that encodes probabilistic relationships among variables of interest. Bayesian networks are suitable for modelling processes in which causal relationships are stochastic and noisy, or in which data are missing. This is exactly the case for most data obtained from high-throughput biological assays such as microarrays. Another advantage of Bayesian networks is that they can be learned from data, which can be used to gain understanding of a problem domain and to predict the consequences of interventions. Bayesian networks were first used to model GRNs by Friedman et al. [10], who used transcriptome data from multiple microarray experiments to establish dependencies and to carry out the learning phase for their network. One limitation of Bayesian networks is the requirement that the graph representing the model be acyclic. Most biological processes, however, contain feedback loops in which an entity in a pathway affects the production of another entity at an earlier step of the same pathway. Dynamic Bayesian networks, an extension of Bayesian networks, have been proposed to address this issue. They add a time dimension to the model, with dependencies between variables at consecutive time points. Although Bayesian networks are well matched to some biological processes, their computational complexity can be a barrier to using them for large-scale models.
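To make the idea of encoding probabilistic relationships in a directed acyclic graph concrete, the following Python sketch defines a tiny hypothetical network in which a transcription factor TF influences two genes, G1 and G2. All probabilities are invented. The joint probability is the product of each node's conditional probability given its parents, and a posterior is obtained by brute-force enumeration, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical network: a transcription factor TF influences genes G1 and G2.
parents = {"TF": [], "G1": ["TF"], "G2": ["TF"]}

# P(node = 1 | parent values); all numbers are invented for illustration.
p_on = {
    "TF": {(): 0.3},
    "G1": {(0,): 0.1, (1,): 0.8},
    "G2": {(0,): 0.2, (1,): 0.6},
}

def joint(assignment):
    """Joint probability as the product of each node's conditional probability."""
    prob = 1.0
    for node, node_parents in parents.items():
        key = tuple(assignment[p] for p in node_parents)
        p1 = p_on[node][key]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

def posterior(query, evidence):
    """P(query = 1 | evidence), summing the joint over all unobserved variables."""
    hidden = [n for n in parents if n not in evidence]
    totals = {0: 0.0, 1: 0.0}
    for values in product([0, 1], repeat=len(hidden)):
        assignment = dict(evidence, **dict(zip(hidden, values)))
        totals[assignment[query]] += joint(assignment)
    return totals[1] / (totals[0] + totals[1])

# Probability that TF is active given that G1 is observed to be expressed.
p_tf_given_g1 = posterior("TF", {"G1": 1})
```

Observing G1 as expressed raises the probability that TF is active above its prior of 0.3, which is the kind of inference such a network supports. Learning the structure and probabilities from expression data is a separate and much harder step that this sketch does not attempt.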

Ordinary differential equations (ODEs)
Ordinary differential equations have been widely used to model the continuous variation and dynamic behaviour of a system mathematically. A popular example is the use of ODEs to model metabolic networks. Normally the process starts with a structural model represented as a graph of dependencies and interactions. Differential equations then describe the rate of change of concentration over time for each entity in the model. These rates are given by kinetic rate laws that include rate constants; the set of these constants is called the parameter set of the model. Parameter values are estimated from experimental data and can be adjusted to fine-tune the model. The actual model is a set of ODEs that can be solved using numerical methods. ODEs have also been used to model GRNs, both in general and in specific organisms [11, 12]. Models based on ODEs can provide detailed information about the dynamic changes in the underlying system. However, their accuracy depends on the accuracy of the estimated parameters. It should also be considered that a large number of ODEs requires significant computational resources, and that a large number of parameters leads to accumulated inaccuracy. This technique is therefore more suitable for smaller models with only a small number of ODEs.
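As a small worked example of this approach, the Python sketch below models a hypothetical two-step pathway A -> B -> (degraded) with mass-action rate laws and integrates it with a classical fourth-order Runge-Kutta scheme. The rate constants k1 and k2, the initial concentrations, and the step size are assumptions made for the example. The result is compared with the known analytical solution for this simple linear system.

```python
import math

def derivatives(state, params):
    """Mass-action rate laws for A -> B -> (degraded)."""
    a, b = state
    k1, k2 = params["k1"], params["k2"]
    return [-k1 * a, k1 * a - k2 * b]

def rk4_step(state, params, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, factor):
        return [x + factor * s for x, s in zip(base, slope)]
    s1 = derivatives(state, params)
    s2 = derivatives(shifted(state, s1, dt / 2), params)
    s3 = derivatives(shifted(state, s2, dt / 2), params)
    s4 = derivatives(shifted(state, s3, dt), params)
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, s1, s2, s3, s4)]

def simulate(initial, params, dt, steps):
    """Integrate the system and return the list of states at each time step."""
    states = [list(initial)]
    for _ in range(steps):
        states.append(rk4_step(states[-1], params, dt))
    return states

params = {"k1": 0.5, "k2": 0.2}   # hypothetical rate constants (the parameter set)
t_end = 10.0
states = simulate([1.0, 0.0], params, dt=0.01, steps=1000)  # integrate to t = 10
a_final, b_final = states[-1]

# Analytical solution for A(0) = 1, B(0) = 0, used only as a check.
k1, k2 = params["k1"], params["k2"]
a_exact = math.exp(-k1 * t_end)
b_exact = k1 / (k2 - k1) * (math.exp(-k1 * t_end) - math.exp(-k2 * t_end))
```

Changing k1 or k2 and rerunning the simulation shows how sensitive the predicted concentrations are to the parameter set, which is the source of the accuracy concern raised above.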

Stoichiometric matrix
Stoichiometric matrices have been widely used to represent metabolic networks. In a stoichiometric matrix, rows conventionally represent metabolites and columns represent reactions. A non-zero entry is the stoichiometric coefficient of the metabolite in the reaction: positive values correspond to products and negative values to substrates. Stoichiometric matrices are analysed using various techniques, including flux mode analysis and extreme pathway analysis [13].

Metabolic graphs and reaction graphs are two other ways of representing a metabolic network. In the former, nodes are metabolites and links are reactions. In the latter, nodes are reactions, and a link between two reactions stands for a metabolite that is a product of one and a substrate of the other. Compared with stoichiometric matrices, these graphs are simpler, at the cost of some information. However, they are useful for analysing large-scale metabolic networks efficiently using graph analysis techniques.
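The following Python sketch shows a stoichiometric matrix for a small hypothetical network of three reactions and two metabolites. It assumes the common orientation with metabolites as rows and reactions as columns; the matrix would simply be transposed under the opposite orientation. Multiplying the matrix by a vector of reaction fluxes gives the net rate of change of each metabolite, and a flux vector for which all of these rates are zero is a steady state.

```python
# Hypothetical network (rows = metabolites, columns = reactions).
metabolites = ["A", "B"]
reactions = ["R1: -> A", "R2: A -> 2 B", "R3: B ->"]

S = [
    [1, -1,  0],   # A: produced by R1, consumed by R2
    [0,  2, -1],   # B: two units produced by R2, consumed by R3
]

def concentration_change(S, fluxes):
    """Net rate of change of each metabolite: the product S * v."""
    return [sum(coeff * v for coeff, v in zip(row, fluxes)) for row in S]

def is_steady_state(S, fluxes, tol=1e-9):
    """True if the fluxes leave every metabolite concentration unchanged."""
    return all(abs(rate) <= tol for rate in concentration_change(S, fluxes))

balanced = concentration_change(S, [1.0, 1.0, 2.0])
unbalanced = concentration_change(S, [1.0, 0.5, 2.0])
```

With fluxes (1, 1, 2) production and consumption balance for both metabolites, whereas with fluxes (1, 0.5, 2) metabolite A accumulates and B is depleted.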

Petri nets
Petri nets have been used for modelling concurrent processes in computer science and other disciplines, and have more recently been adopted in systems biology to represent dynamic models. A simple Petri net is a directed graph with two types of nodes, called places and transitions. Edges connect each transition to its input and output places. Edges can be weighted, the weight indicating the number of units (tokens) that pass along the edge when the transition fires. Places are marked with a number indicating the units accumulated in that place. A transition is enabled, and may fire, if each of its input places holds at least as many units as the weight of the edge connecting it to the transition. The state of the system is represented by a vector containing the markings of all places.

Petri nets are well suited to, and have been used for, representing and analysing metabolic networks. Reactions are usually represented by transitions, chemical compounds by places, and stoichiometric information by edge weights. Modifications and variants of Petri nets have also been suggested to address specific modelling requirements [14]. Petri nets can be analysed using the mathematical theory behind them, and this analysis can provide useful and novel information about the biological process being modelled. However, an issue with this method is that computational complexity grows exponentially with the size of the model. Petri nets have also been used to model the dynamics of gene regulatory networks and signalling pathways [15].
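The enabling and firing rules can be sketched in Python as follows. The example represents the single reaction 2 H2 + O2 -> 2 H2O as one transition, with the three compounds as places and the stoichiometric coefficients as edge weights. The initial marking and the data layout are assumptions made for illustration.

```python
# Transition -> (input edge weights, output edge weights), keyed by place.
transitions = {
    "synthesis": ({"H2": 2, "O2": 1}, {"H2O": 2}),   # 2 H2 + O2 -> 2 H2O
}

def is_enabled(marking, name):
    """A transition is enabled if every input place holds at least the edge weight."""
    inputs, _ = transitions[name]
    return all(marking.get(place, 0) >= weight for place, weight in inputs.items())

def fire(marking, name):
    """Return the new marking after firing: consume inputs, produce outputs."""
    if not is_enabled(marking, name):
        raise ValueError(f"transition {name!r} is not enabled")
    inputs, outputs = transitions[name]
    new_marking = dict(marking)
    for place, weight in inputs.items():
        new_marking[place] -= weight
    for place, weight in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + weight
    return new_marking

# Hypothetical initial marking; fire the transition for as long as it is enabled.
marking = {"H2": 4, "O2": 3, "H2O": 0}
history = [marking]
while is_enabled(history[-1], "synthesis"):
    history.append(fire(history[-1], "synthesis"))
final_marking = history[-1]
```

Each entry of the history is one state of the net. With several transitions enabled at once, the choice of which to fire next is what gives rise to the many possible state sequences that make exhaustive analysis expensive.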

Discussion and conclusion
Different approaches have been suggested for modelling a biological system at different levels. Models help in understanding the underlying biological system and allow predictions to be made about its behaviour. Despite some efforts to model a whole biological system, the lack of comprehensive data, the stochastic nature of biological reactions, and sheer complexity remain obstacles to developing such models. Instead, researchers have focused on specific biological processes or partial systems.

The accuracy of a biological model depends on several factors. One important factor is the quality of the data used to build it. Models are built using data from various sources, especially high-throughput methods such as microarray experiments. These data contain noise and missing values, and may be biased. One way to minimise the effect of such problems is to use data from multiple sources and of different kinds. Another important factor is the completeness of the model. Biological processes depend on each other, so studying only a selection of the components involved in a system results in an inaccurate model. This may be due to partial data or to simplifications made to overcome computational complexity. In addition, the stochastic nature of biological interactions makes it more difficult to construct very accurate models. Models are therefore refined iteratively by assessing them against experimental results and applying modifications. This process is repeated until the results from the model and from experiments are satisfactorily close to each other. When drawing biological conclusions from a model, extra care should be taken with regard to the partiality of the model, the quality and bias of the data used to build it, and its accuracy.

Finally, I would like to mention the use of standards and repositories for biological models. Sharing biological models, establishing model repositories, and having different groups use and modify these models all require common standards and ontologies for representing models. CellML [16] and the Systems Biology Markup Language (SBML) [17] are two suggested formats for representing biological models.

References
[1] Schlitt T, Brazma A, Modelling gene networks at different organisational levels, FEBS Letters, Volume 579, Issue 8, Systems Biology, 21 March 2005, Pages 1859-186
[2] Joyce A R, Palsson B Ø, The model organism as a system: integrating 'omics' data sets, Nature Reviews Molecular Cell Biology, 2006, 7, 198-210.
[3] Bezerianos A, Maraziotis I A, Computational models reconstruct gene regulatory networks, Molecular BioSystems, 2008, 4, 993-1000, DOI: 10.1039/b800446n
[4] Karlebach G, Shamir R, Modelling and analysis of gene regulatory networks, Nature Reviews Molecular Cell Biology, 2008, 9(10), 770-780.
[5] Szallasi Z, Stelling J, Periwal V, System Modeling in Cellular Biology, 297-312
[6] Tanay A, Steinfeld I, Kupiec M, Shamir R, Integrative analysis of genome-wide experiments in the context of a large high-throughput data compendium, Molecular Systems Biology, 1, 2005
[7] Hanisch D, Zien A, Zimmer R, Lengauer T, Co-clustering of biological networks and gene expression data, Bioinformatics, 2002; 18 Suppl 1
[8] Barabási A L, Oltvai Z N, Network biology: understanding the cell's functional organization, Nature Reviews Genetics, 5, 101-113, 2004
[9] Kauffman S, Peterson C, Samuelsson B, Troein C, Random Boolean network models and the yeast transcriptional network, PNAS, 2003, 100(25), 14796-14799
[10] Friedman N, Linial M, Nachman I, Pe'er D, Using Bayesian networks to analyze expression data, J. Computational Biology, 7(3):601-620, Nov 1998.
[11] Sakamoto E, Inferring a system of differential equations for a gene regulatory network by using genetic programming, Proceedings of Congress on Evolutionary Computation, 2001, 720-726, IEEE Press
[12] Chen K C, Integrative analysis of cell cycle control in budding yeast, Molecular Biology of the Cell, 15, 3841-3862, 2004.
[13] Reed J, Famili I, Thiele I, Palsson B O, Towards multidimensional genome annotation, Nature Reviews Genetics, 7, 2006.
[14] Pinney J W, Westhead D R, McConkey G A, Petri Net representations in systems biology, Biochemical Society Transactions, 2003, 31, 1513-1515.
[15] Peterson J L, Petri Net Theory and the Modeling of Systems, Prentice Hall, 1981
[16] CellML, website http://www.cellml.org/, last visited February 2009.
[17] SBML, http://sbml.org/Main_Page, last visited February 2009.
