Modeling of Gene Regulatory Networks Using State Space Models

Modeling of gene regulatory networks is becoming a popular way to understand the complex molecular mechanisms of gene regulation, owing to the availability of high-throughput genomic data. Such techniques have encouraged researchers to study not only the structure of gene regulatory networks and proteomic networks but also gene-gene associations or interactions. State space models are a relatively new approach to inferring gene regulatory networks. They have the distinctive feature of capturing the dynamics of gene regulation, which is inherent to biological networks, while remaining computationally efficient. Nonlinear state space models also account for the non-linear associations between genes, which linear modeling approaches fail to capture. Performance evaluation criteria for the approaches used for modeling genetic regulatory networks are also discussed.


Introduction
Computational genomics is a growing area in which researchers decipher biology from genome sequences and related high-throughput data. In the post-genomic era, a huge amount of genomic data is available because of advanced experimental technologies such as microarrays and chromatin immunoprecipitation with array hybridization (ChIP-chip) [1]. Efficient statistical approaches are required to analyze these data and extract informative knowledge from them. Such methods are very useful for identifying the interactions among different genes through their by-products, which ultimately regulate the expression of thousands of genes in a living cell. Of all the available datasets, gene expression data are the most widely used for gene regulatory network inference. Regulation of gene expression is primarily mediated by regulatory proteins called transcription factors (TFs). These DNA-binding proteins bind to the promoter regions at the start of genes and thereby control their transcription, which ultimately regulates the expression of those genes [2]. More specifically, TFs regulate the initiation of transcription through different strategies operating on the transcription mechanism. The interactions between genes give rise to a complex network-like structure known as a gene regulatory network (GRN). Network theory improves our understanding of the information on genes and their regulators, and permits us to uncover patterns present in gene regulation [3]. Formally, GRNs are modelled as directed graphs in which the vertices or nodes are bio-molecules (e.g. genes) and the directed edges (connections between genes) represent regulatory interactions of the activating or inhibitory type.
There are two types of gene expression experiments. In static gene expression experiments, a snapshot of the expression of genes in different samples is measured. In time-series expression experiments, a cellular process is measured over time, which generates time-series gene expression data. Static approaches assume that genes are expressed in a steady state and thus cannot describe the dynamics of gene regulation [4], since mRNA concentrations change over time during the cell cycle. Moreover, the structure of a genetic regulatory network is itself highly dynamic, in the sense that edges may be present or absent in response to different stress conditions over time [5]. Hence, dynamic modelling of GRNs is a valuable tool for understanding the complexity of gene regulation.

Types of Biological Data
The advent of advanced experimental technologies such as microarrays and chromatin immunoprecipitation with array hybridization (ChIP-chip) has led to the production of a huge number of biological datasets, which are quite heterogeneous in nature and difficult to analyse [6]. When coupled with suitable statistical techniques, these datasets can give valuable insight into the complex underlying mechanisms of gene-gene interactions. This section reviews some of the main types of data used for the inference of gene regulatory networks.

Micro-array data
Microarray data are mostly generated by DNA microarray technology. Microarrays may be used to measure gene expression in many ways, but one of the most popular applications is to compare the expression of a set of genes from a cell maintained in a stress or disease condition (condition A) with that of the same set of genes from a reference cell maintained under normal conditions (condition B). The number of data samples is in general much smaller than the number of genes. Microarray observations are commonly perceived as being extremely noisy [7], with both systematic and random sources of noise. Random noise is related to uncontrollable factors in the microarray experiment (e.g. array effects and biological sample effects), whereas systematic noise (e.g. dye bias and faulty machine readings) may be removed by normalisation.

Gene expression data
Gene expression data are the most widely used type of data for gene regulatory network inference. The level of gene expression is an important indicator of how active a gene is, and it is recorded in the form of gene expression data. Similarity in the gene expression profiles of two genes suggests some level of correlation between them. The challenge, therefore, is to remove the error component from the microarray observations in order to recover the true expression levels of the genes.

Dynamic modelling and inferring gene regulatory networks
Gene regulatory networks describe the regulatory relationships (activating or inhibitory) present among genes. The correct construction and interpretation of a GRN depend on accurate and reliable estimation of gene-gene associations, which has important applications in computational biology in general and medicinal biology in particular. The following section is devoted to the main statistical methods used for dynamic modelling and inference of gene regulatory networks.

Linear state space models
The simplest state-space model representation of the genetic regulatory network is given by Wu et al. [8]. Consistent with the definitions that follow, it can be written as
$$x_{t+1} = A x_t + \varepsilon_t \quad (1)$$
$$y_t = B x_t + \alpha_t \quad (2)$$
where $A$ is a matrix representing the regulatory interactions between the genes, and $y_t$ and $x_t$ are the observed microarray measurements and the gene expression values at time $t$, respectively. The noise components $\varepsilon_t$ and $\alpha_t$ represent the transition (system) noise and the measurement noise, respectively, and are assumed to be Gaussian. For ease of calculation, the matrix $B$ in equation 2 is usually taken as the identity matrix. For this state-space representation, inference and estimation of the GRN parameters can be performed with the expectation-maximization (EM) algorithm [8] and Kalman filter updates [7,9]. The simplicity of the state-space model helps to avoid over-fitting of the network and therefore provides reliable results.
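As an illustration of the filtering step, the following is a minimal sketch rather than the procedure of any cited reference. It assumes the linear model takes the standard form $x_{t+1} = A x_t + \varepsilon_t$, $y_t = B x_t + \alpha_t$ with $B$ equal to the identity, and that $A$ and the noise covariances are known. The interaction matrix `A`, the noise levels, and the function names are hypothetical values chosen for the example; in practice $A$ would be estimated, for instance with EM.

```python
import numpy as np

def simulate_linear_ssm(A, B, q_std, r_std, x0, T, rng):
    """Simulate x_{t+1} = A x_t + eps_t and y_t = B x_t + alpha_t."""
    n, m = A.shape[0], B.shape[0]
    xs, ys = np.zeros((T, n)), np.zeros((T, m))
    x = x0.copy()
    for t in range(T):
        xs[t] = x
        ys[t] = B @ x + r_std * rng.standard_normal(m)  # noisy observation
        x = A @ x + q_std * rng.standard_normal(n)      # state transition
    return xs, ys

def kalman_filter(ys, A, B, Q, R, x0, P0):
    """Return the filtered state mean at every time point."""
    n = A.shape[0]
    x, P = x0.copy(), P0.copy()
    filtered = np.zeros((len(ys), n))
    for t, y in enumerate(ys):
        # Update: correct the current prediction with the observation y_t.
        S = B @ P @ B.T + R
        K = P @ B.T @ np.linalg.inv(S)
        x = x + K @ (y - B @ x)
        P = (np.eye(n) - K @ B) @ P
        filtered[t] = x
        # Predict: propagate mean and covariance to time t + 1.
        x = A @ x
        P = A @ P @ A.T + Q
    return filtered

# Hypothetical 3-gene network (assumed values, not taken from any dataset).
rng = np.random.default_rng(0)
A = np.array([[0.7, -0.2, 0.0],
              [0.1,  0.6, 0.1],
              [0.0,  0.3, 0.5]])
B = np.eye(3)
q_std, r_std, T = 0.1, 0.5, 200
Q, R = q_std**2 * np.eye(3), r_std**2 * np.eye(3)

xs, ys = simulate_linear_ssm(A, B, q_std, r_std, np.zeros(3), T, rng)
filtered = kalman_filter(ys, A, B, Q, R, np.zeros(3), np.eye(3))

mse_raw = np.mean((ys - xs) ** 2)             # error of the raw observations
mse_filtered = np.mean((filtered - xs) ** 2)  # error after filtering
print(f"raw MSE: {mse_raw:.4f}, filtered MSE: {mse_filtered:.4f}")
```

Comparing the two errors shows how the measurement equation separates observation noise from the underlying expression values when the model is correctly specified.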

Non-linear state space models
It may be useful to represent genetic networks by simple state-space models in order to reduce computational complexity, but the main drawback of this approach is that it is almost silent about the non-linear interaction component, which is inherent in biological systems [10]. The state-space model for a time series consists of a measurement equation relating the observed data to a state vector and a transition equation that describes the evolution of the state vector over time:
$$x_{t+1} = g(x_t) + w_t \quad (3)$$
$$y_t = h(x_t) + \upsilon_t \quad (4)$$
where $g(\cdot)$ is a function describing the evolution of the states and $h(\cdot)$ is the function mapping the unknown state vector to the observations. It is assumed that the noise processes $\{w_t\}$ and $\{\upsilon_t\}$ present in the system are Gaussian with zero mean and a given variance-covariance structure. The nonlinear state-space representation capturing the gene-gene interactions is given by Noor et al. [7]:
$$x_{t+1,n} = \sum_{m=1}^{N} b_{nm} f(x_{t,m}) + w_{t,n} \quad (5)$$
$$y_t = x_t + \upsilon_t \quad (6)$$
where $N$ is the total number of genes in the network and $f(\cdot)$ is the regulatory function. The constants $b_{nm}$ in model (5) quantify the activation or repression of the expression of the $n$-th gene by its $m$-th regulator in the TRN. A function that is frequently used to capture the nonlinear gene-gene interactions is the sigmoid squash function given in equation 7 [7,11]:
$$f(x_{t,m}) = \frac{1}{1 + e^{-x_{t,m}}} \quad (7)$$
One way of solving these equations is the Extended Kalman Filter (EKF) [12], a popular algorithm for nonlinear state-space equations. The EKF provides a solution by replacing the nonlinear system with its first-order linear approximation. Other variants of the Kalman filter algorithm, such as the Cubature Kalman Filter (CKF) and the Unscented Kalman Filter (UKF), can also be used [7].

Current Trends in Biomedical Engineering & Biosciences
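The first-order linearisation used by the EKF can be sketched for the sigmoid model of equations (5) to (7). This is an illustrative sketch under simplifying assumptions, not the algorithm of Noor et al. [7]: the coefficients $b_{nm}$ are treated as known, so only the states are filtered, whereas parameter estimation would require further steps. The matrix `b`, the noise levels, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid squash function of equation (7)."""
    return 1.0 / (1.0 + np.exp(-x))

def transition(x, b):
    """Noise-free part of equation (5): sum over m of b_nm * f(x_m)."""
    return b @ sigmoid(x)

def transition_jacobian(x, b):
    """Jacobian of the transition: entry (n, m) is b_nm * f'(x_m)."""
    s = sigmoid(x)
    return b * (s * (1.0 - s))[None, :]

def extended_kalman_filter(ys, b, Q, R, x0, P0):
    """Filter the states of x_{t+1} = b f(x_t) + w_t, y_t = x_t + v_t."""
    n = x0.size
    x, P = x0.copy(), P0.copy()
    filtered = np.zeros((len(ys), n))
    for t, y in enumerate(ys):
        # Update: the measurement matrix is the identity, as in equation (6).
        K = P @ np.linalg.inv(P + R)
        x = x + K @ (y - x)
        P = (np.eye(n) - K) @ P
        filtered[t] = x
        # Predict: linearise the transition around the current estimate.
        F = transition_jacobian(x, b)
        x = transition(x, b)
        P = F @ P @ F.T + Q
    return filtered

# Hypothetical 3-gene network (assumed coefficients, for illustration only).
rng = np.random.default_rng(1)
b = np.array([[1.5, -2.0,  0.0],
              [0.0,  1.0, -1.5],
              [2.0,  0.0, -1.0]])
q_std, r_std, T = 0.1, 0.5, 200
Q, R = q_std**2 * np.eye(3), r_std**2 * np.eye(3)

# Simulate the model to obtain "true" states xs and noisy observations ys.
xs, ys = np.zeros((T, 3)), np.zeros((T, 3))
x = np.zeros(3)
for t in range(T):
    xs[t] = x
    ys[t] = x + r_std * rng.standard_normal(3)
    x = transition(x, b) + q_std * rng.standard_normal(3)

filtered = extended_kalman_filter(ys, b, Q, R, np.zeros(3), np.eye(3))
mse_raw = np.mean((ys - xs) ** 2)
mse_filtered = np.mean((filtered - xs) ** 2)
print(f"raw MSE: {mse_raw:.4f}, filtered MSE: {mse_filtered:.4f}")
```

The only change from a linear Kalman filter is the prediction step, where the Jacobian `F` of the sigmoid transition stands in for a fixed transition matrix when the covariance is propagated.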

Performance evaluation
The performance of these approaches can be assessed using knowledge of the true network, also called the gold standard network. Given two nodes in a network, the edge prediction problem has four possible outcomes when the prediction is compared with the true network (gold standard network): (a) if the edge occurs in both the true and the predicted network, the prediction is
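The comparison of a predicted network with a gold standard network can be sketched as a count over all ordered pairs of nodes. The sketch below assumes the four outcomes are labelled in the usual way (true positive, false positive, false negative, true negative) and that networks are stored as binary adjacency matrices with entry (i, j) equal to 1 when gene i regulates gene j. The function name, the example networks, and the choice of precision and recall as summary measures are hypothetical illustrations. Self-loops on the diagonal are counted as candidate edges here.

```python
import numpy as np

def edge_confusion(true_adj, pred_adj):
    """Count the four edge-prediction outcomes over all ordered node pairs."""
    true_adj = np.asarray(true_adj, dtype=bool)
    pred_adj = np.asarray(pred_adj, dtype=bool)
    tp = int(np.sum(true_adj & pred_adj))    # edge in both networks
    fp = int(np.sum(~true_adj & pred_adj))   # edge only in the prediction
    fn = int(np.sum(true_adj & ~pred_adj))   # edge only in the true network
    tn = int(np.sum(~true_adj & ~pred_adj))  # edge in neither network
    return tp, fp, fn, tn

# Hypothetical 3-gene example: rows are regulators, columns are targets.
true_net = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
pred_net = [[0, 1, 1],
            [0, 0, 0],
            [1, 0, 0]]

tp, fp, fn, tn = edge_confusion(true_net, pred_net)
precision = tp / (tp + fp)  # fraction of predicted edges that are true
recall = tp / (tp + fn)     # fraction of true edges that were recovered
print(tp, fp, fn, tn)       # 2 1 1 5
```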

Conclusion
Several statistical methods have been proposed to model and infer an integrated network of gene regulation. A state-space model for network inference involves parameter estimation, and the estimated parameters indicate the strength of the activating and inhibitory regulations among genes. The main feature of state space models is that they capture the dynamic structure of gene regulatory mechanisms. The computational efficiency of linear state space models for GRN inference is much higher than that of the dynamic Bayesian technique for network inference [8]. The drawback of the linear state-space model is that it sheds no light on the non-linear interactions among genes, which can be captured by adopting nonlinear state space models. Moreover, microarray expression profiles are considered to be highly noisy because of the different sources of variation that exist in microarray experiments [7]. To address this, the measurement equation of the state space representation provides more precise estimates of the gene expression values, which can be used in further analysis [13][14][15].