Application of Penalized Mixed Model in Identification of Genes in Yeast Cell-Cycle Gene Expression Data



Introduction
Linear mixed effects models have been used in a variety of studies to analyze data with between-subject dependence [1]. For example, they are often used to analyze longitudinal data, clustered data, repeated measurements and spatial data. In these models, the linear predictor contains a zero-mean Gaussian latent variable in addition to the fixed effects. This latent variable is called a random effect, and models that contain both fixed and random effects are called mixed effects models. They are usually applied to correlated outcomes in studies with a small number of explanatory variables. Their use becomes problematic, however, in a high-dimensional setting or when the purpose of the study is variable selection. As the number of fixed and random effects increases, the complexity of the mixed effects model makes inference challenging, so the selection of fixed and random effects becomes a key problem. There are many traditional approaches to variable selection, for example AIC, conditional AIC, BIC and Bayesian variable selection [2][3][4][5]. Most of them compute a chosen criterion and pick the subset of variables that is best according to it. Among these approaches, conditional AIC [6] and Bayesian variable selection are commonly used for variable selection in mixed effects models. Another approach proposed for variable selection in mixed effects models is penalized likelihood. Penalized likelihood, also known as regularized regression, was traditionally proposed for high-dimensional regression models (when $n \ll p$) [7,8], but it is now also a popular approach for variable selection in mixed effects models [9][10][11][12].
In this paper, we review the penalized likelihood approach for variable selection in mixed effects models and then apply it, with the lasso penalty, to a yeast cell-cycle gene expression data set. The paper is organized as follows. The next section reviews the penalized likelihood function for variable selection in mixed effects models. Section 3 presents the variable selection for the yeast cell-cycle gene expression data. The last section gives some conclusions.

Penalized likelihood function for mixed effects model
In mixed effects models, the penalized likelihood function is usually applied to both fixed and random effects. After introducing the notation used in this paper, we discuss the penalized likelihood first for fixed effects and then for random effects.

Notation
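Let there be $N$ individuals in the study. For individual $i$ ($i = 1, \dots, N$), let $y_i$ be the vector of $n_i$ responses, $X_i$ the $n_i \times p$ design matrix of the fixed effects and $Z_i$ the $n_i \times q$ design matrix of the random effects. The linear mixed effects model can be written as follows:

$$y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad i = 1, \dots, N,$$

where $\beta$ is the vector of fixed effects, $b_i$ is the zero-mean Gaussian vector of random effects, $\varepsilon_i$ is the zero-mean Gaussian error vector with variance $\sigma^2$, and $b_i$ and $\varepsilon_i$ are independent. The covariance matrix of the random effects is denoted by $\Delta$, and the marginal covariance matrix of the response, denoted by $\Sigma$, depends on $\Delta$ and $\sigma^2$. In matrix notation, let $y$, $b$, $\varepsilon$ and $X$ be obtained by stacking $y_i$, $b_i$, $\varepsilon_i$ and $X_i$, respectively, and let $Z = \mathrm{diag}(Z_1, \dots, Z_N)$ be a block-diagonal matrix. Then the linear mixed effects model can be rewritten as

$$y = X\beta + Zb + \varepsilon. \qquad (1)$$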
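The stacking step can be illustrated with a minimal NumPy sketch. It assumes the standard per-subject form $y_i = X_i \beta + Z_i b_i + \varepsilon_i$ and, as a further assumption not stated in the text, that every subject shares the same $q \times q$ random-effects covariance matrix (called `Delta` in the code), so that the marginal covariance is $Z (I_N \otimes \Delta) Z^T + \sigma^2 I$. The function names and the toy dimensions are hypothetical.

```python
import numpy as np

def stack_lmm(X_list, Z_list):
    """Stack per-subject X_i row-wise and place the Z_i on a block diagonal."""
    X = np.vstack(X_list)
    n_total = sum(Zi.shape[0] for Zi in Z_list)
    q_total = sum(Zi.shape[1] for Zi in Z_list)
    Z = np.zeros((n_total, q_total))
    row = col = 0
    for Zi in Z_list:
        n_i, q_i = Zi.shape
        Z[row:row + n_i, col:col + q_i] = Zi
        row += n_i
        col += q_i
    return X, Z

def marginal_cov(Z, Delta, sigma2, n_subjects):
    """Sigma = Z (I_N kron Delta) Z^T + sigma2 * I (assumes a common Delta)."""
    D = np.kron(np.eye(n_subjects), Delta)
    return Z @ D @ Z.T + sigma2 * np.eye(Z.shape[0])

# Toy example: 3 subjects, 4 observations each, 2 fixed and 2 random effects.
rng = np.random.default_rng(0)
N, n_i, p = 3, 4, 2
times = np.arange(n_i, dtype=float)
X_list = [rng.normal(size=(n_i, p)) for _ in range(N)]
Z_list = [np.column_stack([np.ones(n_i), times]) for _ in range(N)]  # intercept + slope

X, Z = stack_lmm(X_list, Z_list)
Delta = np.array([[1.0, 0.2], [0.2, 0.5]])  # illustrative values only
Sigma = marginal_cov(Z, Delta, sigma2=0.3, n_subjects=N)
```

Because $Z$ is block-diagonal, `Sigma` is block-diagonal as well: observations from different subjects are uncorrelated, while observations within a subject share the random effects.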

Selection of important fixed effects
The likelihood of the marginal model (1), in which $y$ is Gaussian with mean $X\beta$ and covariance matrix $\Sigma$, is denoted by $L_n(\cdot,\cdot)$, where the index $n$ indicates the dependence on the sample size $n$. To select the important covariates, a penalized log-likelihood function is used, which combines the log-likelihood with a penalty on the fixed-effect coefficients. As mentioned before, $\Sigma$ depends on the unknown covariance matrix $\Delta$ and $\sigma^2$. Based on Theorem 1 of Fan & Li [12], the estimator of the important fixed effects has the oracle property.
The oracle property is that the asymptotic distribution of the estimator is the same as the asymptotic distribution of the MLE on only the true support. That is, the estimator adapts to knowing the true support without paying a price (in terms of the asymptotic distribution). In short, an oracle estimator must be consistent in parameter estimation and variable selection.
Notice that an estimator that is consistent in variable selection is not necessarily consistent in parameter estimation [13].
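To make the idea of a penalized likelihood for the fixed effects concrete, the following sketch minimizes $\tfrac{1}{2}(y - X\beta)^T \Sigma^{-1} (y - X\beta) + \lambda \sum_j |\beta_j|$ for a known $\Sigma$. This is a simplification: in the setting reviewed here $\Sigma$ is unknown, and the sketch is not the procedure of Fan & Li [12]. It whitens the data with a Cholesky factor of $\Sigma$ and then runs plain cyclic coordinate descent for the lasso; all function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # remove coordinate j from the fit
            rho = X[:, j] @ resid
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated coordinate back
    return beta

def penalized_gls(X, y, Sigma, lam):
    """Lasso-penalised generalised least squares with a known covariance Sigma."""
    L = np.linalg.cholesky(Sigma)               # Sigma = L L^T
    Xw = np.linalg.solve(L, X)                  # whitened design
    yw = np.linalg.solve(L, y)                  # whitened response
    return lasso_cd(Xw, yw, lam)

# Synthetic demonstration with a sparse coefficient vector.
rng = np.random.default_rng(1)
n, p = 60, 5
X_demo = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
A = rng.normal(size=(n, n))
Sigma_demo = A @ A.T / n + np.eye(n)            # an arbitrary positive-definite matrix
y_demo = X_demo @ beta_true + np.linalg.cholesky(Sigma_demo) @ rng.normal(size=n)
beta_hat = penalized_gls(X_demo, y_demo, Sigma_demo, lam=5.0)
```

With `lam=0` the function reduces to ordinary generalized least squares, and a sufficiently large `lam` sets every coefficient to zero; values in between trade goodness of fit against sparsity.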

Identifying important random effects
As mentioned by Fan & Li [12], the number of random effects $q$ may increase with the sample size $n$, so its dependence on $n$ is made explicit by writing $q_n$. The estimation, and therefore the identification, of random effects differs from that of fixed effects.
One of the best-known approaches to estimating random effects is the empirical Bayes approach [14]. However, this approach is not useful for selecting random effects. In the following, we review the method proposed by Fan & Li [12] for identifying important random effects. The method considers a vector $w$ whose conditional density is independent of the fixed effects $\beta$, and the resulting criterion involves the Moore-Penrose generalized inverse of $\Delta$. A group variable selection strategy is then needed to identify the true random effects, which leads to a regularization problem with a group penalty on the random effects. Based on Theorem 2 of Fan & Li [12], the identified random effects are close to the oracle estimator.
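Why a group strategy is needed can be seen from a small sketch: the $k$-th random effect appears once per subject, so removing it means setting all $N$ subject-level values to zero together. The code below applies group soft-thresholding, which is the proximal operator of an $L_2$-norm (group lasso) penalty, to each column of an $N \times q$ matrix of random effects. It illustrates the general mechanism only and is not the regularization problem of Fan & Li [12]; the function name and the numbers are hypothetical.

```python
import numpy as np

def group_soft_threshold(B, lam):
    """Shrink each column of B towards zero as a group.

    B is an N x q matrix; column k holds the N subject-level values of
    random effect k. A column whose Euclidean norm is at most lam is set
    to zero entirely; otherwise its norm is reduced by lam.
    """
    norms = np.linalg.norm(B, axis=0)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return B * scale

# Two subjects, two random effects: the first is large, the second is small.
B = np.array([[3.0, 0.3],
              [4.0, 0.4]])
shrunk = group_soft_threshold(B, lam=1.0)
selected = np.flatnonzero(np.linalg.norm(shrunk, axis=0) > 0)
```

Here the first column has norm 5 and is only shrunk, whereas the second column has norm 0.5 and is removed for both subjects at once, so `selected` contains only the index of the first random effect.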

Tuning parameters selection
Different penalty functions have been proposed to achieve variable selection in mixed effects models [10][11][12][13][14][15][16][17]. One of them is the lasso penalty function, $p_\lambda(|\theta|) = \lambda |\theta|$. In this paper, we use the lasso penalty function and the lmmlasso package available in R for variable selection in time-course gene expression data [18]. An important stage in the use of the penalized likelihood function is the selection of the tuning parameters. The penalty functions above involve the tuning parameters $\lambda_1$ and $\lambda_2$. A popular approach is to select them using criteria such as AIC and BIC: the selected value of the tuning parameter is the one with minimal AIC or BIC.
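The grid search over the tuning parameter can be sketched as follows. For brevity the example uses an ordinary linear model with orthonormal design columns, where the lasso solution is a closed-form soft-thresholding of $X^T y$, rather than a mixed model. The Gaussian form $\mathrm{BIC} = n \log(\mathrm{RSS}/n) + \log(n)\,\mathrm{df}$, with df taken as the number of nonzero coefficients, is an assumption of this sketch, and the criterion reported by a mixed-model package may be defined differently.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def bic_path(X, y, lambdas):
    """Return (lambda, df, BIC) for each lambda; X must have orthonormal columns."""
    n = len(y)
    z = X.T @ y                                   # least-squares fit when X^T X = I
    results = []
    for lam in lambdas:
        beta = soft_threshold(z, lam)             # closed-form lasso solution
        rss = float(np.sum((y - X @ beta) ** 2))
        df = int(np.count_nonzero(beta))
        bic = n * np.log(rss / n) + np.log(n) * df
        results.append((float(lam), df, bic))
    return results

# Synthetic data: 2 of the 8 coefficients are truly nonzero.
rng = np.random.default_rng(0)
n, p = 50, 8
X, _ = np.linalg.qr(rng.normal(size=(n, p)))      # orthonormal columns
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lambdas = np.linspace(0.0, 4.0, 41)
path = bic_path(X, y, lambdas)
best_lam, best_df, best_bic = min(path, key=lambda r: r[2])
```

Replacing `np.log(n) * df` by `2 * df` gives the corresponding AIC rule; BIC penalizes model size more heavily for the sample sizes considered here and therefore tends to select sparser models.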

Yeast cell-cycle gene expression data
In this section, we use the yeast cell-cycle gene expression data collected in the yeast cell cycle analysis project by Spellman et al. [19]. The goal of the project was to identify all genes whose mRNA levels are regulated by the cell cycle. The experiment recorded genome-wide mRNA levels for 6178 yeast ORFs in an α-factor based experiment. Each response variable corresponds to mRNA levels measured every 7 minutes over 119 minutes (a total of 18 measurements). As explanatory variables, we consider the ChIP-chip data of 106 transcription factors (TFs). More information about this data set can be found in [20]. Figure 1 presents spaghetti plots for all genes (panel a) and for 10 randomly selected genes (panel b). We consider a penalized mixed model with the 106 TFs as fixed effects, a random intercept and a random slope for time. Table 1 shows the 26 important covariates selected by the penalized likelihood mixed model with the lasso penalty function. The tuning parameter is selected as $\lambda_1 = 3$. None of the random effects are selected.
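The layout of the design matrices for one gene can be sketched as below. The sketch assumes that measurements start at time 0 (so the 18 time points are 0, 7, ..., 119 minutes) and that a gene's ChIP-chip values are constant over time, so that they are repeated in every row of $X_i$. The binding values are random placeholders rather than the real data, and the function name is hypothetical.

```python
import numpy as np

N_TIMEPOINTS = 18
N_TFS = 106
times = np.arange(N_TIMEPOINTS) * 7.0           # 0, 7, ..., 119 minutes

def gene_design(tf_binding, times):
    """Build (X_i, Z_i) for one gene.

    X_i repeats the gene's TF covariate vector at every time point;
    Z_i has a column of ones (random intercept) and a time column
    (random slope).
    """
    X_i = np.tile(tf_binding, (len(times), 1))
    Z_i = np.column_stack([np.ones(len(times)), times])
    return X_i, Z_i

rng = np.random.default_rng(2)
tf_binding = rng.normal(size=N_TFS)             # synthetic placeholder values
X_i, Z_i = gene_design(tf_binding, times)
```

Each gene thus contributes an 18 x 106 block to the stacked fixed-effects design and an 18 x 2 block to the block-diagonal random-effects design.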

Conclusion
In this paper, we reviewed variable selection in mixed effects models using the penalized likelihood approach. Within this framework, we discussed how to select the fixed effects, the random effects and the tuning parameters. Using the lasso penalty function, we analyzed high-dimensional time-course yeast gene expression data, where 26 of the 106 TFs were selected by the model as important.