Recently, many approaches have been proposed to address the problem of selecting both fixed and random effects in mixed effects models.
In this article, we review several approaches by comparing their procedures and performances, discussing their similarities and differences, and
explaining their advantages and disadvantages.
Keywords: Variable selection; Random effects; Mixed effects model; Bayesian model selection; Parameter expansion; Stochastic search
Abbreviations: LME: Linear Mixed Effects; MCMC: Markov Chain Monte Carlo; AIC: Akaike Information Criterion; BIC: Bayesian Information
Criterion; GIC: Generalized Information Criterion; SSVS: Stochastic Search Variable Selection
Linear mixed effects (LME) models are widely used in
longitudinal studies to analyze correlated or clustered data.
Generally, the random effects are incorporated to account for
heterogeneity among the subjects. In the analysis of an LME model, a
primary objective is to identify the significant fixed effects and
random effects for the outcome variable. In practice, one might
use a stepwise selection procedure (e.g., backward elimination
or forward selection) and apply a standard criterion, such as
the Akaike information criterion (AIC), the generalized information
criterion (GIC), the Bayesian information criterion (BIC), or the Bayes
factor, to choose a preferred model by repeatedly fitting all the
candidate models. However, the number of competing models
increases exponentially with the number of predictors. Thus,
there are many challenging issues associated with joint selection
of both fixed and random effects, such as: intense computation;
increased prediction error as the number of covariates grows;
bias in the estimated variance of the fixed effects when the random
effects are underfitted; and a near-singular random-effects covariance
matrix when the random effects are overfitted. In this paper, we review
several exemplary approaches by comparing their procedures and
performances, investigating their similarities and differences, and
explaining their advantages and disadvantages.
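As a rough illustration of why exhaustive criterion-based search becomes infeasible, the sketch below (ordinary linear regression rather than a full LME fit, with made-up data) enumerates all 2^p - 1 nonempty predictor subsets and picks the one minimizing AIC:

```python
# Sketch: exhaustive subset search with AIC for a plain linear model
# (illustrative only; joint LME selection must also handle the random effects).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0])   # made-up coefficients
y = X @ beta_true + rng.normal(size=n)

def aic(y, X_sub):
    """Gaussian AIC up to a constant: n*log(RSS/n) + 2k."""
    beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = float(np.sum((y - X_sub @ beta_hat) ** 2))
    return n * np.log(rss / n) + 2 * X_sub.shape[1]

subsets = [s for r in range(1, p + 1) for s in itertools.combinations(range(p), r)]
best = min(subsets, key=lambda s: aic(y, X[:, s]))
print("number of candidate models:", len(subsets))   # 2^p - 1, exponential in p
print("selected predictors:", best)
```

Even at p = 6 there are already 63 candidate mean structures, before random-effects structures multiply the count further.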
Suppose we have n subjects in a study and each subject has n_i
repeated observations, i = 1, ..., n. Let y_ij denote the response
variable for subject i at observation j; X_ij the corresponding l x 1
vector of candidate fixed-effects predictors; and Z_ij a predictor
vector of dimension q x 1 for the random effects. Then we define the
LME model as follows:

y_ij = X_ij'β + Z_ij'b_i + ε_ij,   b_i ~ N(0, Ω),   ε_ij ~ N(0, σ²),   (1)

where β is the l x 1 vector of fixed-effects coefficients and b_i is the
q x 1 vector of subject-specific random effects; X_ij includes all the
candidate predictors. We are interested in selecting a subset of
important predictors for the final model.
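A minimal simulation from a model of this form (all dimensions, coefficients, and covariance values below are illustrative, not taken from any of the reviewed papers):

```python
# Sketch: simulate data from y_ij = X_ij' beta + Z_ij' b_i + eps_ij with
# b_i ~ N(0, Omega). All dimensions and parameter values are made up.
import numpy as np

rng = np.random.default_rng(1)
n, n_i, l, q = 50, 8, 4, 2                 # subjects, obs/subject, fixed, random dims
beta = np.array([1.0, -0.5, 0.0, 2.0])     # two truly null fixed effects
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])             # random-effects covariance
L = np.linalg.cholesky(Omega)

y_blocks = []
for i in range(n):
    b_i = L @ rng.normal(size=q)                                # subject effects
    X_i = rng.normal(size=(n_i, l))
    Z_i = np.column_stack([np.ones(n_i), rng.normal(size=n_i)]) # intercept + slope
    y_blocks.append(X_i @ beta + Z_i @ b_i + rng.normal(scale=0.5, size=n_i))

y = np.concatenate(y_blocks)
print("total observations:", y.shape[0])   # n * n_i = 400
```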
Chen & Dunson addressed the problem by using a
reparameterization approach based on a modified Cholesky
decomposition of the random-effects covariance matrix. The
covariance Ω can be decomposed as

Ω = ΛΓΓ'Λ,

where Λ = Diag(λ_1, ..., λ_q) is a diagonal matrix with nonnegative
entries proportional to the standard deviations of the random effects,
so that setting λ_k = 0 means the k-th random effect is
excluded from the model. Γ is a lower triangular matrix,
with all diagonal elements being 1 and the other free elements
characterizing correlations between the random effects. With this random-effects covariance matrix decomposition, model (1) takes the form:

y_ij = X_ij'β + Z_ij'ΛΓξ_i + ε_ij,   ξ_i ~ N(0, I).
Chen & Dunson showed that, by rearranging terms, the random-effects term can be expressed as a linear function of the elements of Λ and of Γ, so that λ and γ keep desirable conditional-conjugacy properties and we can construct a Gibbs sampling algorithm for sampling the posterior distribution using the SSVS approach.
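The decomposition Ω = ΛΓΓ'Λ can be checked numerically; the sketch below uses made-up values and shows that setting one λ_k to zero zeroes out the corresponding row and column of Ω, i.e., excludes that random effect:

```python
# Sketch: modified Cholesky reparameterization Omega = Lambda Gamma Gamma' Lambda,
# with Lambda = Diag(lambda_1, ..., lambda_q) nonnegative and Gamma unit lower
# triangular. All numeric values are made up for illustration.
import numpy as np

lam = np.array([1.2, 0.0, 0.7])            # lambda_2 = 0 drops random effect 2
Gamma = np.array([[ 1.0, 0.0, 0.0],
                  [ 0.4, 1.0, 0.0],
                  [-0.2, 0.3, 1.0]])
Lambda = np.diag(lam)
Omega = Lambda @ Gamma @ Gamma.T @ Lambda

# Row and column 2 of Omega are identically zero: that random effect is excluded.
print(np.round(Omega, 3))
```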
The priors are specified as follows. Let β_J be the vector of coefficients for the fixed effects selected in the current model J, and X_J the corresponding covariate matrix. An i.i.d. Bernoulli prior is assumed for the fixed-effects inclusion indicators, and β_J is assumed to follow a Zellner g-prior,

β_J | g, σ² ~ N(0, gσ²(X_J'X_J)^{-1}),

where IG(.) denotes an inverse-gamma distribution, δ_0(.) denotes a point mass at zero, and N+(.) denotes a normal distribution truncated to the positive reals. Each λ_k is assigned a zero-inflated truncated positive normal prior, a mixture of a point mass at zero and an N+(0, 1) component, which for ease of notation we denote λ_k ~ ZI-N+(p_k, 0, 1). The lower triangular free elements of Γ are given normal priors. The hyperparameter g is given a Gamma prior G(1/2, 1/2), and σ² either a Jeffreys prior π(σ²) ∝ 1/σ² or an inverse-gamma prior.
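A sketch of sampling from a zero-inflated truncated positive normal prior of this kind (the function name and settings are illustrative; for a mean-zero component, N+(0, 1) coincides with the half-normal |N(0, 1)|):

```python
# Sketch: draws from a zero-inflated truncated positive normal prior,
#   lambda_k ~ p0 * delta_0 + (1 - p0) * N+(0, 1).
# Function name and settings are illustrative. For a mean-zero component,
# N+(0, 1) is the half-normal |N(0, 1)|.
import numpy as np

def zi_pos_normal(p_zero, size, rng):
    draws = np.abs(rng.normal(size=size))      # N+(0, 1) component
    draws[rng.random(size) < p_zero] = 0.0     # point mass at zero
    return draws

rng = np.random.default_rng(2)
lam = zi_pos_normal(p_zero=0.5, size=10_000, rng=rng)
print("share exactly zero:", float(np.mean(lam == 0.0)))   # about 0.5
print("all nonnegative:", bool(np.all(lam >= 0.0)))
```

The point mass at zero is what allows the sampler to switch a random effect off entirely rather than merely shrinking it.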
Chen & Dunson specified the prior ξ_i ~ N(0, I). Kinney & Dunson instead used the approach of Gelman and specified the covariance matrix V(ξ_i) = Diag(d_1, ..., d_q). With this specification, the parameters Λ, Γ, and V(ξ_i) are not identifiable; in fact, Kinney & Dunson took the parameter-expansion approach [8,9], which improves computational efficiency and reduces dependence among the parameters. We should note that Chen & Dunson only considered variable selection for the random effects. Kinney & Dunson extended it to joint selection of both fixed and random effects for linear and logistic models; in addition, their approach overcame the computational inefficiency due to slow mixing of the Gibbs sampler.
It is simple and convenient to assume that the random effects are normally distributed. However, there are several limitations with such a specification: the assumption is often not reasonable, so misspecification of the random-effects distribution might result in misleading interpretations and even incorrect results. In addition, it is challenging to specify a nonparametric distribution for the random effects, since the fixed-effects estimates are biased when the expected values of the random effects are not zero. To resolve this bias, Yang (2012) and Yang (2013) used the probit stick-breaking (PSB) and location-scale symmetrized PSB (sPSB) approaches for linear and logistic models with joint variable selection for both fixed and random effects. They define a mixture prior for the random effects in which each component takes a Gaussian kernel. The heteroscedastic scale PSB mixture and heteroscedastic sPSB location-scale mixture are specified in truncated form, with stick-breaking weights

π_h = Φ(α_h) ∏_{l<h} (1 − Φ(α_l)),   h = 1, ..., N,

for the random effects. The nonidentifiability issue is resolved since both the PSB and sPSB priors are centered at the zero vector.
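The probit stick-breaking weights underlying these mixtures can be computed as follows (a truncated sketch with made-up α values; the last stick is set to absorb the remaining mass so the truncated weights sum to one):

```python
# Sketch: probit stick-breaking weights
#   pi_h = Phi(alpha_h) * prod_{l<h} (1 - Phi(alpha_l)),
# truncated at N components; alpha values are made up for illustration.
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

alpha = [-0.5, 0.2, 0.0, 1.0]      # truncation level N = 4
pi, remaining = [], 1.0
for h, a in enumerate(alpha):
    v = Phi(a) if h < len(alpha) - 1 else 1.0   # last stick absorbs the rest
    pi.append(remaining * v)
    remaining *= 1.0 - v

print([round(w, 3) for w in pi], "sum =", round(sum(pi), 10))
```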
Yang (2012) and Yang (2013) provided nonparametric approaches to joint variable selection of both fixed and random effects in linear and logistic models. Their approaches are much more flexible than those of Chen & Dunson and Kinney & Dunson, but the computation is more intense. Later, Yang et al. used shrinkage priors for mixed effects models with variable selection. This approach is efficient in shrinking small coefficients to zero while, owing to heavy tails, minimally shrinking large coefficients. They use several popular shrinkage priors for the fixed effects: the generalized double Pareto prior, the horseshoe prior, and the normal-exponential-gamma prior. For example, the horseshoe prior takes the form

β_j | λ_j, τ ~ N(0, λ_j²τ²),   λ_j ~ Ca+(0, 1),   τ ~ Ca+(0, c),

where Ca+(0, c) denotes a standard half-Cauchy distribution on the positive reals with scale parameter c. The shrinkage approaches perform very well while the computations are not overly intense.
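A sketch of the shrinkage behavior of a horseshoe-type prior (τ is fixed at a made-up value purely for illustration; in practice it would also receive a prior):

```python
# Sketch of horseshoe-type draws:
#   lambda_j ~ Ca+(0, 1),  beta_j ~ N(0, tau^2 * lambda_j^2).
# tau is fixed at a made-up value purely for illustration.
import numpy as np

rng = np.random.default_rng(3)
tau = 0.5
lam = np.abs(rng.standard_cauchy(size=100_000))    # half-Cauchy Ca+(0, 1)
beta = rng.normal(scale=tau * lam)

# Heavy tails plus mass near zero: many tiny draws coexist with a few huge ones.
print("share with |beta| < 0.01:", float(np.mean(np.abs(beta) < 0.01)))
print("largest |beta|:", float(np.abs(beta).max()))
```

The marginal draws concentrate near zero (aggressive shrinkage of small coefficients) while the half-Cauchy local scales produce occasional very large values (minimal shrinkage of large coefficients), which is exactly the behavior the review attributes to these priors.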
In this article, we reviewed several approaches for joint selection of both fixed and random effects in linear and logistic models. The approaches of Chen & Dunson and Kinney & Dunson are simple to implement and provide reasonably good results. The approaches of Yang (2012) and Yang (2013) are much more flexible and provide much better results, though the computations are intense. The approach of Yang et al. maintains a good balance of the benefits of the above-mentioned parametric and nonparametric approaches. In summary, in descending order of both performance and computational intensity, the approaches rank as follows: Yang (2012) and Yang (2013); Yang et al.; and Chen & Dunson and Kinney & Dunson.