Longitudinal Data Analysis Using Liu Regression

Introduction
We begin with the simple linear regression model given by

$$Y = X\beta + \varepsilon, \tag{1}$$

where $Y = (y_1, y_2, \ldots, y_n)^T$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ design matrix comprised of $p < n$ columns representing each of the potential predictor variables, $n$ is the number of individuals in our sample and $\varepsilon \sim N_n(0, \sigma^2 I_n)$ is an $n \times 1$ vector of independent errors. The least squares (LS)/maximum likelihood (ML) estimator of the regression coefficients is given by

$$\hat{\beta} = (X^T X)^{-1} X^T Y.$$

Notably, when the columns of $X$ are so highly correlated that $X^T X$ is singular, we replace $(X^T X)^{-1}$ with $(X^T X)^{-}$, where '$-$' denotes the generalized inverse, and a unique solution to equation (1) does not exist. Further, in the case of high correlation where $X^T X$ is still invertible, the resulting coefficient estimates will have largely inflated variances, which in turn results in low predictive precision.
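The effect of correlated columns on the LS estimator can be illustrated numerically. The following is a minimal sketch, not code from the paper: the function names `ls_estimate` and `ls_variance_factors` and the simulated data are hypothetical, and the Moore-Penrose pseudoinverse is used as one particular choice of generalized inverse.

```python
import numpy as np

def ls_estimate(X, y):
    """LS estimate; the pseudoinverse stands in for the generalized
    inverse (X^T X)^- when X^T X is singular (an illustrative choice)."""
    return np.linalg.pinv(X.T @ X, rcond=1e-10) @ X.T @ y

def ls_variance_factors(X):
    """Diagonal of (X^T X)^{-1}; multiplied by sigma^2 these are the
    variances of the LS coefficient estimates."""
    return np.diag(np.linalg.inv(X.T @ X))

rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)

# Two nearly independent predictors versus two highly correlated ones.
X_indep = np.column_stack([z, rng.standard_normal(n)])
X_corr = np.column_stack([z, z + 0.05 * rng.standard_normal(n)])

vf_indep = ls_variance_factors(X_indep)
vf_corr = ls_variance_factors(X_corr)
print("variance factors, independent columns:", vf_indep)
print("variance factors, correlated columns: ", vf_corr)
```

With exactly collinear columns, `ls_estimate` still returns a coefficient vector, but it is only one of infinitely many solutions that give the same fitted values.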
Ridge regression, designed specifically to handle correlated predictors, involves introducing a shrinkage penalty $\lambda$ to the least squares equation, and subsequently solving for the value of $\beta$ that minimizes

$$(Y - X\beta)^T (Y - X\beta) + \lambda \beta^T \beta. \tag{2}$$

The solution to equation (2) is given by

$$\hat{\beta}_R = (X^T X + \lambda I_p)^{-1} X^T Y,$$

and its sampling properties are given in [2]. Further, the statistic obtained by dividing $\hat{\beta}_R$ by $\sqrt{n}$ times the square root of its variance has a Student's t-distribution, with effective degrees of freedom (EDF) as given in [3][4][5]. However, the ridge regression estimator $\hat{\beta}_R$ is non-linear with respect to $\lambda$, and its estimation is challenging. An alternative approach is proposed by Mayer & Willke [6]. The key idea is that $d\hat{\beta}$ is closer to the true $\beta$ for $0 < d < 1$. In Section 2, we develop their idea for the longitudinal mixed model.
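To make the contrast between the two shrinkage approaches concrete, the sketch below computes the ridge estimator alongside the Liu estimator in its commonly cited form, $\hat{\beta}_d = (X^T X + I)^{-1}(X^T Y + d\hat{\beta})$. That form is an assumption here, since it is not displayed in the text above, and the function names and simulated data are hypothetical.

```python
import numpy as np

def ls(X, y):
    """Ordinary least squares estimate."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam):
    """Ridge estimate (X^T X + lam I)^{-1} X^T y; non-linear in lam."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def liu(X, y, d):
    """Liu estimate in its commonly cited form (assumed here):
    (X^T X + I)^{-1} (X^T y + d * beta_LS); linear in d."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + np.eye(p), X.T @ y + d * ls(X, y))

# Hypothetical data with three strongly correlated predictors.
rng = np.random.default_rng(1)
n = 100
z = rng.standard_normal(n)
X = np.column_stack([z + 0.1 * rng.standard_normal(n) for _ in range(3)])
y = X @ np.array([1.0, 2.0, 3.0]) + rng.standard_normal(n)

print("LS:        ", ls(X, y))
print("ridge(1.0):", ridge(X, y, 1.0))
print("Liu(0.5):  ", liu(X, y, 0.5))
```

Under this form, $d = 1$ recovers the LS estimate, and because the estimator is linear in $d$, choosing $d$ tends to be simpler than choosing $\lambda$.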

Linear mixed effects model
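Now, consider the setting in which multiple measurements are observed for each individual over time. The mixed effects model for this setting is given by

$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \tag{3}$$

where, for the $i$th individual, $Y_i$ is the vector of repeated responses, $X_i$ and $Z_i$ are the design matrices for the fixed effects $\beta$ and the random effects $b_i \sim N(0, D)$, respectively, and $\varepsilon_i \sim N(0, \sigma^2 I)$ is the vector of within-individual errors.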

Biostatistics and Biometrics Open Access Journal
Maximizing the log-likelihood function of $Y$ under this model with respect to the fixed effects parameter vector $\beta$ is, in the non-penalized setting, equivalent to minimizing the least squares objective function, and the minimizer gives the estimate of $\beta$.
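For known variance parameters, the non-penalized estimate can be sketched as a generalized least squares computation. The code below assumes the standard Laird-Ware form $Y_i = X_i\beta + Z_i b_i + \varepsilon_i$ with marginal covariance $V_i = Z_i D Z_i^T + \sigma^2 I$; the function name `gls_estimate` and its arguments are hypothetical, not taken from the paper.

```python
import numpy as np

def gls_estimate(X_list, Z_list, y_list, D, sigma2):
    """GLS estimate of beta for known D and sigma^2 (illustrative sketch).

    Assumes V_i = Z_i D Z_i^T + sigma2 * I and returns
    (sum_i X_i^T V_i^{-1} X_i)^{-1} sum_i X_i^T V_i^{-1} y_i.
    """
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    c = np.zeros(p)
    for X, Z, y in zip(X_list, Z_list, y_list):
        V = Z @ D @ Z.T + sigma2 * np.eye(len(y))
        V_inv_X = np.linalg.solve(V, X)  # V^{-1} X, avoiding an explicit inverse
        A += X.T @ V_inv_X               # X^T V^{-1} X
        c += V_inv_X.T @ y               # X^T V^{-1} y (V is symmetric)
    return np.linalg.solve(A, c)
```

When $D = 0$ the subjects' observations are independent, and the result reduces to ordinary least squares on the stacked data.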

Mixed-Liu regression
In this section, we introduce a penalized regression approach to estimation for the mixed model given in equation (3). To begin, we assume the variance parameters $\theta = (D, \sigma)$ are known and add a penalization term to the objective function of the mixed model, which yields the penalized objective function in equation (6). Differentiating the objective function in equation (6), setting the result equal to 0 and solving yields the mixed Liu estimator in equation (7). Additional properties of this estimator can also be shown, and we suggest an estimate of $d$ in equation (7).

More generally, consider the setting in which the variance parameters $\theta = (D, \sigma)$ are unknown. Eliot et al. [1] proposed an extension of the expectation-maximization (EM) algorithm described by Laird & Ware [4] that includes an additional step for estimation of the ridge component. Here, we exhibit an EM algorithm to compute the mixed Liu estimator; its final step is:

IV. Repeat Steps (I)-(III) a large number of times, until a convergence criterion is met.
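An EM iteration of this kind might be organized as in the sketch below. It is an illustration under stated assumptions rather than the authors' algorithm: the Liu step uses the assumed form $(A + I)^{-1}(c + d\,\hat{\beta}_{GLS})$ with $A = \sum_i X_i^T V_i^{-1} X_i$ and $c = \sum_i X_i^T V_i^{-1} Y_i$, the variance updates are the standard maximum likelihood EM updates for a Laird-Ware model, $d$ is treated as fixed, and the starting values and the name `mixed_liu_em` are hypothetical.

```python
import numpy as np

def mixed_liu_em(X_list, Z_list, y_list, d, n_iter=200, tol=1e-8):
    """Illustrative EM loop for a Liu-type mixed model estimate (a sketch)."""
    p, q = X_list[0].shape[1], Z_list[0].shape[1]
    n_subjects = len(y_list)
    n_total = sum(len(y) for y in y_list)
    # Step I: starting values (assumed; the paper derives them from a
    # mixed model fit without the Liu component).
    D, sigma2, beta = np.eye(q), 1.0, np.zeros(p)
    for _ in range(n_iter):
        # Step II: Liu-type update of beta given the current (D, sigma2).
        V_inv = [np.linalg.inv(Z @ D @ Z.T + sigma2 * np.eye(len(y)))
                 for Z, y in zip(Z_list, y_list)]
        A = sum(X.T @ Vi @ X for X, Vi in zip(X_list, V_inv))
        c = sum(X.T @ Vi @ y for X, Vi, y in zip(X_list, V_inv, y_list))
        beta_gls = np.linalg.solve(A, c)
        beta_new = np.linalg.solve(A + np.eye(p), c + d * beta_gls)
        # Step III: E-step moments of b_i and e_i, then M-step updates.
        ss, D_new = 0.0, np.zeros((q, q))
        for X, Z, y, Vi in zip(X_list, Z_list, y_list, V_inv):
            r = y - X @ beta_new
            b = D @ Z.T @ Vi @ r                      # E[b_i | y_i]
            e = r - Z @ b                             # E[e_i | y_i]
            # e'e plus trace of Var(e_i | y_i) = sigma2 * (I - sigma2 * V_i^{-1})
            ss += e @ e + sigma2 * (len(y) - sigma2 * np.trace(Vi))
            # b b' plus Var(b_i | y_i) = D - D Z' V_i^{-1} Z D
            D_new += np.outer(b, b) + D - D @ Z.T @ Vi @ Z @ D
        sigma2_new, D_new = ss / n_total, D_new / n_subjects
        converged = (np.max(np.abs(beta_new - beta)) < tol
                     and abs(sigma2_new - sigma2) < tol)
        beta, sigma2, D = beta_new, sigma2_new, D_new
        # Step IV: repeat until the convergence criterion is met.
        if converged:
            break
    return beta, D, sigma2
```

With $d = 1$ the Liu step in this sketch coincides with the generalized least squares update, so the loop reduces to an ordinary EM fit of the mixed model.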
In the forthcoming section we evaluate the performance of the mixed Liu estimator by a Monte Carlo simulation study.

Simulation study
A simple simulation study is conducted to characterize the relative performances of mixed Liu regression and the usual mixed effects modeling approach in the context of multiple, correlated predictors. For simplicity of presentation, the simulation study assumes repeatedly measured outcomes, while the predictor variables are measured at a single, baseline time point, as in Eliot et al. [1]. Each predictor variable is assumed to arise from a normal distribution with mean equal to 5 and variance equal to 1. The correlation between predictor variables, given by $\rho$ in Table 1, is assumed to take on values between 0 and 0.9. Starting values for the variance components are derived by fitting a mixed model with no Liu component. As shown in Table 1, the mixed Liu estimates often have smaller bias than the usual mixed model estimates. The mixed Liu estimator is also superior in terms of standard deviation (sd).
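The predictor-generating step of such a simulation can be sketched as follows. The common pairwise correlation $\rho$ between all predictors (an equicorrelation structure) is an assumption made for illustration, as are the function name `simulate_predictors`, the sample size and the number of predictors.

```python
import numpy as np

def simulate_predictors(n_subjects, p, rho, seed=0):
    """Draw baseline predictors from a multivariate normal distribution with
    mean 5, variance 1 and common pairwise correlation rho (assumed)."""
    rng = np.random.default_rng(seed)
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)  # unit diagonal
    return rng.multivariate_normal(np.full(p, 5.0), cov, size=n_subjects)

# One design matrix per correlation level; the grid of rho values is illustrative.
designs = {rho: simulate_predictors(100, 4, rho) for rho in (0.0, 0.5, 0.9)}
```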

Real data analysis
The data set analyzed here is the Mayo Clinic Primary Biliary Cirrhosis data, from the package "JMbayes" in R. It consists of 312 randomized patients with primary biliary cirrhosis, a rare autoimmune liver disease, at Mayo Clinic. In this study we have 1945 observations on the 20 variables listed in Table 2. The response variable is the number of years, indicated as "years" in Table 2, and the variables considered as fixed and random effects are marked as "F" and "R", respectively, in this table. Since the variables have been measured a number of times for each individual, we have a longitudinal data set. Moreover, some of the variables in this data set, such as sex, place the subjects in particular groups, so we consider these variables as random effects (marked "R" in Table 2) and use a mixed model to analyze the data. The estimates of the coefficients are obtained using the EM algorithm as outlined in Section 2.

To compare the performance of the mixed Liu estimator, we evaluate the mean prediction error (MPE); the smaller, the better. In what follows, we describe the scheme we used to derive the MPE. In the cross-validation scheme, the process is repeated for all $K$ subsets and the prediction errors are combined. To account for the random variation of the cross-validation, the process is reiterated $N$ times and the average prediction error is computed, where PE denotes the prediction error obtained from the $k$th test set in the $i$th iteration. Our results are based on $N = 200$ case-resampled bootstrap samples. In Table 3, we report the estimates and MPE values. Based on the results, the proposed mixed Liu estimator performs better than the mixed one in the MPE sense. Further, the absolute values of the mixed Liu estimates are smaller than those of the mixed estimates.
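A repeated K-fold scheme of this kind can be sketched generically. The version below is an illustration, not the authors' procedure: it assumes squared-error loss, splits by row rather than by subject, and averages the prediction error over all $N \times K$ test sets. The name `repeated_kfold_mpe` and the `fit`/`predict` callables are hypothetical placeholders for whichever estimator is being evaluated.

```python
import numpy as np

def repeated_kfold_mpe(X, y, fit, predict, K=5, N=10, seed=0):
    """Average prediction error over N repetitions of K-fold cross-validation.

    fit(X_train, y_train) returns a fitted model and predict(model, X_test)
    returns predictions; both are supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(N):                      # i-th repetition
        folds = np.array_split(rng.permutation(n), K)
        for test_idx in folds:              # k-th test set
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            model = fit(X[train_idx], y[train_idx])
            resid = y[test_idx] - predict(model, X[test_idx])
            errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))
```

For longitudinal data, splitting by subject instead of by row would keep each individual's repeated measurements within a single fold.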

Conclusion
In this paper, we developed a linear unified (Liu) procedure in the linear mixed model for longitudinal data analysis. Specifically, we considered a penalized likelihood approach and proposed the mixed Liu regression estimator for the vector of regression coefficients. An EM algorithm was also presented to solve the penalized likelihood for the unknown parameters. Numerical studies demonstrated the good performance of the proposed mixed Liu estimator in the multicollinear situation.