Be wary of using Poisson regression to estimate risk and relative risk

Fitting a log binomial model to binary outcome data makes it possible to estimate risk and relative risk for follow-up data, and prevalence and prevalence ratios for cross-sectional data. However, the fitting algorithm may fail to converge when the maximum likelihood solution is on the boundary of the allowable parameter space. Some authorities recommend switching to Poisson regression with robust standard errors to approximate the coefficients of the log binomial model in those circumstances. This solves the problem of non-convergence, but it introduces errors in the coefficient estimates that may be substantial, particularly when the maximum fitted value is large. The paradox is that the circumstances in which the modified Poisson approach is needed to overcome estimation problems are the same circumstances in which the error in using it is greatest. We recommend that practitioners be wary of using modified Poisson regression to approximate risk and relative risk.


Introduction
Direct estimation of risk and relative risk for prospective studies requires the fitting of a generalized linear model (GLM) with a binomial error distribution and logarithmic link function. This is the log binomial model (LBM). It provides estimates of probabilities and conditional probabilities that are directly interpretable and are preferred as measures of occurrence and association [1]. An added benefit is that the model provides interpretable estimates of prevalence and prevalence ratios for cross-sectional studies.
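In standard GLM notation, the LBM and the reason its coefficients are directly interpretable can be written as below. The symbols $\eta_i$, $\mathbf{x}_i$, $\beta_j$ and $\eta_c$ are our labels, not the source's. Here $\eta_c$ is the linear predictor at $x_j = c$ with the other covariates held fixed. With follow-up data, $e^{\beta_j}$ is the relative risk for a one-unit increase in $x_j$. With cross-sectional data, the same quantity is a prevalence ratio.

```latex
\begin{aligned}
\log \Pr(Y_i = 1 \mid \mathbf{x}_i) &= \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \\
\frac{\Pr(Y = 1 \mid x_j = c + 1)}{\Pr(Y = 1 \mid x_j = c)}
  &= \frac{e^{\eta_c + \beta_j}}{e^{\eta_c}} = e^{\beta_j}.
\end{aligned}
```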
The drawback of the LBM is that the logarithmic link function maps the probability of the event, which lies in (0, 1], onto the non-positive real line. This imposes bounds on the allowable parameter space for the model coefficients. Estimation subject to these bounds is problematic: standard methods for fitting GLMs may fail to converge to the maximum likelihood (ML) estimates for an LBM if the fitted probabilities are allowed to equal or exceed unity. Even if the iterations converge and the approximate solution is reasonably accurate, there will be difficulties interpreting and applying the fitted values if one or more of them exceeds unity.
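Written out, the bound amounts to one linear inequality per observation. In the sketch below, $\mathbf{x}_i$ is the covariate vector of subject $i$ (including the leading 1), and $\Theta$ is our label for the allowable parameter space. The ML estimate lies on the boundary of $\Theta$ when $\mathbf{x}_i^{\top}\hat{\boldsymbol\beta} = 0$ for at least one $i$, that is, when some fitted probability is exactly 1.

```latex
\begin{gathered}
\hat p_i = e^{\hat\eta_i} \le 1 \iff \hat\eta_i = \mathbf{x}_i^{\top}\hat{\boldsymbol\beta} \le 0, \qquad i = 1, \ldots, n, \\
\Theta = \{\boldsymbol\beta : \mathbf{x}_i^{\top}\boldsymbol\beta \le 0 \ \text{for all } i\}.
\end{gathered}
```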
There are several workaround methods to approximate the solution of an LBM and circumvent the problems inherent in its estimation. Apart from substituting estimates from a logistic regression model, the modified Poisson regression method has gained the most traction [2]. This method involves fitting a GLM with a Poisson error distribution and logarithmic link [3], and using the sandwich estimator to obtain variance estimates that are robust to misspecification of the error distribution [4]. Carter et al. [5] showed that the coefficient estimates from a Poisson regression model consistently estimate the coefficients of the LBM. They also showed that the information sandwich estimator of the covariance matrix of the Poisson regression fit is a consistent estimator of the covariance matrix of the estimated coefficients from a log binomial fit.
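A minimal sketch of the modified Poisson fit for a single covariate, written in pure Python, follows. It first fits the Poisson coefficients by Newton-Raphson; for the canonical log link this coincides with IRLS. It then computes the sandwich covariance $A^{-1} B A^{-1}$. The function and helper names are our own, hypothetical choices. This is an illustration of the technique, not a substitute for an established GLM library.

```python
import math


def _gram(x, w):
    """Return sum_i w_i * (1, x_i)(1, x_i)^T as a 2x2 nested list."""
    s0 = sum(w)
    s1 = sum(wi * xi for xi, wi in zip(x, w))
    s2 = sum(wi * xi * xi for xi, wi in zip(x, w))
    return [[s0, s1], [s1, s2]]


def _inv2(m):
    """Invert a 2x2 matrix given as a nested list."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def _matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def modified_poisson(x, y, max_iter=100, tol=1e-10):
    """Poisson ML fit of log E[Y | x] = b0 + b1 * x with a sandwich covariance.

    x: covariate values (not all equal); y: 0/1 outcomes with at least one event.
    Returns ([b0, b1], cov) where cov = A^{-1} B A^{-1}.
    """
    b = [math.log(sum(y) / len(y)), 0.0]  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b[0] + b[1] * xi) for xi in x]
        resid = [yi - mi for yi, mi in zip(y, mu)]
        # Score U = sum_i (y_i - mu_i) * (1, x_i)
        u = [sum(resid), sum(ri * xi for xi, ri in zip(x, resid))]
        # Newton step A^{-1} U, with A = sum_i mu_i (1, x_i)(1, x_i)^T
        # (for the canonical log link, observed and expected information agree)
        a_inv = _inv2(_gram(x, mu))
        step = [a_inv[0][0] * u[0] + a_inv[0][1] * u[1],
                a_inv[1][0] * u[0] + a_inv[1][1] * u[1]]
        b = [b[0] + step[0], b[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break

    # Sandwich: "bread" A^{-1} from the Poisson model, "meat" B from squared residuals
    mu = [math.exp(b[0] + b[1] * xi) for xi in x]
    bread = _inv2(_gram(x, mu))
    meat = _gram(x, [(yi - mi) ** 2 for yi, mi in zip(y, mu)])
    cov = _matmul2(_matmul2(bread, meat), bread)
    return b, cov
```

For a binary covariate the model is saturated. In that case exp(b1) reproduces the ratio of the two observed risks, and the robust variance of b1 reduces to (1 - p0)/(n0 p0) + (1 - p1)/(n1 p1). For example, with risks 0.2 and 0.5 in two groups of 10, exp(b1) is 2.5 and the robust variance of b1 is 0.5.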
Biostatistics and Biometrics Open Access Journal

This approach requires no data modification and can be performed easily with widely available software. It seemingly resolves the convergence issues because, in Poisson regression, the logarithm of the expected count may take any value on the real line. Estimation can therefore proceed even if the linear predictor is nonnegative. Consequently, the resulting coefficient estimates may yield fitted values for the LBM that are inadmissible as probabilities because they exceed unity. Some authors have suggested that these can safely be ignored [6]. However, the approximate solution may be subject to considerable error. We were alerted to this by example data in a recent paper by Williamson et al. [2] exploring sources of failed convergence of the LBM (Table 1). The authors attempt to fit the single-covariate LBM

$$\log \Pr(Y = 1 \mid x) = \beta_0 + \beta_1 x,$$

where $Y$ is a binary (0/1) outcome indicator and $x$ is the covariate. They report that only SAS is successful in finding the ML solution, although the SAS output warns that convergence is questionable because the solution appears to be on the boundary.
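The sketch below shows how Poisson fitted values can exceed unity. It uses a small hypothetical data set, not the Table 1 data of Williamson et al.: 10 subjects at each of x = 0, 1, 2, with observed risks 0.1, 0.8 and 1.0. The function `fit_log_poisson` is our own minimal Newton-Raphson fit, not a library API.

```python
import math


def fit_log_poisson(x, y, max_iter=100, tol=1e-10):
    """Poisson ML fit of log E[Y | x] = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # intercept-only starting values
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (u0, u1) and information matrix [[a00, a01], [a01, a11]]
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        a00 = sum(mu)
        a01 = sum(mi * xi for xi, mi in zip(x, mu))
        a11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = a00 * a11 - a01 * a01
        d0 = (a11 * u0 - a01 * u1) / det
        d1 = (a00 * u1 - a01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical data: observed risks 0.1, 0.8 and 1.0 at x = 0, 1, 2 (n = 10 each)
x = [0] * 10 + [1] * 10 + [2] * 10
y = [1] * 1 + [0] * 9 + [1] * 8 + [0] * 2 + [1] * 10

b0, b1 = fit_log_poisson(x, y)
fitted = {xi: math.exp(b0 + b1 * xi) for xi in (0, 1, 2)}
# Fitted values that are inadmissible as probabilities under the LBM
inadmissible = {xi: p for xi, p in fitted.items() if p > 1}
print(fitted)
print(inadmissible)
```

The Poisson score equations force the fitted values to reproduce both the total number of events and the x-weighted total. That pushes the fitted value at x = 2 to roughly 1.14, an inadmissible probability, even though no observed risk exceeds 1.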
In these circumstances, the analyst might follow well-intentioned advice to fit a modified Poisson model [5,7]. Datasets of size $n = 500$ were chosen as representative of many encountered in practice, and 10,000 replications were drawn for each setting. Table 2

Conclusion
We recommend that practitioners be wary of using the modified Poisson approach to estimate an LBM. Whilst errors greater than 20% may be a rarity, the estimates are subject to substantial bias. In the context of confounding, one authority has nominated 10% as the threshold for bias that cannot be ignored [9]. Based on that standard, the modified Poisson method failed on one in nine occasions when the ML solution was on the boundary. It failed on almost one in six occasions when, in addition, the Poisson fitted value exceeded unity. A boundary solution matters because it causes standard fitting algorithms to fail. The paradox is that this is the very circumstance that prompts practitioners to switch to the modified Poisson approach. Error rates are substantial even when the solution is not on the boundary. The modified Poisson approach is not required in those circumstances, however, because standard software for fitting the LBM should converge to the ML solution.
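The failure criterion above can be sketched as simple arithmetic. This sketch makes two assumptions, neither stated in the text. First, error is measured as the absolute relative difference between the modified Poisson estimate and the ML (log binomial) estimate. Second, a replication "fails" when that error exceeds 10%. Both function names are hypothetical.

```python
def relative_error(approx, ml):
    """Absolute relative error of an approximate estimate against the ML estimate."""
    return abs(approx - ml) / abs(ml)


def failure_rate(pairs, threshold=0.10):
    """Proportion of (approx, ml) pairs whose relative error exceeds the threshold.

    Assumption: a 'failure' is a relative error strictly greater than threshold.
    """
    failures = sum(1 for approx, ml in pairs if relative_error(approx, ml) > threshold)
    return failures / len(pairs)
```

Under these assumptions, a failure rate of about 0.11 corresponds to the one-in-nine figure, and about 0.17 to the one-in-six figure.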