Comparative Appraisal of Estimated Odds Ratio and Risk Ratio Using Binary Regression Models: Analysis of Nodal Involvement Among Oral-Cancer Patients
Vishwajeet Singh1, Alok Kumar Dwivedi2, Sada Nand Dwivedi1* and SVS Deo3
1 Department of Biostatistics, All India Institute of Medical Sciences, India
2 Division of Biostatistics & Epidemiology, Texas Tech University Health Sciences Center, USA
3 Department of Surgical Oncology, All India Institute of Medical Sciences, India
Submission: October 09, 2018; Published: December 06, 2018
*Corresponding author: Sada Nand Dwivedi, Department of Biostatistics, All India Institute of Medical Sciences, New Delhi, India
How to cite this article: Vishwajeet S, Alok K D, Sada N D,SVS Deo. Comparative Appraisal of Estimated Odds Ratio and Risk Ratio Using Binary Regression Models: Analysis of Nodal Involvement Among Oral-Cancer Patients. Biostat Biometrics Open Acc J. 2018; 8(4): 555745. DOI: 10.19080/BBOAJ.2018.08.555745
Abstract
Predictive modeling of binary outcomes with logistic regression is very common in medical research. However, there is substantial debate about the most appropriate measure of association when analyzing binary outcomes. The odds ratio (OR) should be used only in case-control studies, whereas the prevalence ratio (PR), which is equivalent to the relative risk (RR), is the appropriate measure in cross-sectional studies. Recently, Dwivedi et al. [1] proposed a modification of the Diaz-Quijano method (BRR) to estimate RR along with an appropriate 95% CI directly from logistic regression for a prevalent binary outcome. However, this method requires modifying the dataset before the regression analysis is applied. We propose a slight modification of log binomial regression to obtain RR without convergence issues. Accordingly, the objective of this study was to evaluate association measures obtained using the conventional logistic regression (CLR) model, binary relative risk regression (BRR) using the method of Dwivedi et al., and the proposed modified log binomial regression (MBR) on a real published dataset on oral cancer patients. Our results suggest that both methods (MBR and BRR) produced similar estimates along with their 95% CIs. This suggests that either method, BRR or MBR, can be used to appropriately estimate RR and its 95% CI.
Keywords: Logistic regression; Modified Diaz-Quijano method; Modified Log Binomial Regression; Oral Squamous Cell Carcinoma; Nodal Involvement; Metastasis
Abbreviations: OR: odds ratio; PR: prevalence ratio; RR: relative risk; CLR: conventional logistic regression; BRR: binary relative risk regression; MBR: modified log binomial regression; MDQ: modification in Diaz-Quijano method; CI: confidence interval; MLE: maximum likelihood estimate; OPD: outpatient department; IRCH: Institute Rotary Cancer Hospital; AIIMS: All India Institute of Medical Sciences; SMF: sub mucous fibrosis; AUC: area under curve
Introduction
There is substantial debate about the most appropriate measure of association when analyzing binary outcomes. The odds ratio (OR) is the appropriate measure for a case-control study design, whereas the risk ratio or relative risk (RR) should be preferred in cohort studies and clinical trials. In addition, the prevalence ratio (PR) is the appropriate measure for a cross-sectional study. In medical research, researchers are often interested in determining risk factors associated with a disease or a binary outcome using a descriptive modeling approach with cross-sectional studies. In such studies, a binary regression model, typically with a logit link, is often applied to analyze the data. Logistic regression analysis conventionally reports the association of covariates with a binary outcome in the form of an OR.
This OR represents the constant effect of a covariate on the odds of occurrence of the outcome. Naturally, in prevalence or descriptive studies, PR, a measure equivalent to RR, seems to be a more appropriate measure of association than OR. In addition, for a highly prevalent outcome (>10%), the OR often overestimates the effect size when it is more than 1 and underestimates it when it is less than 1 [2]. Hence, it is advisable to use RR for highly prevalent outcomes to summarize the effect size more accurately [2,3].
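To make the direction of this discrepancy concrete, the following Python sketch computes the OR and the RR from two hypothetical 2x2 tables and applies the correction of Zhang and Yu [2], RR = OR / ((1 - P0) + P0 × OR), where P0 is the risk in the unexposed group. The counts and function names are illustrative assumptions and are not taken from the oral cancer dataset.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a, b = events, non-events among exposed; c, d among unexposed."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """RR (or PR in a cross-sectional study) from the same 2x2 table."""
    return (a / (a + b)) / (c / (c + d))


def zhang_yu_rr(or_value, p0):
    """Zhang and Yu correction: approximate RR from an OR and the unexposed risk p0."""
    return or_value / ((1 - p0) + p0 * or_value)


# Hypothetical counts, chosen only for illustration (not study data).
harmful = (60, 40, 40, 60)     # risks 0.60 (exposed) vs 0.40 (unexposed)
protective = (40, 60, 60, 40)  # risks 0.40 (exposed) vs 0.60 (unexposed)

or_harm, rr_harm = odds_ratio(*harmful), risk_ratio(*harmful)
or_prot, rr_prot = odds_ratio(*protective), risk_ratio(*protective)
rr_from_or = zhang_yu_rr(or_harm, 0.40)

print(f"harmful:    OR = {or_harm:.2f}, RR = {rr_harm:.2f}")
print(f"protective: OR = {or_prot:.2f}, RR = {rr_prot:.2f}")
print(f"Zhang-Yu RR recovered from the harmful OR: {rr_from_or:.2f}")
```

With these assumed counts the outcome is common in both groups, and the OR lies farther from 1 than the RR in both the harmful and the protective direction. For a crude 2x2 table the correction recovers the RR; for adjusted estimates it is only an approximation.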
The most direct binary regression method for estimating RR is log binomial regression. However, log binomial regression may fail to estimate RR because of convergence problems. Consequently, several alternative methods for estimating RR have been developed [1-4]. The most popular alternative is the modified Poisson regression model. However, this method may also occasionally fail to converge and can yield fitted probabilities outside the range 0 to 1. Recently, Dwivedi et al. [1] proposed a modification of the Diaz-Quijano method (MDQ) to estimate an appropriate 95% confidence interval (CI) directly from logistic regression and reported that it provides a more accurate estimate, along with an appropriate 95% CI, for a prevalent binary outcome. However, this method requires modifying the dataset before the regression analysis is applied. We propose a slight modification of log binomial regression to obtain RR without convergence issues. Accordingly, the objective of this study was to evaluate association measures obtained using the conventional logistic regression (CLR) model, binary relative risk regression (BRR) using Dwivedi's method, and the proposed modified log binomial regression (MBR) on a real published dataset, in order to examine factors associated with the presence of cancer in nodes among oral cancer patients [5].
Material and Methods
Methods
Conventional logistic regression (CLR): CLR is the most commonly used method for modeling a binary outcome. CLR fits the log of the odds as a linear function of the covariates. If m is the number of cases with a specific outcome, n is the number of non-cases and $X_1, \dots, X_k$ are the k covariates, then the logistic regression model can be expressed as [6]:
$$\log\left(\frac{m}{n}\right) = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_k X_k \qquad (1)$$
where $a_0$ is the intercept and $a_1$ to $a_k$ are the respective regression coefficients of the covariates. However, as stated earlier, the OR from this method does not approximate the RR when the outcome is common.
Binary Relative Risk Regression (BRR): The detailed methodology of BRR is reported by Dwivedi et al. [1]. The idea of this method is to obtain a binomial regression using CLR, where binomial regression fits the log of the probability of the outcome as a linear function of the covariates. Suppose m is the number of cases, n is the number of non-cases and $X_1, \dots, X_k$ are the k covariates; then the log binomial model can be written as:
$$\log\left(\frac{m}{m+n}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k \qquad (2)$$
where $b_0$ is the intercept and $b_1$ to $b_k$ are the regression coefficients of the respective covariates.
For this, Diaz-Quijano [7] proposed duplicating the events (m) as non-events, so that (m+n) records are treated as the patients at risk (n') in the dataset; then:
$$\log\left(\frac{m}{n'}\right) = \log\left(\frac{m}{m+n}\right) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k \qquad (3)$$
Afterwards, the CLR model can be fitted to the modified dataset. CLR provides an OR; however, the OR on the modified dataset is mathematically equivalent to the RR [7]. Because the duplicated records introduce a clustering effect and over-represent the cases, CLR on the modified dataset must account for clustering in order to yield an appropriate 95% CI [1]. In STATA, CLR with a cluster variance option can be used for the analysis.
For example:
logistic outcome covariates, cluster(ID)   // ID is the unique subject identification number
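The equivalence between the OR on the modified dataset and the RR on the original data can be checked on a small example. The Python sketch below is an illustration under assumed counts (none come from the oral cancer dataset, and the function names are hypothetical): it duplicates each event as a non-event, keeping the subject ID on the copy, and compares the crude OR of the expanded data with the crude RR of the original data using exact fractions.

```python
from fractions import Fraction


def expand_events(records):
    """Return the modified dataset: every event is kept and also copied as a non-event."""
    expanded = []
    for subject_id, exposed, outcome in records:
        expanded.append((subject_id, exposed, outcome))
        if outcome == 1:
            # The duplicate shares the subject ID, hence the need for cluster variance.
            expanded.append((subject_id, exposed, 0))
    return expanded


def table_counts(records):
    """Count records in each (exposed, outcome) cell of the 2x2 table."""
    counts = {(e, y): 0 for e in (0, 1) for y in (0, 1)}
    for _, exposed, outcome in records:
        counts[(exposed, outcome)] += 1
    return counts


def odds_ratio(records):
    n = table_counts(records)
    return Fraction(n[(1, 1)] * n[(0, 0)], n[(1, 0)] * n[(0, 1)])


def risk_ratio(records):
    n = table_counts(records)
    risk_exposed = Fraction(n[(1, 1)], n[(1, 1)] + n[(1, 0)])
    risk_unexposed = Fraction(n[(0, 1)], n[(0, 1)] + n[(0, 0)])
    return risk_exposed / risk_unexposed


# Hypothetical data: (exposed, outcome, number of subjects); illustration only.
original = []
next_id = 1
for exposed, outcome, count in [(1, 1, 30), (1, 0, 20), (0, 1, 20), (0, 0, 30)]:
    for _ in range(count):
        original.append((next_id, exposed, outcome))
        next_id += 1

modified = expand_events(original)
rr_original = risk_ratio(original)
or_original = odds_ratio(original)
or_modified = odds_ratio(modified)

print("RR, original data:", rr_original)
print("OR, original data:", or_original)
print("OR, modified data:", or_modified)
```

In this assumed example the OR on the modified data coincides with the RR on the original data, while the OR on the original data is larger. Because each event now appears twice under the same ID, the rows are no longer independent, which is the motivation for the cluster option in the command above.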
Modified Log Binomial Regression (MBR): Log binomial regression is the direct method for estimating RR, as expressed in equation (2). Standard log binomial regression restricts the parameter space so that fitted probabilities lie between 0 and 1. Consequently, maximum likelihood estimation (MLE) sometimes fails to converge, which can be resolved by expanding the original dataset into c copies using the COPY method [8]. The difficulty with the COPY method is choosing an optimum number of copies of the original dataset that yields parameter estimates without convergence issues. In binomial regression, each record is considered to follow a Bernoulli distribution. We propose instead to consider each record as following a binomial distribution with a constant number of trials, c. The revised binomial regression equation can be expressed as:
The constant c can be any large number; however, we suggest setting c equal to the total sample size of the study, N = m + n. This modification may distort the variance estimates of the regression coefficients. To obtain appropriate standard errors, we suggest using the robust variance estimate based on the sandwich estimator. In STATA, MBR can be fitted in two ways:
glm outcome covariates, family(binomial N) link(log) vce(robust) eform
or
binreg outcome covariates, n(N) vce(robust) rr
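A short sketch may help show why treating each record as a binomial count out of c trials can keep the fit away from the boundary of the parameter space. It assumes a single binary covariate $x_i$ and uses notation introduced only for this illustration ($y_i$, $p_i$, $\bar{y}_x$); it is not a general derivation for the multivariable model.

```latex
% Assumed setting: y_i \in \{0,1\} is treated as Binomial(c, p_i) with a log link.
\begin{align*}
\log p_i &= b_0 + b_1 x_i, \qquad x_i \in \{0,1\} \\
\ell(b_0, b_1) &= \sum_i \bigl[\, y_i \log p_i + (c - y_i)\log(1 - p_i) \,\bigr] + \text{const} \\
\hat{p}_x &= \frac{\bar{y}_x}{c}, \qquad
  \bar{y}_x = \text{observed proportion of events in group } x \\
\exp(\hat{b}_1) &= \frac{\hat{p}_1}{\hat{p}_0} = \frac{\bar{y}_1}{\bar{y}_0}
\end{align*}
```

Under these assumptions the estimated ratio equals the crude RR whatever the value of c, while the fitted probabilities are at most 1/c and therefore far from the upper bound of 1 when c = N. The model-based variance no longer reflects Bernoulli sampling, which is consistent with the suggestion to use the robust variance estimate.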
Statistical Analysis
Data were summarized using appropriate summary measures. The unadjusted association between each cofactor and the presence of node positivity was evaluated using CLR and BRR analyses. A stepwise regression procedure (with forward 15% and backward 10% probability) was then used to select significant covariates for multivariable analysis. A common set of variables that were significant at the 25% level of significance in univariable analysis and/or were clinically important was retained in the multivariable models. Three multivariable models (CLR, BRR, MBR) were constructed for descriptive comparative evaluation. The results are presented as OR (95% CI) obtained using CLR and RR (95% CI) obtained using BRR or MBR. Model performance was summarized using the area under the receiver operating characteristic curve (AUC-ROC) with 95% CI. Statistical software, STATA/SE version 14.2 (StataCorp LP, College Station, TX, USA), was used for analysis.
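As an illustration of the AUC summary, the Python sketch below computes the AUC as the proportion of case and non-case pairs in which the case receives the higher predicted probability, with ties counted as one half. The predicted values and outcomes are hypothetical and the function name is an assumption; this is not the STATA routine used for the analysis, and it does not compute the 95% CI.

```python
def auc_from_scores(scores, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 1/2."""
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    noncase_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not noncase_scores:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for case_score in case_scores:
        for noncase_score in noncase_scores:
            if case_score > noncase_score:
                wins += 1.0
            elif case_score == noncase_score:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical predicted probabilities and observed outcomes (illustration only).
predicted = [0.2, 0.4, 0.4, 0.7, 0.9, 0.3]
observed = [0, 0, 1, 1, 1, 0]

auc = auc_from_scores(predicted, observed)
# A strictly increasing transformation of the scores leaves the ranking unchanged.
auc_transformed = auc_from_scores([p ** 2 for p in predicted], observed)

print(f"AUC = {auc:.3f}")
print(f"AUC after squaring the scores = {auc_transformed:.3f}")
```

Because the AUC depends only on how subjects are ranked, models whose fitted values order the patients in nearly the same way will have nearly the same AUC, even when their coefficients are on different scales (OR versus RR).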
Data
We used a dataset of oral cancer patients for the comparative evaluation and illustration of the CLR, BRR, and MBR models [5]. The ongoing database contains all pathologically proven oral cancer patients attending the outpatient department (OPD) of the Department of Surgical Oncology at the Institute Rotary Cancer Hospital (IRCH), All India Institute of Medical Sciences (AIIMS), New Delhi, India. We queried this database for the period 1995 to 2013 and collected data on unique patients with the intention of determining factors associated with the presence of cancer in nodes. A total of 945 histopathologically proven oral squamous cell carcinoma patients who underwent surgery including neck dissection were included in this study. Cases that were unknown or other than oral cancer, not histopathologically proven, or not SCC were excluded from the study.
The histopathological status of nodes in the form of involved (presence of cancer in nodes) and not involved (no presence of cancer in nodes) among oral cancer patients was considered as a binary outcome in the study. All the important demographic, clinical and pathological characteristics of the patients, relevant for study population were included as well. Data were coded as per clinically meaningful criteria.
Results
Out of 945 patients, nodal involvement was highly prevalent (39.8%; 95%CI: 36.7%, 43%). The frequency distribution and unadjusted OR/RR along with corresponding 95% CI are presented in Table 1. In univariable logistic regression analysis, 13 variables were found to be significant at the level of 25% under all three methods. At the level of 5%, six variables were found significant under CLR model, whereas only four under BRR and MBR model were found significant. Comparatively, the effect size under CLR was higher for variables which were found significant in both OR and RR models, but lower for variables which were found significant only in CLR model. However, the 95% confidence intervals were narrower in the case of BRR/MBR model.
LC, Lower class; LMC, Lower middle class; UC, Upper class; UMC, Upper middle class; UIG, Ulceroinfiltrative; UPG, Ulceroproliferative; WD, Well Differentiated; Deg of Dif, Degree of differentiation; cSkin Inv, Clinical skin involvement; cBone, Clinical bone involvement; SMF, Sub mucous fibrosis.
# BRR model, # # MBR model
AOR, Adjusted odds ratio. ARR, Adjusted risk ratio
*Adjusted in relation to smoking, duration of risk, tumor growth type and clinical bone involvement.
# BRR model, # # MBR model
The adjusted OR (95% CI) and RR (95% CI) from the multivariable regression analyses are presented in Table 2. As in the univariable analysis, the effect sizes under the CLR model were higher than those under the BRR or MBR models. Further, the BRR and MBR models gave similar results, and both yielded more precise confidence intervals than CLR. Under the CLR model, patients with pain at the time of presentation [OR (95% CI): 1.34 (1.02 to 1.77)], presence of a clinical node [OR (95% CI): 2.38 (1.69 to 3.35)], tongue as compared to buccal mucosa [OR (95% CI): 1.63 (1.07 to 2.46)] and tumors that were not well differentiated [OR (95% CI): 1.41 (1.05 to 1.89)] were more likely to have nodal involvement. Likewise, under the BRR and MBR models, pain at the time of presentation, presence of a clinical node, tongue as compared to buccal mucosa and tumors that were not well differentiated remained significantly associated with nodal involvement. However, one covariate, sub mucous fibrosis (SMF), whose effect size was lower under the CLR model, became non-significant in the BRR/MBR models. Further, lip as compared to buccal mucosa became significant under the BRR/MBR models.
The predictive performance (AUC) for nodal involvement is presented in Figure 1 and was similar under all the models. The AUC was 0.65 (95% CI: 0.61-0.68) under the CLR model, 0.65 (95% CI: 0.61-0.68) under the BRR model and 0.65 (95% CI: 0.61-0.69) under the MBR model.
Summary and Conclusion
From both a theoretical and a practical point of view, reporting RR should be preferred over OR in all studies except case-control studies. Although the application of relative risk regression to the analysis of binary outcomes has increased in recent years, the use of CLR is still common because of its familiarity among applied researchers and its ease of application. For descriptive or inferential studies, OR models may produce biased associations and inappropriate conclusions, especially for a common outcome. Furthermore, the interpretation of OR as RR is still in practice. Log binomial regression is the preferred method for estimating RR, followed by the modified Poisson method. However, these methods may produce out-of-range probabilities [9]. Recently, Dwivedi et al. [1] proposed a simple relative risk model based on logistic regression and demonstrated its superiority in estimating RR with an appropriate 95% CI by comparing it with common alternative methods. Williamson et al. [9] suggested using log binomial regression over modified Poisson regression for estimating RR wherever feasible and provided some solutions to avoid convergence problems. However, these solutions are often infeasible. In this study, we proposed a modification of log binomial regression to overcome its convergence problems and compared it with the CLR and BRR methods.
In our study, standard log binomial regression did not converge for any multivariable analysis. The proposed MBR converged for all analyses and has properties that resolve the convergence issues of standard log binomial regression analysis. Both methods (MBR and BRR) produced similar estimates along with their 95% CIs. This suggests that either method, BRR or MBR, can be used to appropriately estimate RR and its 95% CI. In a recent study, however, modified Poisson regression produced unbiased estimates of RR compared with the standard log binomial regression model under model misspecification [10].
This suggests that our MBR may not be appropriate for estimating RR under misspecification of the link function. In such instances, the BRR method based on logistic regression proposed by Dwivedi et al. [1] should be preferred over modified Poisson regression due to its convergence, reliability and feasibility. In our study, CLR not only yielded different effect sizes compared with the BRR/MBR methods but also led to different conclusions for a few factors. This indicates that appropriate RR methods should be used for studies desiring RR estimates. In the present study, the analytical results on nodal involvement among oral cancer patients support the statistical findings reported in a previous study [4]. More specifically, appropriate computation of RR (BRR/MBR model) was more precise than inappropriate computation in terms of OR (CLR model).
For risk factors, the estimates under the CLR model were overestimated in comparison with those under the BRR/MBR model. In contrast, for protective factors, the estimates under the CLR model were underestimated in comparison with those under the BRR/MBR model. In other words, for a highly prevalent outcome (>10%), as in this study, computation of OR may overestimate the effect size when it is more than 1 and underestimate it when it is less than 1. Similar findings have been reported in previous studies in another area [1]. As a result, in comparison with the CLR results, sub mucous fibrosis emerged as non-significant under the BRR/MBR model, whereas lip as compared to buccal mucosa emerged as significant. However, regardless of the regression approach used, the common set of predictors was pain at the time of presentation, presence of a palpable neck node, tongue as compared to buccal mucosa and degree of differentiation. These results may be helpful to clinicians in the management of oral cancer patients.
To the best of our knowledge, this is the first time a simple and feasible approach has been proposed to resolve convergence problems in estimating RRs from a log binomial model. This study also provided a descriptive evaluation of CLR against the MBR and BRR methods using a fairly large database on oral cancer patients with a comprehensive set of risk factors. One limitation of our study is that we did not conduct simulation studies to compare the three models. However, a simulation approach for comparative evaluation was not deemed necessary, as we used robust methods of estimating RR.
In conclusion, relative risk regression should be preferred for studies that require reporting of RR. The proposed modification of log binomial regression seems to be a suitable method, as it avoids convergence issues. A direct method of estimating RR, such as the MBR or BRR method, should be preferred to estimate RR and its 95% CI. A comprehensive evaluation of different RR approaches under model misspecification is required. Future studies are required to extend RR models to multinomial outcomes.
Acknowledgement
Thanks to All India Institute of Medical Sciences for providing the required facilities.
References
1. Dwivedi AK, Mallawaarachchi I, Lee S, Tarwater P (2014) Methods for estimating relative risk in studies of common binary outcomes. Journal of Applied Statistics 41(3): 484-500.
2. Zhang J, Yu KF (1998) What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA 280(19): 1690-1691.
3. Zocchetti C, Consonni D, Bertazzi PA (1997) Relationship between prevalence rate ratios and odds ratios in cross-sectional studies. Int J Epidemiol 26(1): 220-223.
4. Zou G (2004) Modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 159(7): 702-706.
5. Singh V, SVS Deo, Dwivedi SN, Khan MA (2018) An Epidemiological Model to Find out Factors Associated with Nodal Involvement among Indian Oral Cancer Patients. Open Journal of Epidemiology 8: 117-129.
6. Sundaram KR, Dwivedi SN, Sreenivas V (2015) Medical Statistics Principles & Methods (2nd edn). Wolters Kluwer, Netherlands.
7. Diaz-Quijano FA (2012) A simple method for estimating relative risk using logistic regression. BMC Med Res Methodol 15: 12-14.
8. Deddens JA, Petersen MR (2004) Re: 'Estimating the relative risk in cohort studies and clinical trials of common outcomes'. Am J Epidemiol 159: 213-214.
9. Williamson T, Eliasziw M, Fick GH (2013) Log-binomial models: exploring failed convergence. Emerg Themes Epidemiol 10(1): 14.
10. Chen W, Qian L, Shi J, Franklin M (2018) Comparing performance between log binomial and robust Poisson regression models for estimating risk ratios under model misspecification. BMC Medical Research Methodology 18: 63.