Hongyan Xu; Fengjiao Hu; Santu Ghosh; Sunil Mathur; Varghese George

doi:10.19080/BBOAJ.2019.09.555763

Research Article

Detecting Differentially Methylated Genes Associated with Drug Response

**Hongyan Xu*, Fengjiao Hu, Santu Ghosh, Sunil Mathur and Varghese George**

Department of Population Health Sciences, Augusta University, USA

Submission: July 25, 2018; Published: March 22, 2019

*Corresponding author: Hongyan Xu, Department of Population Health Sciences, Medical College of Georgia, Augusta University, USA

How to cite this article: Hongyan X, Fengjiao H, Santu G, Sunil M, Varghese G. Detecting Differentially Methylated Genes Associated with Drug Response. Biostat Biometrics Open Acc J. 2019; 9(3): 555763. DOI: 10.19080/BBOAJ.2019.09.555763

Abstract

DNA methylation has long been implicated in inter-individual variation in drug response. In this study, we focused on the methylation changes associated with drug response, measured as the change in triglyceride levels before and after treatment with fenofibrate, using the GAW20 real data set. We analyzed samples that are independent (founders and marry-ins) from each pedigree. Subjects were categorized into responders and non-responders according to percent change in triglyceride levels. We then applied a novel spatial scan statistic to identify genes that are differentially methylated between the responders and non-responders. All the CpG sites within a gene were analyzed together. The spatial scan statistic approach uses a mixed-effects model to incorporate correlations of methylation rates among CpG sites. We analyzed the methylation data at visit 2, accounting for the effects of age, sex, and smoking status as covariates. Methylation levels at 312 genes from 22 autosomes were significantly associated with drug response with p < 0.01.

Abbreviations: DMRs: Differentially Methylated Regions; MLE: Maximum Likelihood Estimator

Introduction

Drug response is a complex trait involving multiple genetic and epigenetic factors. In particular, DNA methylation, which is an important regulator of gene expression, has been shown to be involved in inter-individual variation in drug response [1]. In the past, most such studies took a candidate gene approach in which only a few genes were studied. With the advent of high-throughput genomic technologies, we can now survey DNA methylation across CpG sites genome-wide. The methylation information could be analyzed with a single-marker approach. However, this may lead to many false positives because of the huge number of CpG sites genome-wide. The single-marker approach also ignores the fact that methylation levels at nearby CpG sites can be highly correlated. Therefore, a better approach is to jointly analyze the CpG sites in a genomic region and identify differentially methylated regions (DMRs) between different drug response groups. In this study, we take a region-based approach and treat each gene as a genomic region to identify DMRs between responders and non-responders in terms of triglyceride changes before and after fenofibrate treatment. We applied a novel scan statistic approach based on the normal distribution to the real data provided by GAW20.

Method

In this section, we describe the scan statistic to detect DMRs based on the difference in methylation rates between two groups (responders and non-responders) [2].

Adjusting for correlation between CpG sites

We assume $p_{ij}^{k}$ is the true methylation rate at CpG site $j$ for individual $i$ in group $k$. Here $k = A$ for responders and $k = U$ for non-responders.

To account for the correlation of methylation rates among nearby CpG sites, a random slope and intercept logistic regression model is used to model the methylation rate at each CpG site for every individual. In this model, referred to as model (1), the logit of $p_{ij}^{k}$ is linear in $s_j$, where $s_j$ represents the distance of CpG site $j$ from the start point, and both the intercept and the slope include an individual-specific random effect.

In the mixed-effect model setting, the random effects are assumed to vary independently across individuals. By adding the covariates $x_i^{k}$ to the mixed-effect model (1), we can also adjust for the covariates.
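A random slope and intercept logistic model of the kind described above can be written as follows. This is only a sketch: the coefficient names ($\beta_0$, $\beta_1$, $b_{0i}$, $b_{1i}$, $\gamma$) and the bivariate normal distribution of the random effects are assumptions chosen for illustration, and the authors' own display may use different notation.

```latex
% Fixed intercept and slope (beta_0, beta_1), individual-specific random
% intercept and slope (b_{0i}, b_{1i}), and covariate effects gamma.
\operatorname{logit}\bigl(p_{ij}^{k}\bigr)
  = \log\frac{p_{ij}^{k}}{1 - p_{ij}^{k}}
  = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, s_j + x_i^{k}\gamma,
\qquad
(b_{0i}, b_{1i})^{\top} \sim N(\mathbf{0}, \Sigma).
```

Because every CpG site of individual $i$ shares the same $b_{0i}$ and $b_{1i}$, methylation rates at sites within the same individual are correlated, and the dependence on $s_j$ lets that relationship vary with position along the region.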

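The fitted odds of methylation can be calculated for CpG site $j$ of individual $i$ in group $k$, and can be used to obtain the corresponding adjusted expected methylation rate $\hat{p}_{ij}^{k}$, with its logit transformation $\log\bigl(\hat{p}_{ij}^{k}/(1-\hat{p}_{ij}^{k})\bigr)$. We can then calculate the adjusted logit-transformed methylation rates as residuals, that is, the logit of the observed methylation rate minus the logit of $\hat{p}_{ij}^{k}$. Since these adjusted methylation rates are treated as independent, the individual-level values at each CpG site $j$ in group $k$ are summarized into a single adjusted rate, $y_{j}^{k}$.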
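The residual step can be illustrated with a short Python sketch. It assumes the fitted rates from the mixed-effect model are already available as an array. The function names, the clipping constant `eps`, and the use of a simple mean over individuals as the per-site summary are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def logit(p, eps=1e-6):
    """Logit transform; rates are clipped away from 0 and 1 (eps is an assumption)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def adjusted_rates(observed, fitted):
    """Residuals on the logit scale: logit(observed) - logit(fitted).

    Both inputs have shape (n_individuals, n_sites).
    """
    return logit(observed) - logit(fitted)

def site_summary(residuals):
    """Per-site summary within one group (a plain mean, which is an assumption)."""
    return np.asarray(residuals, dtype=float).mean(axis=0)
```

For example, `adjusted_rates([[0.5, 0.8]], [[0.5, 0.5]])` gives a zero residual at the first site and a positive residual at the second, where the observed rate exceeds the fitted one.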

Scan statistic based on normal distribution

We assume that the adjusted methylation rates are normally distributed, $y_{j}^{A} \sim N(\mu_A, \sigma_{Aj}^{2})$ and $y_{j}^{U} \sim N(\mu_U, \sigma_{Uj}^{2})$, with known $\sigma_{Aj}^{2}$ and $\sigma_{Uj}^{2}$, where $\mu_A$ and $\mu_U$ are the true methylation rates in cases (responders) and controls (non-responders), respectively. The likelihood of $y_{j}^{k}$ is then the corresponding normal density.

It is evident from this likelihood that the distribution of the adjusted methylation rate belongs to a one-parameter exponential family with parameter $\eta$, and the log-likelihood can be written up to an additive constant that does not depend on $\eta$.

Based on this likelihood function, we can find the maximum likelihood estimator (MLE) of the parameter $\eta$ in the one-parameter exponential family.

For a specific genomic region, after adjusting for correlation between CpG sites by using the mixed-effect model, the $y_{j}^{k}$ are assumed to be independent for the $s$ consecutive CpG sites, so the log-likelihood of the region is the sum of the site-level log-likelihoods.

In order to test the hypotheses $H_0: \mu_A = \mu_U$ versus $H_1: \mu_A \neq \mu_U$, the ratio of the likelihood under $H_1$ versus $H_0$ can be used as a test statistic. More conveniently, we can use the log of this likelihood ratio as our test statistic, which we refer to as the scan statistic. It is computed from the likelihoods for the two groups.
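Under a normal model with known variances, the quantities described in this section can be written out explicitly. The following is a sketch of one consistent derivation. It assumes $\eta = \mu_k$, and the weights $w_{kj}$, the totals $W_k$, and the pooled estimate $\hat{\mu}_0$ are notation introduced here for illustration, so the authors' expressions may be arranged differently.

```latex
% Site-level likelihood, and log-likelihood up to a constant free of \mu_k
L(\mu_k; y_j^k) = \frac{1}{\sqrt{2\pi\sigma_{kj}^2}}
  \exp\left\{-\frac{(y_j^k-\mu_k)^2}{2\sigma_{kj}^2}\right\},
\qquad
\ell(\mu_k; y_j^k) = \frac{\mu_k\, y_j^k - \mu_k^2/2}{\sigma_{kj}^2}.

% Region of s independent sites: inverse-variance weighted MLE per group
w_{kj} = \frac{1}{\sigma_{kj}^2}, \quad
W_k = \sum_{j=1}^{s} w_{kj}, \quad
\hat{\mu}_k = \frac{\sum_{j=1}^{s} w_{kj}\, y_j^k}{W_k},
\quad k \in \{A, U\}.

% Pooled MLE under H_0 and the log-likelihood ratio
\hat{\mu}_0 = \frac{W_A\hat{\mu}_A + W_U\hat{\mu}_U}{W_A + W_U},
\qquad
\log\Lambda
  = \sum_{k\in\{A,U\}}\sum_{j=1}^{s}
    \left[\ell(\hat{\mu}_k; y_j^k) - \ell(\hat{\mu}_0; y_j^k)\right]
  = \frac{W_A W_U}{2\,(W_A + W_U)}\,\bigl(\hat{\mu}_A - \hat{\mu}_U\bigr)^2.
```

In this form the statistic grows with the squared difference between the two weighted group means and with the precision of those means, so a region with many low-variance CpG sites contributes more evidence than a single site would.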

Application to the GAW20 real data

In our analysis, we first classified a subject as either a responder or a non-responder. A subject was classified as a responder if the percent change between pre- and post-treatment triglyceride values was more than 35% [4,5]. Using this criterion, there were 42 responders and 45 non-responders in our sample. We then applied the scan statistic method to the methylation data at visit 2 for chromosomes 1 to 22. All the CpG sites within a gene region were jointly analyzed. Based on the annotation file provided, for each chromosome a gene region is defined as the continuous region annotated with the same gene name.
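The two preprocessing steps described above, classifying subjects and grouping CpG sites into gene regions, can be sketched in Python as follows. The function names and the record layout `(cpg_id, chromosome, position, gene_name)` are hypothetical. Measuring the percent change as a reduction relative to the pre-treatment value is an assumption, and grouping purely by chromosome and gene name is a simplification that does not check whether the sites are contiguous.

```python
from collections import defaultdict

RESPONSE_THRESHOLD = 35.0  # percent, as stated in the text

def percent_change(pre, post):
    """Percent reduction relative to the pre-treatment value (direction is an assumption)."""
    return 100.0 * (pre - post) / pre

def classify(pre, post, threshold=RESPONSE_THRESHOLD):
    """Label a subject by whether the percent change exceeds the threshold."""
    return "responder" if percent_change(pre, post) > threshold else "non-responder"

def group_sites_by_gene(annotation):
    """Group CpG identifiers into gene regions.

    annotation: iterable of (cpg_id, chromosome, position, gene_name) records
    (hypothetical layout). Sites without a gene name are skipped.
    """
    regions = defaultdict(list)
    for cpg_id, chrom, _pos, gene in sorted(annotation, key=lambda r: (r[1], r[2])):
        if gene:
            regions[(chrom, gene)].append(cpg_id)
    return dict(regions)
```

Each resulting list of CpG sites is then treated as one region and tested jointly, rather than testing every site on its own.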

Result

We further performed functional annotation of the significant genes using DAVID 6.8 (https://david.ncifcrf.gov/). Eleven genes, HMGCR [6,7], KLF10 [8], RBMS1 [9], THADA [10], CRY2 [11], FADS1 [12], PTER [13], STK11 [14], TSPAN8 [15], TFRC [16], and IARS2 [17], were found to be involved in type 2 diabetes, in which lipid levels including triglyceride have been shown to be significant risk factors.

For comparison, we performed further analyses, including a single-CpG analysis and another region-based analysis with IMA [18]. The single-CpG analysis was carried out using the limma package in Bioconductor for all the 4,565 CpG sites annotated to the 312 genes identified through our approach. We also adjusted for the effects of age, sex, and smoking status, as we did in our approach. The single-CpG analysis identified 836 CpG sites from 112 genes with p < 0.01. We performed the region-based analysis using the IMA package in Bioconductor for the 312 genes identified through our approach, again adjusting for the effects of age, sex, and smoking status. IMA detected 11 genes with p < 0.01.

Discussion

In this study, we applied a novel method based on a scan statistic to detect differentially methylated genes between responders and non-responders to treatment with fenofibrate. We treat each gene as a genomic region, and our method accounts for the correlation of methylation levels between CpG sites within a gene. By doing so, we can use the information across multiple CpG sites to boost the statistical power to identify differences in methylation levels between the two groups. The method is based on a regression approach, so it is natural to account for the effects of covariates by including them in the model. We applied our method to the GAW20 real data set and were able to identify genes with biological relevance. One limitation is that statistical significance is based on 1,000 permutations because of constraints on computational speed. Therefore, the smallest p-value we can obtain is 0.001, owing to the discreteness of the permutation distribution of the test statistic.
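The permutation limit mentioned above can be made concrete with a minimal Python sketch. It is not the authors' implementation. The statistic is the log-likelihood ratio for two weighted group means under a normal model, and the per-site mean, the plug-in variance of that mean, and the convention of reporting at least 1/`n_perm` are assumptions made for illustration. A site with zero variance in a group would need extra handling.

```python
import numpy as np

def region_statistic(residuals, is_responder):
    """Log-likelihood ratio for one region, comparing two weighted group means.

    residuals: array (n_individuals, n_sites) of adjusted logit rates.
    is_responder: boolean array of length n_individuals.
    """
    residuals = np.asarray(residuals, dtype=float)
    is_responder = np.asarray(is_responder, dtype=bool)
    totals = []
    for mask in (is_responder, ~is_responder):
        grp = residuals[mask]
        y = grp.mean(axis=0)                          # per-site summary (assumption)
        var = grp.var(axis=0, ddof=1) / grp.shape[0]  # plug-in variance of the mean (assumption)
        w = 1.0 / var
        totals.append((w.sum(), (w * y).sum() / w.sum()))
    (w_a, mu_a), (w_u, mu_u) = totals
    return 0.5 * w_a * w_u / (w_a + w_u) * (mu_a - mu_u) ** 2

def permutation_p_value(residuals, is_responder, n_perm=1000, seed=0):
    """Permute group labels and count statistics at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    is_responder = np.asarray(is_responder, dtype=bool)
    observed = region_statistic(residuals, is_responder)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(is_responder)
        if region_statistic(residuals, permuted) >= observed:
            exceed += 1
    # Report at least 1 / n_perm, so 1,000 permutations give a floor of 0.001.
    return max(exceed, 1) / n_perm
```

With `n_perm=1000`, no region can receive a p-value below 0.001 however strong the signal, which is the discreteness limitation noted in the text. Lowering the floor requires proportionally more permutations.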

In our comparison, the single-CpG analysis and the region-based analysis using IMA both detected fewer significant genes at the 0.05 level than our approach. It should be noted that this is not a strict comparison in the statistical sense, because we performed these analyses only in the subset of 312 genes detected with our approach. In summary, we proposed a new region-based method to detect genes whose methylation levels are associated with triglyceride responses to drug treatment. We applied our method to the real data set from GAW20. Our method identified some genes reported in the literature as related to obesity and lipids, which suggests that this is a reasonable approach.

