## Development of Predictive Signatures for Treatment Selection in Precision Medicine

### Un Jung Lee^{1}, ShengLi Tzeng^{2}, Yu-Chuan Chen^{1} and James J Chen^{1,3}*

^{1}Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, USA

^{2}Department of Public Health, China Medical University, Taiwan

^{3}Department of Biostatistics, University of Arkansas for Medical Science, Arkansas

**Submission:** March 06, 2017; **Published:** August 17, 2017

***Corresponding author:** James J Chen, National Center for Toxicological Research, US Food and Drug Administration, USA; Email: JJChen@uams.edu

**How to cite this article:** Un Jung Lee, ShengLi Tzeng, Yu-Chuan Chen, James J Chen. Development of Predictive Signatures for Treatment Selection in Precision Medicine. Biostat Biometrics Open Acc J. 2017; 2(4): 555594. DOI: 10.19080/BBOAJ.2017.02.555594.

**Abstract**

Precision medicine applies molecular technologies and statistical methods to identify biomarkers that indicate differential disease outcomes or treatment responses, so that diseases can be better matched with specific therapies and treatment assignment can be optimized. The success of precision medicine lies in the development of a biomarker-based treatment selection strategy to identify the right patients for the right treatment. Development of a treatment selection strategy consists of three steps:

I. Biomarker identification,

II. Subgroup selection, and

III. Clinical utility assessment via subgroup analysis.

Biomarker identification involves fitting interaction models to identify a set of potential predictive biomarkers from the measured genomic variables. Subgroup selection develops a prediction model based on the biomarkers identified to partition patients into subgroups that are homogeneous with respect to disease outcomes and/or responses to a specific treatment. Clinical utility assessment evaluates the accuracy of patient treatment assignment and assesses the enhancement of treatment efficacy. The procedures described are illustrated by simulations and by analysis of a breast cancer dataset.

**Keywords:** Biomarker identification; Interaction test; Subgroup selection; Tailored therapy

**Introduction**

Advances in molecular technology have shifted the development of new drugs towards precision medicine, which identifies patient subgroups likely to benefit from a targeted treatment. Today, many cancer treatments are being developed as targeted therapies [1-8], in which only a subpopulation of patients is expected to benefit from the therapy. The term "personalized (precision) medicine", commonly described as the right treatment for the right patient at the right time, conveys the concept of customizing medical therapies to select the best treatments for individual patients. Conventional drug development is based on the concept of "one-size-fits-all" and assumes that the drug effect is similar for all patients with a particular disease. However, if a drug is effective for only a small proportion of patients, it may not become available to the patients who need it, since drug approval is based on the mean difference between treated and untreated patients in the entire patient population.

The success of precision medicine lies in a patient treatment-selection strategy that identifies patient subgroups for which a particular therapy is beneficial, while for patients in the complementary subgroup the therapy is unnecessary or possibly harmful. Subgroups refer to subsets of patients defined by baseline and/or disease characteristics with respect to a specific clinical endpoint. The baseline and disease characteristics include demographics, genetic variants, phenotypic variables, disease stages, and tumor subtypes [9,10]. These characteristics, referred to as "biomarkers," provide indicators of the status of an organism with respect to a particular health condition, disease state and susceptibility, or response to a therapy.

Biomarkers for treatment selection can generally be classified into two major types, prognostic and predictive biomarkers. Prognostic biomarkers are indicators of overall disease outcome, regardless of treatment. The Oncotype DX™ 21-gene signature [11] is a well-known breast cancer prognostic test that predicts a patient's degree of risk to inform selection of the appropriate treatment. Predictive biomarkers are indicators of the likelihood of a patient's response to a particular treatment. Known predictive biomarkers include HER2 positivity for Herceptin treatment [12] and ER/PR positivity for tamoxifen treatment [12, 13] in breast cancer, and epidermal growth factor receptor mutation for erlotinib treatment in non-small cell lung carcinoma [13]. Prognostic and predictive biomarkers have been discussed extensively [14]. Note that prognostic biomarkers and predictive biomarkers are not mutually exclusive.

Development of biomarker-based strategies for treatment decisions can be divided into three components:

I. Biomarker identification,

II. Subgroup selection, and

III. Clinical utility assessment via subgroup analysis.

Biomarker identification involves fitting regression models to identify a set of potential prognostic and/or predictive biomarkers from measured genomic variables. Subgroup selection develops prediction (prognostic and predictive) models based on the biomarkers identified to partition patients into subgroups that are homogeneous with respect to disease outcomes or treatment effects. Prognostic models classify patients as good prognosis (low risk) versus poor prognosis (high risk). Predictive models identify patients who are suitable for the treatment (responders) and those who are not (non-responders). In the context of targeted therapy, this article focuses on predictive models to identify responder and non-responder subgroups. Clinical utility assessment evaluates whether the treatment-selection strategy can classify patients accurately and improve the power to detect a treatment effect, so that effective drugs become available to the patients who need them.

**Methods**

**Biomarker identification**

Let m be the number of measurements (Z_{1}, ..., Z_{m}) investigated and n be the total number of patients in the experiment. In clinical efficacy studies, the measurements are collected before randomization; thus, the treatment should have no effect on them. Let z_{ij} denote the j-th measurement (j = 1, ..., m) in the i-th patient (i = 1, ..., n), and y_{it} denote the clinical outcome of interest for the i-th patient in the t-th treatment. The notation y_{it} is simplified to y_{i} since the treatment index t is completely determined by i. The outcome variable can be binary, continuous, or time-to-event. When the number of variables is relatively large, univariate variable-by-variable analysis is commonly used to identify the variables that are associated with the target variable. The conventional regression model for subgroup analysis [15,16] includes the genomic variable z_{ij} and treatment T as main effects and the variable-by-treatment interaction (T*z_{ij}):
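A form of Eq. (1) consistent with this description can be written as follows. Only the interaction coefficient b_{3j} is named in the text; the labels b_{0j}, b_{1j}, and b_{2j} and the patient subscript on T_{i} are assumed here for illustration.

```latex
h(y_i) = b_{0j} + b_{1j}\, T_i + b_{2j}\, z_{ij} + b_{3j}\, T_i\, z_{ij},
\qquad j = 1, \dots, m \tag{1}
```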

Equation (1) is a generalized linear regression model where h(y_{i}) is a logit function when y is binary, an identity function when y is continuous, and a Cox proportional hazards function in log form when y is a survival endpoint. This article focuses on binary outcomes.

This model is commonly used in subgroup analysis to identify a variable (factor) that shows differential subgroup effects [17-22]. The coefficient b_{3j} measures how the treatment effect in the sampled patients varies with the value of z_{ij}. A significant b_{3j} implies a significant difference in treatment responses between the underlying subgroups (responders and non-responders) defined by the variable z_{ij}. Let T denote the set of variables z that are significant at a predetermined level; T is the set of candidate predictive biomarkers.
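The variable-by-variable screening described above can be sketched in Python for a binary outcome. This is a minimal illustration, not the authors' implementation: the function names (`fit_logistic`, `interaction_pvalue`, `screen_biomarkers`) are hypothetical, the model form with an intercept, two main effects, and an interaction is assumed from the description of Eq. (1), and a Wald test is assumed for the interaction coefficient since the text does not state which test was used.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.
    Returns the coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))

def interaction_pvalue(y, trt, z):
    """Two-sided Wald p-value for the treatment-by-biomarker interaction.
    Assumed design: intercept, treatment, biomarker, treatment * biomarker."""
    X = np.column_stack([np.ones_like(z), trt, z, trt * z])
    beta, se = fit_logistic(X, y)
    wald = beta[3] / se[3]
    return math.erfc(abs(wald) / math.sqrt(2.0))

def screen_biomarkers(y, trt, Z, alpha=0.005):
    """Return the column indices of Z with a significant interaction, plus all p-values."""
    pvals = np.array([interaction_pvalue(y, trt, Z[:, j]) for j in range(Z.shape[1])])
    return np.flatnonzero(pvals < alpha), pvals
```

Here `y` is a 0/1 outcome vector, `trt` a 0/1 treatment indicator, and `Z` an n-by-m matrix of baseline measurements, all as floating-point NumPy arrays. The returned index set plays the role of the candidate set T. The sketch does not guard against perfect separation, where the Newton step can fail.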

Equation (1) is known to lack power for assessing the interaction effects b_{3j}. Freidlin and Simon [16] presented an alternative model without the main effect term z_{ij} to identify candidate predictive biomarkers:
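A form of Eq. (2) consistent with this description, obtained by dropping the main effect of z_{ij}, is sketched below. Only the coefficient b_{4j} is named in the text; the labels b_{0j} and b_{1j} and the patient subscript on T_{i} are assumed for illustration.

```latex
h(y_i) = b_{0j} + b_{1j}\, T_i + b_{4j}\, T_i\, z_{ij},
\qquad j = 1, \dots, m \tag{2}
```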

A significant interaction coefficient b_{4j} indicates a difference in outcomes between subgroups defined by the variable z_{ij}, due to a difference either in underlying disease prognosis or in treatment response. The set of significant variables, denoted as U, would therefore consist of both prognostic biomarkers (S) and predictive biomarkers (T).

**Subgroup selection**

Subgroup selection develops a classification strategy to stratify patients into responder and non-responder subgroups based on the biomarkers identified. Classification algorithms depend on the type of target variable. For binary outcome variables, the observed outcomes (positive and negative) can be used as the class labels. Subgroup selection can then be regarded as a standard class prediction problem. The class prediction algorithms commonly used in genomic and personalized medicine applications include logistic regression, classification trees and random forests, linear and diagonal discriminant analysis, and support vector machines [23-28]. In this article, we used diagonal linear discriminant analysis (DLDA) [26], since it has been shown to perform well and to be robust against imbalanced subgroup sizes [29], even with considerable size differences, a common occurrence in subgroup selection.
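To make the DLDA rule concrete, the following is a minimal sketch, assuming the usual formulation in which each class has its own mean vector, all classes share a diagonal covariance matrix estimated from pooled within-class variances, and class priors are ignored. The class name `DLDA` and its methods are illustrative, not taken from the article or from any particular library.

```python
import numpy as np

class DLDA:
    """Diagonal linear discriminant analysis (illustrative sketch)."""

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        # Per-class mean of every biomarker.
        self.means_ = np.vstack([X[labels == c].mean(axis=0) for c in self.classes_])
        # Pooled within-class variance per biomarker (diagonal covariance).
        resid = X - self.means_[np.searchsorted(self.classes_, labels)]
        dof = X.shape[0] - len(self.classes_)
        self.var_ = (resid ** 2).sum(axis=0) / dof
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Variance-scaled squared distance from each sample to each class mean.
        diff = X[:, None, :] - self.means_[None, :, :]
        dist = ((diff ** 2) / self.var_).sum(axis=2)
        return self.classes_[np.argmin(dist, axis=1)]
```

Because the rule depends only on class means and pooled variances, the class proportions do not enter the decision boundary, which is one way to see why the method tolerates unequal subgroup sizes.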

**Clinical utility assessment**

Assessment of a biomarker-based predictive model mainly evaluates whether the model is fit for its intended context of use. It does not determine whether individual biomarkers are predictive. Rather, it determines whether the predictive model is useful for treatment selection, in terms of:

1. Accuracy of the subgroup selection and

2. Enhancement of treatment efficacy to detect treatment effect on the selected responder patients.

For binary responses, the common performance measures are sensitivity (the proportion of responders correctly identified out of all responders), specificity (the proportion of non-responders correctly identified out of all non-responders), and accuracy (the proportion of correct identifications among all patients). For patient treatment assignment, it is desirable that the prediction model have high sensitivity and high specificity, which implies high accuracy. In confirmatory clinical trials, enhancement of treatment efficacy via subgroup analysis of responders would involve testing two hypotheses. The first hypothesis is a comparison between the treatment and control arms for the whole trial population at significance level α_{1}; the second hypothesis is a comparison within the responder subgroup at significance level α_{2}, where α_{1} + α_{2} = α, the overall familywise error rate.
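The three performance measures can be computed directly from the true and predicted subgroup labels. The sketch below uses a hypothetical helper name, `classification_metrics`, and assumes that both a responder and a non-responder are present so that neither denominator is zero.

```python
def classification_metrics(true_responder, predicted_responder):
    """Sensitivity, specificity, and accuracy for responder classification.
    Inputs are equal-length sequences of truthy/falsy responder flags."""
    pairs = list(zip(true_responder, predicted_responder))
    tp = sum(1 for t, p in pairs if t and p)          # responders called responders
    tn = sum(1 for t, p in pairs if not t and not p)  # non-responders called non-responders
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }
```

For example, a classifier that correctly identifies 45 of 60 responders and 200 of 240 non-responders has sensitivity 45/60, specificity 200/240, and accuracy 245/300.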

**Simulation study**

**Biomarker identification:** The simulation design considered a two-arm experiment with a sample size of 600 patients, where 300 patients were randomly assigned to each arm. Two thousand covariates were generated from normal distributions. Among them, there were 10 prognostic biomarkers, 10 predictive biomarkers, and 5 biomarkers that were both predictive and prognostic. These 25 biomarkers were generated from N(1, 0.2^{2}); the remaining 1975 covariates were generated from N(0, 0.2^{2}). One thousand pairs of training and test sets were simulated; the training dataset was used to develop the procedure and the test dataset was used for evaluation.

In the simulation design, the proportion of responders was π = 0.2. The expected sizes of the responder and non-responder subgroups are 60 and 240, respectively, in each arm. The target variable was binary with a "positive" or "negative" response. The probability of a positive outcome p for each subgroup was generated by the logit model:

With

where

n_{pred} = n_{prog} = 15

Thus, the model for generating the LR patients in the SOC group was y = β_{0} + β_{3}*prognostic, and the model for the HR patients in the TRT group was y = β_{0} + β_{1}*τ + β_{2}*τ*predictive. The models for generating other subgroups were similar. For the SOC group, the expected probability of a positive outcome was 0.436 for both the responder and non-responder subgroups; for the TRT group, the expected probabilities were 0.754 and 0.436, respectively. Overall, the expected probabilities of positive outcomes are 0.436 and 0.50 for the SOC and TRT groups, respectively (Table 1). The expected power for the treatment effect is 0.344 at α = 0.05.

Each simulated dataset was fit to the two regression models, Eqs. (1) and (2). Table 2 shows the total number of identifications (significances) and the number of correct identifications for the biomarker sets T and U at α = 0.005 and 0.001. The true numbers of biomarkers under the model for T and U were 15 and 25, respectively. The row for the correct identifications in U included the numbers of prognostic and predictive biomarkers correctly identified. U identified more predictive biomarkers than T.

Since the specificities were high in all cases, the analyses focused on the sensitivity. Comparing the two significance levels, the proportions of true biomarkers correctly identified were higher at α = 0.005; however, the proportions of identifications that were true biomarkers were higher at α = 0.001. An explanation is that, for 2,000 tests, the expected number of false positives is about 10 at α = 0.005 and 2 at α = 0.001. The sensitivities were poor in T, about 40%, and T identified more false positives than true positives. The analyses below focus only on α = 0.005 since the results are similar.

**Subgroup selection:** Both T and U were used to develop the predictive classifiers C(T) and C(U), respectively. Table 3 shows the sensitivity, specificity, and accuracy for the two classifiers. The classification results for the SOC and TRT groups are very similar since the calculations were based on test data simulated from the same model. The classifier C(U) shows good sensitivity and poor specificity due to more true positives and more false positives. Thus, when there is a treatment for all patients, the non-responder patients are likely to be classified as responders. Table 4 shows the total number of patients identified and the number of correct identifications. It appears that the classifier C(T) outperformed the classifier C(U); C(U) produced too many false identifications, resulting in poor specificity.

**Clinical utility assessment:** The expected probabilities of positive outcomes were 0.436 and 0.50 for the SOC and TRT groups, respectively. The power for detecting a treatment effect is 0.344. The probabilities of a positive outcome in the responder subgroup for SOC and TRT were 0.436 and 0.754, respectively; the expected power to detect a treatment effect in the responder subgroup was 0.953. For the simulated data, Table 4 showed that the estimated empirical powers with C(T) and C(U) were 0.513 and 0.413, respectively; both are higher than the study power of 0.344. In subgroup selection, empirical power depends on subgroup sizes, effect size, and the accuracy of classification. The estimated empirical power is generally smaller than the theoretical value under the model since there were many false identifications, partly due to random variation.
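The gain in power from restricting the comparison to responders can be illustrated with a normal-approximation calculation. This is a sketch under stated assumptions: the article does not say which test underlies its expected powers, so a two-sided two-sample z-test for proportions with unpooled variance is assumed here, and the helper names are hypothetical. The results are therefore approximations that land near, but need not match, the reported 0.344 and 0.953.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_proportion_power(p_control, p_treated, n_per_arm, z_crit=1.959964):
    """Approximate power of a two-sided z-test comparing two proportions
    with equal arm sizes (unpooled variance; the opposite tail is ignored)."""
    diff = abs(p_treated - p_control)
    se = math.sqrt(p_control * (1.0 - p_control) / n_per_arm
                   + p_treated * (1.0 - p_treated) / n_per_arm)
    return normal_cdf(diff / se - z_crit)

# Values taken from the simulation design described in the text.
pi_responder = 0.2   # proportion of responders
p_soc = 0.436        # positive-outcome probability under SOC (all patients)
p_trt_resp = 0.754   # positive-outcome probability for treated responders

# Overall TRT probability is a mixture of responders and non-responders.
p_trt_all = pi_responder * p_trt_resp + (1.0 - pi_responder) * p_soc

power_all = two_proportion_power(p_soc, p_trt_all, n_per_arm=300)
power_resp = two_proportion_power(p_soc, p_trt_resp, n_per_arm=60)
print(round(p_trt_all, 4), round(power_all, 3), round(power_resp, 3))
```

The mixture 0.2 × 0.754 + 0.8 × 0.436 = 0.4996 reproduces the stated overall TRT probability of about 0.50. The comparison shows how a diluted effect in 300 patients per arm yields low power, while the undiluted effect in 60 responders per arm yields high power.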

**Example**

Prat et al. [30] reported an exploratory analysis of the research-based PAM50 signature to predict response to trastuzumab plus chemotherapy among breast cancer patients enrolled in the NeOAdjuvant Herceptin (NOAH) trial. The data are available from the GEO database (GSE50948). Their analysis considered 43 genes, since 7 of the 50 genes did not meet the quality standard. We analyzed this dataset to illustrate an application of the proposed method. This analysis does not necessarily represent the true categorization of the patients and biomarkers. We considered only HER2+ patients in two experimental groups. The numbers of patients with and without trastuzumab treatment were 63 and 51, respectively; the corresponding numbers of observed pathologic complete responses (pCR) were 28 and 13.

Four of the 43 genes were identified as predictive biomarkers (PTTG1, FOXA1, MKI67, RRM2) by Eq. 1, and five as prognostic/predictive biomarkers (ACTR3B, RRM2, BIRC5, KRT17, MELK) by Eq. 2. Table 5 shows the numbers of patients and the means of the observed outcomes in the four subgroups identified. In this dataset, the p-value for the overall test between the treatment groups was 0.058. The p-values are 0.063 and 0.051 for C(T) and C(U), respectively. The p-value from C(U) is slightly smaller than the p-value from the overall test.

**Discussion**

This article focuses on the development of biomarker-based predictive models. Two interaction models (Eq. 1 and Eq. 2) are evaluated; both models have been used to identify candidate predictive biomarkers, from which the predictive classifiers C(T) and C(U), respectively, were developed. This is the first article pointing out that Eq. 2 identifies both predictive and prognostic biomarkers. The simulation shows that the predictive classifier C(T) outperformed the classifier C(U). Chen et al. [36] recently evaluated C(T) and C(U) for survival outcomes and found that C(U) slightly outperformed C(T) in their simulations. As mentioned, the accuracy of a subgroup selection procedure depends on sample size, subgroup sizes, treatment effect size, significance level, and, most importantly, the underlying disease and biology models. A future study comparing these two models thoroughly in terms of power and type I error in different scenarios would be helpful.

Lin & Chen [31] compared three popular classification algorithms: RF (random forests), SVM (support vector machines), and DLDA. They showed that RF and SVM performed poorly when the class sizes differ considerably, whereas DLDA performed well. We therefore chose the DLDA classification algorithm, primarily because of imbalanced subgroup sizes, that is, many more non-responders than responders. DLDA performs well because its decision boundary is based on the sample means and variances of the two subgroups, which do not depend on the two subgroup sizes. More detailed discussions regarding classification of imbalanced data are given in Lin and Chen [29].

We considered only binary outcomes. Subgroup selection for non-binary outcomes generally involves two steps once the candidate biomarkers have been selected. The first step is to develop a mathematical model, such as Cox regression, to assign each patient a predictive score based on the biomarker set identified. The second step is to use appropriate statistical methods to find a cutoff point for the score and divide the patients into subgroups. For example, Li et al. [32] presented a grid search to choose the optimal cutoff that maximizes a test statistic to identify responders and non-responders. Another common approach is to use classification/regression trees to partition patients into homogeneous subsets [33-35]. The tree-based methods build a tree structure that performs biomarker identification and subgroup selection simultaneously in a single step.

Disease biology is complicated; the underlying genomic variables and patient population may consist of several components representing different population subgroups. It is helpful to determine whether there are subgroups prior to conducting subgroup selection. Chen & Chen [36] proposed applying the likelihood ratio test (LRT) [37,38], based on the biomarkers identified, to analyze homogeneity among the sampled patients. The LRT takes a two-component mixture model as the alternative model, which may be suboptimal. We recommend that subgroup selection be conducted only when there are candidate biomarkers and the LRT is significant.

There are challenges in developing a classification model to identify patient subgroups when the genomic and target variables are random variables observed as experimental outcomes. For the binary variable considered in this article, the observed positive and negative outcomes were used as class labels to develop a binary classifier. However, patients with positive outcomes may be non-responders, while patients with negative outcomes may be responders. That is, some samples were mislabeled. Similarly, for survival outcomes there are censored observations, as well as long-surviving non-responders and short-surviving responders. These observations are outliers with respect to the underlying subgroups, and the predictive models developed will be biased. Thus, when the target variable is a random variable, the developed prediction model will be prone to misclassification and bias.

**Financial & competing interests disclosure**

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript.

**Acknowledgment**

Un Jung Lee’s work was fully funded by the Oak Ridge Institute for Science and Education (ORISE). The views presented in this paper are those of the authors and do not necessarily represent those of the U.S. Food and Drug Administration.

**References**

- Balis FM (2002) Evolution of anticancer drug discovery and the role of cell-based screening. J Natl Cancer Inst 94(2): 78-79.
- Schilsky RL (2002) End points in cancer clinical trials and the drug approval process. Clin Cancer Res 8(4): 935-938.
- Rothenberg ML, Carbone DR, Johnson DH (2003) Improving the evaluation of new cancer treatments: challenges and opportunities. Nat Rev Cancer 3(4): 303-309.
- Floyd E, McShane TM (2004) Development and use of biomarkers in oncology drug development. Toxicol Pathol 32(1): 106-615.
- Amur S, Frueh FW, Lesko LJ, Huang SM (2008) Integration and use of biomarkers in drug development, regulation and clinical practice: a US regulatory perspective. Biomark Med 2(3): 305-311.
- Simon R (2010) Clinical trials for predictive medicine: new challenges and paradigms. Clin Trials 7(5): 516-524.
- Beckman RA, Clark J, Chen C (2011) Integrating predictive biomarkers and classifiers into oncology clinical development programmes. Nat Rev Drug Discov 10(10): 735-748.
- Buyse M, Michiels S, Sargent DJ, Grothey A, Matheson A, et al. (2011) Integrating biomarkers in clinical trials. Expert Rev Mol Diagn 11(12): 171-182.
- Food and Drug Administration (2010) Qualification process for drug development tools. Fed Regist 75: 65495-65496.
- Chen JJ, Lu TP, Chen YC, Lin WJ (2015) Predictive biomarkers for treatment selection: statistical considerations. Biomark Med 9(11): 1121-1135.
- Paik S, Shak S, Tang G, Kim C, Baker J, et al. (2004) A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351(27): 2817-2826.
- Vogel CL, Cobleigh MA, Tripathy D, Gutheil JC, Harris LN, et al. (2002) Efficacy and safety of trastuzumab as a single agent in first-line treatment of HER2-overexpressing metastatic breast cancer. J Clin Oncol 20(3): 719-726.
- Kobayashi K, Hagiwara K (2013) Epidermal growth factor receptor (EGFR) mutation and personalized therapy in advanced nonsmall cell lung cancer (NSCLC). Target Oncol 8(1): 27-33.
- Chen JJ, Lu TP, Chen DT, Wang SJ (2014) Biomarker adaptive designs in clinical trials. Translational Cancer Research 3(3): 279-292.
- Freidlin B, Jiang W, Simon R (2010) The cross-validated adaptive signature design. Clin Cancer Res 16(2): 691-698.
- Freidlin B, Simon R (2005) Adaptive signature design: an adaptive clinical trial design for generating and prospectively testing a gene expression signature for sensitive patients. Clin Cancer Res 11(21): 7872-7878.
- Yusuf S, Wittes J, Probstfield J (1991) Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 266: 93-98.
- Assmann SF, Pocock SJ, Enos LE, Kasten LE (2000) Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 355(9209): 1064-1069.
- Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, et al. (2004) Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol 57(3): 229-236.
- Song Y, Chi GY (2007) A method for testing a prespecified subgroup in clinical trials. Stat Med 26(19): 3535-3549.
- Millen BA, Dmitrienko A, Ruberg S (2012) A Statistical Framework for Decision Making in Confirmatory Multipopulation Tailoring Clinical Trials. Drug Information Journal 46: 647-656.
- Wang SJ, Hung HMJ (2014) A Regulatory Perspective on Essential Considerations in Design and Analysis of Subgroups When Correctly Classified. J Biopharm Stat 24(1): 19-41.
- Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
- Breiman L, Friedman J, Olshen R (1995) CART: Classification and Regression Trees, Stanford, USA.
- Breiman L (2001) Random Forests. Machine Learning 45: 5-32.
- Dudoit S, Fridlyand J, Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97: 77-87.
- Vapnik V (1995) The Nature of Statistical Learning Theory. New York, Springer.
- Guyon I, Weston J, Barnhill S (2002) Gene selection for cancer classification using support vector machines. Machine Learning 46: 389-422.
- Lin WJ, Chen JJ (2013) Class-imbalanced classifiers for high-dimensional data. Brief Bioinform 14(1): 13-26.
- Prat A, Bianchini G, Thomas M, Belousov A, Cheang MC, et al. (2014) Research-based PAM50 subtype predictor identifies higher responses and improved survival outcomes in HER2-positive breast cancer in the NOAH study. Clin Cancer Res 20(2): 511-521.
- Lin WJ, Chen JJ (2012) Biomarker classifiers for identifying susceptible subpopulations for treatment decisions. Pharmacogenomics 13(2): 147-157.
- Li L, Guennel T, Marshall S, Cheung LW (2014) A multi-marker molecular signature approach for treatment-specific subgroup identification with survival outcomes. Pharmacogenomics J 14(5): 439-445.
- Segal MR (1988) Regression Trees for Censored-Data. Biometrics 44: 35-47.
- Davis RB, Anderson JR (1989) Exponential Survival Trees. Stat Med 8(8): 947-961.
- Ciampi A, Thiffault J, Nakache JP (1986) Stratification by Stepwise Regression, Correspondence-Analysis and Recursive Partition - a Comparison of 3 Methods of Analysis for Survival-Data with Covariates. Comput Stat Data Anal 4: 185-204.
- Chen YC, Lee UJ, Tsai CA, Chen JJ (2016) Development of predictive signature for treatment selection in precision medicine with survival outcomes. Pharm Stat.
- McLachlan GJ (1987) On Bootstrapping the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture. J R Stat Soc Ser C Appl Stat 36: 318-324.
- McLachlan GJ, Rathnayake S (2014) On the number of components in a Gaussian mixture model. Wiley Interdiscip Rev Data Min Knowl Discov 4(5): 341-355.