Keywords: Surrogate endpoints; Biomarkers; Composite endpoints; Off-target effects; Disease process; Clinically meaningful outcome; Benefit/risk ratio; Substitutes that correlate with clinical efficacy measures; Affected in parallel with the disease
Abbreviations: TTP: Time To Progression; TR: Tumour Response Rate; PFS: Progression-Free Survival; RCT: Randomized Controlled Trial; FDA: Food and Drug Administration; EMA: European Medicines Agency; MACE: Major Adverse Cardiovascular Events; ESAs: Erythropoiesis-Stimulating Agents
A surrogate endpoint has been defined as 'a biomarker intended to substitute for a clinical endpoint', the latter being 'a characteristic or variable that reflects how a patient feels, functions, or survives' in response to a treatment [1-5]. A surrogate endpoint could also be defined as 'a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention' [1-5]. Surrogate endpoints can be further subdivided according to whether they are used for diagnosis, staging, or monitoring of disease progression [1-5]. Finally, in clinical trials, a surrogate endpoint is a measure of effect that may correlate with a real clinical endpoint but does not necessarily have a proven relationship with it [1-6]. For instance, in cancer, surrogate endpoints such as progression-free survival (PFS), time to progression (TTP) and tumour response rate (TR) can be used to substitute for a clinical endpoint when the primary endpoint (overall survival, i.e., death) is less practical to use, or when the number of patients is very small, making it impractical to conduct a phase III RCT that accumulates a statistically adequate number of endpoint events [1-6].
Regulatory agencies often accept evidence from clinical trials that show a treatment benefit on surrogate endpoints. Surrogate endpoints may be used instead of stronger indicators, such as overall survival or improved quality of life, because the results of the trial can be measured sooner [1-6]. Their use in clinical trials may allow earlier regulatory approval of new drugs to treat serious or life-threatening diseases, such as cancer [1-6]. Drs. Archambault and Plourde have recently published an article discussing the use of biomarkers for screening, diagnosis and measuring the response to treatment in patients with advanced prostate cancer. Because surrogate endpoints follow validation principles comparable to those of biomarkers, in this mini review I will instead discuss the validation of surrogate endpoints.
Before a surrogate endpoint can be used in clinical trials, one of the main concerns is the choice of the primary outcome measure; the surrogate endpoint should be a strong substitute for that primary outcome measure. This selection has considerable impact on the reliability and interpretability of clinical trials designed to evaluate the benefit/risk ratio of a new health product for a specific indication (see below). Surrogate endpoints can include physical signs of an illness, laboratory measures such as biochemical biomarkers, and radiological tests such as MRI or PET (for example, to document tumour shrinkage); such measures are often considered replacement endpoints or "surrogates" [1-6]. In this mini review, I will discuss some of the main characteristics for the validation of surrogate endpoints and, where necessary, provide examples from well-known RCTs to illustrate the concepts discussed. On some occasions, Health Canada, the Food and Drug Administration (FDA), the European Medicines Agency (EMA) and other regulatory agencies may approve new products for marketing based on surrogate endpoint data. However, when products are approved on this basis, regulators may not have a true appreciation of their benefit/risk ratio.
Because RCTs using this type of outcome typically report larger treatment effects than trials reporting final patient outcomes, they can inflate the benefit side of the benefit/risk assessment of a new health product in development, independently of its safety concerns [1-6]. There are a number of examples of drugs approved on the basis of surrogate endpoints that were later removed from the market or had their prescribing significantly restricted in their product monographs because of safety concerns. For instance, cerivastatin (Baycol), used for the treatment of hyperlipidemia, can cause fatal rhabdomyolysis; rosiglitazone (Avandia), used for the treatment of type 2 diabetes mellitus, can increase the risk of myocardial infarction; and flecainide (Tambocor), used for the treatment of cardiac arrhythmias, can increase cardiac mortality. These examples demonstrate that relying on surrogate endpoints is not always in the best interests of patients. Nevertheless, there are many potential roles for surrogate endpoints in clinical research [1-6]. For instance, a surrogate endpoint can serve its intended objective even if it is not on a pathway through which the disease process causes morbidity or mortality [1-6]. Surrogate endpoints might also be useful in providing information about whether a treatment has a detectable effect on a specific biological pathway [1-6]. Therefore, they might serve as endpoints in a proof-of-concept trial or as supportive measures in a Phase III RCT.
Other advantages of surrogate endpoints are that they are often cheaper and easier to measure than 'true' clinical endpoints [1-6]. For example, it is easier to measure a patient's blood pressure than to use echocardiography to measure left ventricular function, and it is much easier to perform echocardiography than to measure long-term morbidity and mortality from hypertension, as reported by Reboldi G. In clinical trials, the use of surrogate endpoints usually allows smaller sample sizes [1-6]. For example, to determine the effect of a new drug on blood pressure, a relatively small sample of approximately 100-200 patients would be needed and the trial would be relatively quick (1-2 years). On the other hand, to study the prevention of deaths from stroke, a much larger study group would be needed and the trial would take many years [1-6]. There may also be ethical problems associated with measuring the true clinical endpoints. For example, Zytiga is a second-line hormone therapy for advanced prostate cancer, but this drug is known to be hepatotoxic. It would be unethical to wait for evidence of liver damage before deciding whether to treat a patient, reduce the dose or discontinue the medication; instead, a surrogate endpoint such as liver enzyme levels can be used to make the appropriate clinical decision faster and more safely.
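The sample-size contrast above can be made concrete with the standard normal-approximation formulas for comparing two means (a continuous surrogate such as blood pressure) and two proportions (a clinical event such as stroke death). The sketch below is illustrative only: the effect sizes (a 5 mmHg difference with a standard deviation of 12 mmHg, and a stroke-death risk falling from 2.0% to 1.5% over follow-up) are hypothetical assumptions, not data from any trial cited here.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """z_{1-alpha/2} + z_{power} for a two-sided test at level alpha."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def n_per_arm_means(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect a difference in means `delta` (continuous surrogate)."""
    return math.ceil(2 * (_z_sum(alpha, power) * sd / delta) ** 2)


def n_per_arm_proportions(p_control: float, p_treated: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect a difference in event risk (clinical endpoint)."""
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil(_z_sum(alpha, power) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical surrogate: 5 mmHg difference in systolic blood pressure, SD 12 mmHg.
n_bp = n_per_arm_means(delta=5.0, sd=12.0)
# Hypothetical clinical endpoint: stroke death falling from 2.0% to 1.5% over follow-up.
n_stroke = n_per_arm_proportions(p_control=0.020, p_treated=0.015)

print(f"blood pressure trial: {n_bp} per arm")      # 91 per arm
print(f"stroke-death trial:   {n_stroke} per arm")  # roughly 10,800 per arm
```

With the same significance level and power, the clinical-event trial needs more than a hundred times as many patients, simply because the event is rare; this is both the practical appeal and the risk of surrogate endpoints.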
As explained in the introduction, selecting appropriate surrogate endpoints that reflect the primary outcome measures is not an easy task [1-6]. It is hoped that the following discussion and examples will be useful for validating surrogate endpoints in your own research.
To enhance the information obtained from RCTs regarding the benefit/risk ratio of a new treatment, surrogate endpoints should be well defined and should reliably measure the response to that treatment [1-6]. For example, suppose an oncologic drug (Zytiga) is being evaluated for the management of the bone pain associated with advanced prostate cancer. Measuring bone pain relief, or the time to initiation or increase of analgesic use as in the LATITUDE trial (2017), would be relevant but probably less sensitive and specific than measuring the number and extent of bone metastases, which are the main cause of bone pain in these patients. Sensitivity, specificity and reliability therefore usually play a dominant role in the selection of appropriate surrogate endpoints.
Another consideration in the validation of surrogate endpoints for clinical trials is that they should be easily measurable and interpretable [1-6]. If an invasive procedure such as prostate biopsy is used to assess the histological effects of a treatment for prostate cancer, then even if this procedure is highly sensitive and specific, its invasiveness and dependence on patient motivation may lead to a high rate of missing data. These missing data could cause substantial bias and an important reduction in the interpretability of the study results [1-6]. Interpretability might also be reduced when composite surrogate endpoints are used [1-6]. Composite surrogate endpoints are often chosen to increase a trial's sensitivity or statistical power by increasing the number of patients experiencing the primary endpoint [1-6]. However, the interpretability of such endpoints is greatly influenced by whether each component of the composite has clinical relevance similar to that of the other components [1-6]. For instance, the study by Kip KE examined the major adverse cardiovascular events (MACE) composite endpoint, such as the composite of "cardiovascular death, stroke or myocardial infarction". This composite is interpretable in clinical trials in patients with acute coronary syndrome because each component is a measure of irreversible morbidity or mortality. However, the interpretability of such a measure can be substantially reduced when it also includes components such as "acute coronary syndrome, received cardiac interventions including coronary artery bypass graft or percutaneous coronary intervention, leg amputation, or revascularization in the leg". Similarly, the interpretability of the MACE endpoint was significantly compromised when "asymptomatic distal deep venous thrombosis" was added to the composite.
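The power argument for composite endpoints, and its hidden assumption, can be sketched numerically. In this hypothetical example, the component risks and the 20% relative reduction are assumptions, and the components are treated as statistically independent. Combining three hard outcomes roughly triples the event rate and cuts the required sample size about threefold, but the calculation only holds if the treatment reduces every component by a similar relative amount, which is exactly what adding softer or heterogeneous components puts in doubt.

```python
import math
from statistics import NormalDist


def composite_risk(component_risks):
    """P(at least one component event), assuming components occur independently."""
    return 1 - math.prod(1 - p for p in component_risks)


def n_per_arm(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Patients per arm for a two-proportion comparison (normal approximation)."""
    p_treated = p_control * (1 - relative_reduction)
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil(k * variance / (p_control - p_treated) ** 2)


# Hypothetical control-arm risks over follow-up: CV death, stroke, myocardial infarction
risks = {"cv_death": 0.03, "stroke": 0.02, "mi": 0.04}
p_mace = composite_risk(risks.values())

# Assumes the SAME 20% relative reduction applies to every component.
n_death_only = n_per_arm(risks["cv_death"], 0.20)
n_composite = n_per_arm(p_mace, 0.20)
print(f"composite risk {p_mace:.3f}; per arm: death only {n_death_only}, composite {n_composite}")
```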
As discussed in the article by Fleming TR and DeMets DL (1996), the main consideration guiding the selection of a surrogate endpoint in RCTs is whether the effect observed on the surrogate endpoint provides reliable evidence about the benefit of the treatment for patients. The benefit against which the surrogate is judged should be "a clinical event relevant to the patient", or an endpoint that "measures directly how a patient feels, functions (patients' ability to perform activities in their daily lives) or survives" in response to the treatment [1-6,13]. Such an outcome measure is hereafter referred to as a 'clinically meaningful endpoint' or 'clinical efficacy measure' [1-6,13].
Many outcome measures used in clinical research are not clinically meaningful endpoints but indirect measures that are used as surrogate endpoints [1-6]. Validating a surrogate endpoint requires evidence-based justification, often from RCTs, that effects on the surrogate endpoint reliably predict clinically important effects on a clinically meaningful endpoint [1-6]. Cholesterol is a good example: elevated cholesterol levels increase the likelihood of heart disease. However, the relationship is imperfect; many people with normal cholesterol develop heart disease, and many with high cholesterol do not. «Death from heart disease» is the endpoint of interest, while «cholesterol» is the surrogate endpoint. Some indirect measures considered as potential surrogate endpoints, such as the 6-minute walk test and pulmonary function tests, may depend on patient motivation; however, most surrogate endpoints do not. According to the Institute of Medicine (2010), surrogate endpoints are measurements of biological processes that "include physiological measurements, blood tests and other chemical analyses of tissue or bodily fluids, genetic or metabolic data, or measurements from images". Changes induced by a therapy on a surrogate endpoint are expected to reflect changes in a clinically meaningful endpoint [1-6,14].
Suppose we wish to use a surrogate endpoint as a substitute endpoint in a Phase III RCT aimed at providing reliable evidence about the efficacy and safety of a new treatment. Acceptance of a specific surrogate endpoint requires that it reliably predict the effects on a clinical efficacy measure; at a minimum, the response observed on the surrogate endpoint should correlate strongly with the natural history of the disease [1-6]. For example, in oncology, we should ask whether the fact that responders (i.e., patients who experience substantial tumour shrinkage following therapy) live longer than non-responders reflects an improvement in overall survival caused by the treatment [1-6]. We should also ask whether the longer survival of responders is caused by the anti-tumour effects of the treatment, or whether the treatment-induced tumour responses were simply observed in patients who lived longer because of their better baseline health status. It is a common misconception that if an outcome correlates with the true clinical outcome, it can be used as a valid surrogate endpoint (that is, a replacement for the true clinical outcome). However, proper justification for such replacement requires that the effect of the intervention on the surrogate endpoint predicts the effect on the clinical outcome, a much stronger condition than correlation. While the effect of a treatment on a surrogate endpoint does provide direct evidence of biological activity, such evidence could be unreliable regarding effects on the true clinical efficacy measures [1-6].
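The responder-versus-non-responder trap described above can be reproduced with a small simulation. In this hypothetical model (every parameter is an assumption chosen for illustration), survival depends only on a latent baseline health score and the drug has no effect on survival at all, yet fitter patients are more likely to show a tumour response. Responders therefore outlive non-responders even though the two randomized arms have essentially the same mean survival.

```python
import math
import random
from statistics import mean


def simulate_arm(n, treated, rng):
    """Return (survival_months, responded) pairs for one arm of a hypothetical trial."""
    patients = []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)  # latent baseline health status
        # Survival depends only on baseline health: the drug has NO survival effect here.
        survival = max(0.5, 12.0 + 4.0 * health + rng.gauss(0.0, 2.0))
        # Only treated patients can respond, and fitter patients respond more often.
        p_response = 1.0 / (1.0 + math.exp(-(health - 0.5)))
        responded = treated and rng.random() < p_response
        patients.append((survival, responded))
    return patients


rng = random.Random(2024)
control_arm = simulate_arm(20_000, treated=False, rng=rng)
treated_arm = simulate_arm(20_000, treated=True, rng=rng)

responders = [s for s, r in treated_arm if r]
non_responders = [s for s, r in treated_arm if not r]

print(f"treated responders:     {mean(responders):.1f} months")
print(f"treated non-responders: {mean(non_responders):.1f} months")
print(f"treated arm overall:    {mean(s for s, _ in treated_arm):.1f} months")
print(f"control arm overall:    {mean(s for s, _ in control_arm):.1f} months")
```

Only the randomized comparison of whole arms estimates the treatment effect on survival; the responder comparison mostly measures baseline health.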
Surrogate endpoints that are strongly correlated with clinical efficacy measures in the natural history of the disease, but are not in the causal pathway of the disease process, are likely to provide misleading information about clinical efficacy [1-6]. For example, while the risk that HIV-infected pregnant women will transmit the infection to their infants is strongly correlated with maternal CD4 counts, treatment with interleukin-2 given late in pregnancy to raise maternal CD4 counts would not reduce this transmission risk. The correlation between maternal CD4 counts and the risk of mother-to-child transmission of HIV exists because both measures are influenced by the maternal viral load. More reliable evidence about potential effects on mother-to-child transmission would be obtained by assessing whether a treatment provides large reductions in maternal viral load, and whether these reductions are sustained during pregnancy, labour and delivery, and breastfeeding. Of course, the preferred approach would be to assess the effect of the treatment directly on the outcome by measuring the proportion of infants infected with HIV.
In oncology, tumour markers such as prostate-specific antigen (PSA) and carcinoembryonic antigen (CEA) are correlated with clinical efficacy measures, such as cancer symptoms and death. These correlations are sufficient to make these markers useful for assessing prognosis in patients receiving treatment for their cancer. However, treatment effects on CEA and PSA would likely provide unreliable information about clinical efficacy, since it is tumour burden, rather than CEA or PSA levels, that is the true causal mechanism of cancer-related morbidity and mortality.
A second factor complicating the reliability of an evaluation of efficacy based on surrogate endpoints is the multidimensionality of the causal mechanisms of the disease process. The risk of false negative conclusions about clinical efficacy (i.e., concluding that a treatment has no benefit when it actually does) can be increased if the surrogate endpoint does not capture the disease pathway through which the treatment acts [1-6]. In a trial for chronic granulomatous disease reported by the International Chronic Granulomatous Disease Cooperative Study Group (1991), interferon-γ provided a statistically and clinically significant 70% reduction in the rate of recurrent serious infections. However, it had no detectable effect on the surrogate endpoints of bacterial killing and superoxide production.
False positive conclusions about clinical efficacy (i.e., concluding that a treatment provides clinical benefit, or greater benefit than a comparator, when it does not) could arise if a surrogate endpoint captures substantial effects of a treatment on one causal pathway of the disease process, while the treatment has an inadequate impact on other causal pathways [1-6]. Consider, for example, the three-arm Sweden I acellular pertussis trial of Gustafsson L, in which all children received vaccines containing diphtheria and tetanus components, with the addition of either a SmithKline Beecham or an Aventis Pasteur acellular pertussis component, or a placebo. Relative to the diphtheria + tetanus + placebo control arm, the Aventis Pasteur vaccine provided an 85% reduction (95% confidence interval [CI], 81% to 89%) in the rate of pertussis cases, while the SmithKline Beecham vaccine provided only a 58% reduction (95% CI, 51% to 66%). When comparing these two vaccines with active acellular pertussis components, even though the Aventis Pasteur vaccine had clearly superior vaccine efficacy, the SmithKline Beecham vaccine had a superior effect on two leading biomarkers, the filamentous haemagglutinin and pertussis toxoid antibody responses. The misleading information provided by these two antibody surrogate endpoints regarding the relative efficacy of these acellular pertussis vaccines might be explained by differences between the vaccines in the durability of their antibody responses, but is more likely explained by additional immune responses generated by the pertactin and fimbriae (types 2 and 3) antigens in the Aventis Pasteur vaccine.
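Vaccine efficacy on the clinical endpoint is one minus the ratio of case rates in the vaccine and control arms. The sketch below uses hypothetical case counts and person-time, chosen only so that the point estimates equal the 85% and 58% reported above; the confidence intervals it prints come from a simple Wald interval on the log rate ratio and will not match the published ones.

```python
import math
from statistics import NormalDist


def vaccine_efficacy(cases_vax, pt_vax, cases_ctrl, pt_ctrl, level=0.95):
    """VE = 1 - incidence rate ratio, with a Wald CI on log(rate ratio)."""
    irr = (cases_vax / pt_vax) / (cases_ctrl / pt_ctrl)
    se_log_irr = math.sqrt(1 / cases_vax + 1 / cases_ctrl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    irr_low = irr * math.exp(-z * se_log_irr)
    irr_high = irr * math.exp(z * se_log_irr)
    # A higher rate ratio means lower efficacy, so the bounds swap.
    return 1 - irr, (1 - irr_high, 1 - irr_low)


# Hypothetical counts: cases per 10,000 person-years of follow-up in each arm.
control = (400, 10_000)
ve_a, ci_a = vaccine_efficacy(60, 10_000, *control)   # hypothetical vaccine A
ve_b, ci_b = vaccine_efficacy(168, 10_000, *control)  # hypothetical vaccine B
print(f"vaccine A: VE {ve_a:.0%} (95% CI {ci_a[0]:.0%} to {ci_a[1]:.0%})")
print(f"vaccine B: VE {ve_b:.0%} (95% CI {ci_b[0]:.0%} to {ci_b[1]:.0%})")
```

Ranking vaccines this way uses the clinical endpoint directly; in the trial described above, ranking them by antibody responses would have reversed the order.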
Even when the biomarker does capture effects on the principal causal pathway of the disease process, it is often unclear what magnitude and duration of effect on that pathway is required to meaningfully affect the clinical efficacy measure. For example, consider the evaluation of coronary thrombolytics intended to speed reperfusion of infarct-related coronary arteries and, in turn, to decrease 30-day mortality after myocardial infarction. In this setting, the Phase 2b RAPID II trial by Smalling RW provided evidence that the experimental product reteplase (recombinant plasminogen activator, r-PA) was more effective than alteplase (recombinant tissue plasminogen activator, t-PA) in achieving "patency", i.e., TIMI-III blood flow, at 60 minutes (51% versus 37%) and at 90 minutes (60% versus 45%) after randomization. Based on these positive surrogate endpoint results for r-PA, it was somewhat surprising that 30-day mortality was numerically higher with r-PA than with t-PA (7.43% versus 7.22%) in the 15,000-patient GUSTO-III confirmatory trial. However, a re-evaluation of the RAPID II trial revealed that TIMI-III blood flow rates at 30 minutes were lower with r-PA than with t-PA (27% versus 39%). This illustrates how a lack of knowledge about the magnitude and duration of effect on a disease pathway required to achieve a given effect on a clinically meaningful endpoint undermines the reliability and interpretability of any trial that uses surrogate endpoints.
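Tabulating the RAPID II patency figures quoted above makes the timing problem explicit: the sign of the r-PA versus t-PA difference depends on which time point is chosen as the surrogate endpoint. The numbers below are exactly those given in the text; the code only computes differences.

```python
# TIMI-III patency (%) by minutes after randomization, as quoted above: (r-PA, t-PA)
patency = {30: (27, 39), 60: (51, 37), 90: (60, 45)}

diffs = {minutes: rpa - tpa for minutes, (rpa, tpa) in sorted(patency.items())}
for minutes, diff in diffs.items():
    print(f"{minutes:>2} min: r-PA minus t-PA = {diff:+d} percentage points")

# 30-day mortality in GUSTO-III, as quoted above: r-PA 7.43% vs t-PA 7.22%
mortality_diff = 7.43 - 7.22
print(f"30-day mortality: r-PA minus t-PA = {mortality_diff:+.2f} percentage points")
```

A surrogate measured only at 60 or 90 minutes favours r-PA by about 15 points, while the 30-minute measurement favours t-PA by 12 points, consistent with the slightly higher mortality seen with r-PA.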
Another factor complicating the reliability of an evaluation of efficacy based on surrogate endpoints is that these measures may not capture important off-target effects of the treatment, even though such effects could meaningfully alter its true clinical efficacy. The ACCORD trial in type 2 diabetes mellitus illustrates this concept: the intensive glucose-lowering strategy appeared very good on the surrogate endpoint, providing an additional absolute 1% reduction in HbA1c. However, the same trial showed that this strategy increased mortality, possibly through off-target effects such as a higher risk of hypoglycemia.
Given the substantial risk that effects on surrogate endpoints provide misleading information about the true effect of a treatment on clinical efficacy measures, it is important to consider what scientific evidence would justify using surrogate endpoints in place of clinically meaningful endpoints in RCTs [1-6,13]. For instance, the Normal Hematocrit Trial, conducted in more than 1000 patients with end-stage renal disease, shows that a surrogate can mislead even when it is strongly correlated with the clinical outcome: a strong correlation between the surrogate endpoint (hematocrit) and the clinical efficacy measures (death and myocardial infarction-free survival) was observed on the standard-of-care control regimen (standard-dose Epogen) and maintained on the experimental regimen (high-dose Epogen). Yet, through off-target effects not captured by the surrogate endpoint, including an increased risk of thrombosis, the experimental high-dose Epogen regimen resulted in a net 30% increase in the rate of death or myocardial infarction. A favourable effect on the surrogate endpoint can therefore still be misleading about the net effect of the treatment on the clinical efficacy measure [1-6,13].
Ventricular arrhythmias cause sudden death, and antiarrhythmic drugs prevent ventricular arrhythmias. It was therefore hypothesized that antiarrhythmic drugs would prevent sudden death. In fact, in the Cardiac Arrhythmia Suppression Trial, Class I antiarrhythmic drugs significantly increased sudden death in patients with asymptomatic ventricular arrhythmias after myocardial infarction, and the trial was stopped prematurely, showing that the hypothesis was wrong. Another good example is the comparison of enalapril with vasodilators (hydralazine plus isosorbide) in heart failure, in which haemodynamic effects and effects on mortality were dissociated. The vasodilators improved exercise capacity and left ventricular function to a greater extent than enalapril; however, enalapril reduced mortality significantly more than the vasodilators. So in this case, haemodynamic effects were not a good surrogate for mortality.
Finally, patients with asthma feel breathless if they have a low peak expiratory flow rate (PEFR). However, in one study, different drugs produced different relationships between PEFR and breathlessness [2,22]. Patients taking beclomethasone did not feel as breathless as those taking theophylline for a given PEFR. So what should be the surrogate marker: the 'hard' endpoint of peak flow or the 'soft' marker of how the patients felt? This also raises the question of whether more than one surrogate endpoint should be used in clinical trials for a specific primary outcome. Confounding factors can nullify the value of surrogate endpoints. The most reliable evidence regarding the validity of a surrogate endpoint for a clinical efficacy measure might be provided by a systematic review of RCTs that give reliable estimates of the net effects of a treatment on the clinically meaningful endpoint as well as on the surrogate endpoint [1-6].
Other useful surrogate endpoints are not directly related to the clinical endpoint but are affected in parallel with the disease. In some cases, they are good diagnostic markers but not good markers of progress (for example, prostate-specific antigen in prostatic cancer), or conversely they may be good markers of progress but not helpful diagnostically (for example, carcinoembryonic antigen in ovarian carcinoma).
There is considerable interest in identifying a subset of the patient population for whom a treatment would have a clinically meaningfully favourable benefit/risk ratio because of greater benefits or fewer adverse outcomes. Defining this targeted population can avoid diluting the benefit/risk ratio of a treatment, both in clinical research and in clinical practice. For example, the effect of trastuzumab in breast cancer patients appears to be specific to the level of HER2/neu overexpression, and the effect of epidermal growth factor receptor-inhibiting drugs in colorectal cancer patients appears to depend on whether tumours express the wild-type or a mutated version of the KRAS gene. As these examples show, using surrogate endpoints to identify the patients most likely to receive clinically important benefits from a treatment can be very helpful; seeking this information on an individual basis is often described as personalized medicine. However, we should carefully consider the consequences of relying on surrogate endpoints as the primary source of efficacy information when determining whether interventions should be used in clinical practice [1-6]. Such reliance allows clinical trials for regulatory approval to be smaller and shorter. However, an unfortunate consequence is that it provides not only more limited information about efficacy but also less reliable assurance about safety, given the smaller safety dataset on which the assessment of the benefit/risk ratio is based [1-6].
It should not be surprising, then, that health products receiving regulatory approval based on efficacy assessments using surrogate endpoints are more vulnerable to having clinically unacceptable safety issues discovered during the post-marketing period. For instance, in type 2 diabetes mellitus, rosiglitazone was approved based on reductions in HbA1c, yet clinical trial results evaluated in the post-marketing setting provided substantial evidence that it increases the risk of cardiovascular morbidity and mortality [25,26]. The simvastatin/ezetimibe combination (Vytorin) was approved based on lowering low-density lipoprotein cholesterol as a surrogate endpoint, but data from 3 large post-marketing trials suggest it may increase the risk of cancer-related mortality [27-29]. Erythropoiesis-stimulating agents (ESAs) received regulatory approval for use in end-stage renal disease based on short-term increases in the surrogate endpoint hematocrit and a reduced need for blood transfusions. However, subsequent trials provided strong evidence of harmful effects of ESAs on thrombosis, stroke, mortality and possibly malignancy.
Another concern is that the surrogate endpoint-based approach increases the likelihood that safety signals will not be discovered until post-marketing studies. The assessment of the benefit/risk ratio is particularly challenging when benefit is measured on surrogate endpoints while risk is measured on clinically meaningful measures of major morbidity [1-6]. For instance, natalizumab was granted FDA accelerated approval based on evidence from short-term trials in multiple sclerosis patients that evaluated effects on short-term relapse rates. However, the sponsor did not provide direct evidence about the effects of natalizumab on clinically more important endpoints, and the subsequently identified risk of progressive multifocal leukoencephalopathy greatly worsened the benefit/risk ratio of this product, which was temporarily withdrawn from the market because of this newly identified serious risk. Since reliance on surrogate endpoints leads to less reliable information about the risks of rare but clinically important safety events and about longer-term safety and efficacy, surrogate endpoints should be used only when there is substantial evidence establishing their reliability in predicting effects on clinical efficacy measures, and where their use could offer added safety benefits over existing therapies [1-6].
The Institute of Medicine (IOM) of the National Academies released a major report discussing an array of useful roles for surrogate endpoints and why rigor is important for their proper use. In particular, this IOM report recommends a three-step evaluation process for surrogate endpoints: (a) analytical validation, which includes an analysis of the analytical performance of the assay used to measure the surrogate endpoint; (b) qualification, which includes assessing the available information on the relationship between effects on the surrogate endpoint and effects on clinical efficacy measures; and (c) utilization, which includes determining whether the validation and qualification provide sufficient support for use of the surrogate endpoint in the proposed context [1,14]. According to Fleming TR and Powers JH (2012), the "validity of surrogacy" for evaluating clinical efficacy cannot be extrapolated to another treatment in the same clinical setting if the interventions differ (a) in the magnitude and duration of their effects on the causal pathway of the disease process that is captured by the surrogate endpoint, (b) in how they affect causal pathways of the disease process not captured by the surrogate endpoint, or (c) in their off-target effects. Furthermore, the "validity of surrogacy" for evaluating the effect of a specific treatment in one clinical setting cannot be assumed to hold in another clinical setting if there are differences across settings in either the on-target or the off-target effects of the treatment.
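The IOM evaluation steps and the Fleming and Powers conditions can be read as an explicit checklist. The sketch below encodes them as two small Python data classes; the class and field names are hypothetical, and reducing each step to a boolean flag is a deliberate simplification of what are, in practice, evidence-based judgements.

```python
from dataclasses import dataclass


@dataclass
class SurrogateEvaluation:
    """IOM-style evaluation of one surrogate endpoint in one proposed context of use."""
    surrogate: str
    context_of_use: str
    analytically_validated: bool  # (a) the assay measuring the surrogate performs adequately
    qualified: bool               # (b) effects on the surrogate track effects on the clinical measure
    utilization_supported: bool   # (c) (a) and (b) suffice for this specific context

    def acceptable(self) -> bool:
        return self.analytically_validated and self.qualified and self.utilization_supported


@dataclass
class ExtrapolationCheck:
    """Fleming and Powers conditions: True means the new treatment DIFFERS in that respect."""
    differs_on_captured_pathway: bool     # magnitude/duration of effect on the captured pathway
    differs_on_uncaptured_pathways: bool  # effects on causal pathways the surrogate misses
    differs_off_target: bool              # off-target effects

    def surrogacy_may_transfer(self) -> bool:
        return not (self.differs_on_captured_pathway
                    or self.differs_on_uncaptured_pathways
                    or self.differs_off_target)


# Hypothetical usage: a well-qualified surrogate, but a new drug with different off-target effects
evaluation = SurrogateEvaluation("biomarker X", "drug class Y in disease Z", True, True, True)
new_drug = ExtrapolationCheck(False, False, True)
print(evaluation.acceptable(), new_drug.surrogacy_may_transfer())  # True False
```

Keeping the two checks separate mirrors the text: a surrogate can pass the IOM steps for one treatment and context, yet still fail to transfer to a new treatment whose off-target effects differ.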
In conclusion, the ideal surrogate endpoint lies on the pathway through which the disease comes about or through which an intervention alters the disease [1-6]. For example, the serum cholesterol concentration might be expected to be an excellent diagnostic surrogate for cardiovascular disease; however, there is no clear cut-off point, and only about 10% of those who are going to have a stroke or heart attack have a serum cholesterol concentration above the reference range. But even if cholesterol is not a good diagnostic surrogate, it can still be used as a surrogate endpoint of the therapeutic response to cholesterol-lowering drugs. The use of surrogate endpoints is often motivated by the wish to reduce the size and duration of RCTs, in the hope that this will allow more timely evaluation of the benefit/risk ratio of experimental interventions and give health care providers additional choices in clinical care [1-6]. However, a rigorous evidence-based justification should be provided in any setting where the use of surrogate endpoints is proposed, because the scientific evaluation of the benefit/risk ratio needs to be valid and reliable as well as timely [1-6]. There are clear potential benefits in using surrogate endpoints: information can be obtained earlier, more quickly and more cheaply [1-6]. However, the chain of events linking pathogenesis to outcome is fragile, and the better we understand the path a disease takes and the pharmacology of a drug that affects this pathogenesis, the better the surrogate endpoints we will be able to develop for diagnosing, staging and monitoring disease and the response to therapy [1-6].