Biostatistics and Biometrics Open Access Journal

Risk Based Monitoring in Clinical Trial: An Application with Neural Networking

Atanu B¹, Savanur S² and Suman K³*

¹Centre for Cancer Epidemiology, Tata Memorial Centre, India

²Department of Biometrics, Chiltern Clinical Research Ltd, India

³Department of Biostatistics, Quintiles IMS, India

Submission: September 18, 2017; Published: November 10, 2017

*Corresponding author: Suman Kapoor, Department of Biostatistics, Quintiles IMS, India; Email: kapoor.suman@gmail.com

How to cite this article: Atanu B, Savanur S, Suman K. Risk Based Monitoring in Clinical Trial: An Application with Neural Networking. Biostat Biometrics Open Acc J. 2017;3(5): 555624. DOI: 10.19080/BBOAJ.2017.03.555624

Abstract

Keywords: Neural Networking; Clinical Trials; RBM; Risk Control; Quality Assessment

Abbreviations: RBM: Risk based monitoring; FDA: Food and Drug Administration; ANN: Artificial Neural Networks; SDV: Source Data Verification; CSM: Central Site Monitoring; HNC: Head and Neck Cancer

Introduction

Drug safety and data quality are the main concerns in any clinical trial. Both components help to gain the confidence of the regulatory agency. Drug development costs have recently been increasing rapidly because of longer follow-up, numerous routine procedures, work pressure on trial staff and strict inclusion criteria for patient recruitment. The challenge is to improve trial performance while reducing cost. The "Draft guidance for industry: Oversight of clinical investigations - a risk-based approach to monitoring" was presented by the Food and Drug Administration (FDA) in 2011. This comprehensive approach gives scope to analyze and continuously improve trial data safety and quality while reducing monitoring cost [1,2]. The challenge lies in implementing and establishing risk-based monitoring (RBM) in the required way [3].

There is limited literature on statistical methods for risk-based monitoring (RBM) in clinical trials. RBM has recently become an emerging area, particularly for parties with limited resources to implement new approaches to trial oversight. The pharmaceutical industry now recognizes the pressing need to assess and monitor site risk. Many companies now implement monitoring through centralized manual data review. The objective of the data review is to identify any specific pattern that indicates where additional resources and effort should be concentrated. However, the data review relies on checking information from hard copy, which is time consuming and carries a higher chance of error.

Neural networks are a machine learning framework that attempts to mimic the learning pattern of biological neural networks. In a biological neural network the neurons are interconnected: dendrites receive inputs, and based on these inputs the neuron sends an output signal through an axon to another neuron. The data-driven method that mimics this process is known as an Artificial Neural Network (ANN). Hereafter we refer to it as a neural network [4-9].

Monitoring is an ongoing procedure in any clinical trial and is the responsibility of the investigator. The objective of monitoring is to control the quality of the clinical trial and ensure the safety of the participants. There are several approaches to managing the quality of a clinical trial. The traditional approach is on-site monitoring with 100% source data verification (SDV). SDV helps to ensure the safety of the participants and to produce quality data. However, it is a time-consuming and costly procedure. Quality can also be managed through central monitoring, with a continual process to train the study population and routine site audits [10]. In the last two decades the difficulty and cost of conducting clinical trials have increased rapidly, particularly for the development of new drugs. RBM can be an alternative that facilitates monitoring at reduced cost. Regulatory authorities such as the Food and Drug Administration [11,12] have also recommended conducting monitoring under the RBM paradigm.

Central statistical monitoring

The basic idea of CSM is described by Venet et al. [12,13]. CSM can be applied in multicenter clinical trials. The procedure compares the distribution of all variables at the visit, site and patient level against the reference formed by all units. If a specific pattern is observed for any unit, further monitoring is required. Data exploration through univariate and multivariate statistical modeling can also be performed. The Mahalanobis distance has been illustrated as a way to detect outliers in the data sets [13]. Graphical representations have also been adopted in CSM to check differences in variation or in the shape of the distribution [14]. In this study, the statistical methods discussed above can explore whether specific clinical sites should receive more attention for data review.
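The Mahalanobis-distance idea mentioned above can be sketched for two site-level variables as follows. This is a minimal illustration in Python rather than the software used in the study; the site summaries, the variable meanings and the cut-off of 2.0 are hypothetical assumptions, not values from the paper.

```python
import math

def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] S^{-1} [dx dy]^T, using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical site summaries: (mean lab value, adverse events per patient).
site_summaries = {
    "A": (5.1, 0.30), "B": (4.9, 0.28), "C": (5.0, 0.33), "D": (5.2, 0.31),
    "E": (4.8, 0.27), "F": (5.0, 0.29), "G": (7.9, 0.05), "H": (5.1, 0.32),
}
names = list(site_summaries)
dist = mahalanobis_2d([site_summaries[s] for s in names])

cutoff = 2.0  # illustrative cut-off, an assumption rather than a recommended value
flagged = [s for s, d in zip(names, dist) if d > cutoff]
print(flagged)
```

With only a handful of sites the distance is bounded above by (n - 1)/sqrt(n), so any cut-off would have to be chosen with the number of sites in mind.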

Challenges for RBM

To the best of our knowledge, no dedicated software package is available for RBM. However, a statistical package is available for Central Site Monitoring (CSM), namely JMP Clinical (SAS Institute Inc., Cary, NC, USA), and some of the SAS procedures can also be carried out in R. Recently there has been a tendency to conduct CSM and RBM jointly. In our view, CSM can be supervised through the adoption of neural networks, either jointly with RBM or independently. The data extraction procedure can follow earlier studies [15-17]. Clinical trial data are naturally measured repeatedly for each patient over follow-up. Follow-up measurements are prone to measurement errors, missing data and variability over time, all of which can harm data quality. Exploratory data analysis during monitoring is therefore required to measure the frequency of measurement errors, missing data and variability over time. Based on the presence of these factors, sites can be separated and handled differently for prediction of risk in the near future.
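As a rough sketch of the exploratory step described above, the proportion of missing follow-up values and the variability over time can be summarized per site. The code is Python for illustration only; the follow-up values, the use of `None` for a missing visit and the 50% review rule are hypothetical assumptions.

```python
from statistics import pstdev

def summarize_site(visits):
    """Summarize one site's repeated measurements; None marks a missing visit."""
    observed = [v for v in visits if v is not None]
    missing_rate = 1 - len(observed) / len(visits)
    # Population standard deviation of the observed values over follow-up.
    variability = pstdev(observed) if observed else 0.0
    return {"missing_rate": missing_rate, "variability": variability}

# Hypothetical follow-up measurements for three sites.
follow_up = {
    "A": [10.0, 10.5, None, 11.0],
    "B": [9.0, None, None, 15.0],
    "C": [10.0, 10.0, 10.0, 10.0],
}
summary = {site: summarize_site(values) for site, values in follow_up.items()}

# Illustrative rule: sites with at least half of the visits missing go to review.
needs_review = sorted(s for s, r in summary.items() if r["missing_rate"] >= 0.5)
print(summary)
print(needs_review)
```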

Data methods

The simulation techniques adopted in this study are described below. All simulations were carried out in the freely available R statistical software, so that all researchers have access to any suitable methods identified.

Generating the datasets

There is no consensus on the most appropriate data-driven method for RBM. A simulation study was therefore chosen to assess the performance of the data-driven method. Hypothetical datasets were generated to resemble the skewed distributions seen in a motivating on-site clinical trial monitoring exercise. A clinical trial with a Phase II, Blinded Endpoint (PROBE), Multicenter, Randomized, Prospective profile was selected for the study. Participants were randomized (1:1) to receive either chemotherapy treatment A or treatment B for head and neck cancer (HNC). The primary objective of the trial is to prolong median survival. Data were generated for a total of 30 sites. The site names were randomly allotted as 'A-Z' and AA, BB, CC and DD. Each subject's participation is taken to last 6 months. An arbitrary risk scoring process was generated to illustrate the application of the statistical method, and a risk score is considered for each site. Properly validated risk scores can be generated based on study objectives and requirements. In this study the site-specific risk score is generated from four parameters: (I) Deviation - Onsite Death during chemotherapy (A), (II) Protocol Deviation (B), (III) Unreported AEs based on 90-day follow-up (C) and (IV) Death during chemotherapy (D). Each risk parameter in the risk scoring process was generated based on the degree of impact it could have on the outcome of study conduct. Details of the risk scoring process are given in Table 1. The risk score for the 30 sites is assumed to follow a normal distribution with mean 50, and the data were generated within the range (31,70). The generated data for the 30 sites are given in Table 2 and in Figure 1.
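The data-generating step described above can be sketched as follows. This is Python for illustration, not the R code used in the study. The text gives the mean (50) and the range (31,70) but not the standard deviation, so the value of 10 used here is an assumption, as are the seed and the rejection of draws outside the range.

```python
import random
import string

def make_site_names():
    """Site labels A-Z followed by AA, BB, CC and DD (30 sites in total)."""
    return list(string.ascii_uppercase) + ["AA", "BB", "CC", "DD"]

def simulate_risk_scores(n_sites=30, mean=50.0, sd=10.0, low=31.0, high=70.0, seed=2017):
    """Draw normal risk scores and keep only values inside [low, high].

    sd and seed are assumptions; the source states only the mean and the range.
    """
    rng = random.Random(seed)
    scores = []
    while len(scores) < n_sites:
        value = rng.gauss(mean, sd)
        if low <= value <= high:  # reject draws outside the stated range
            scores.append(round(value, 1))
    return scores

sites = make_site_names()
scores = simulate_risk_scores(n_sites=len(sites))
site_scores = dict(zip(sites, scores))
print(site_scores)
```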

In this work the perceptron is used, which handles one or more inputs, a bias, an activation function and a single output. It receives inputs, multiplies them by weights, and passes the result into an activation function to produce an output. A linear model is used as the activation function in this work. The risk score is modeled through the risk indicators. Perceptrons are arranged in several layers, creating a multi-layer perceptron model of a neural network. The neural network may fail to converge on unnormalized data, so normalized data are required as input. In this work normalized data are used to obtain the prediction of the risk function. The data are split into two equal parts, a training set and a test set. We first fit the model on the training set and then apply it to the test set to assess its performance. The first 15 site observations form the training set and the rest form the test set. The neural network function in the freely available software R is used to fit the neural network (Figure 2).
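The ingredients described above (normalized inputs, a linear activation, and a split into the first 15 and the remaining 15 sites) can be sketched with a single perceptron. This is a simplified Python illustration, not the multi-layer model fitted in R: the min-max normalization, the learning rate, the number of epochs and the randomly generated indicator values are all assumptions.

```python
import random

def min_max_normalize(column):
    """Rescale a list of numbers to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def perceptron_output(weights, bias, inputs):
    # Linear activation: the output is the weighted sum of the inputs plus the bias.
    return bias + sum(w * x for w, x in zip(weights, inputs))

def train_linear_perceptron(rows, targets, learning_rate=0.1, epochs=5000):
    """Fit the weights and bias by per-sample gradient descent on squared error."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(rows, targets):
            error = perceptron_output(weights, bias, inputs) - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical data: 30 sites with 4 risk indicators each.
rng = random.Random(1)
raw = [[rng.uniform(0, 100) for _ in range(4)] for _ in range(30)]
columns = [min_max_normalize([row[j] for row in raw]) for j in range(4)]
inputs = [[columns[j][i] for j in range(4)] for i in range(30)]
targets = min_max_normalize([sum(row) for row in raw])  # normalized risk score

# First 15 sites for training, remaining 15 for testing.
train_x, test_x = inputs[:15], inputs[15:]
train_y, test_y = targets[:15], targets[15:]

weights, bias = train_linear_perceptron(train_x, train_y)
predictions = [perceptron_output(weights, bias, x) for x in test_x]
test_mae = sum(abs(p - y) for p, y in zip(predictions, test_y)) / len(test_y)
print(round(test_mae, 4))
```

Because the target here is an exact linear function of the inputs, a single linear unit is enough to recover it; real site data would not be this clean.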

The plot(nn) command is used to visualize the fitted neural network. The weights on the connections between neurons are shown on the plot, and the bias terms are represented by blue lines. However, the weights are difficult to interpret.

Illustration

Risk scoring is the process of analyzing a site's clinical trial performance in order to decide whether more resources should be allocated to the site. Numerous approaches can be taken to analyze the risk of a site. It basically comes down to selecting the correct independent variables (e.g. Deviation, Protocol Deviation, Unreported AEs and Death during study). An RBM model can be represented by machine learning, logistic regression, linear regression or a combination of these. The work is demonstrated with the open source software R, using the neuralnet package for the analysis. The dataset contains information on the different sites involved in a chemotherapy trial. The variables "a" (Deviation), "b" (Protocol Deviation), "c" (Unreported AEs), "d" (Death during study) and the risk score are available. The objective is to devise a model which predicts, based on the input variables "a", "b", "c" and "d", whether or not a risk will occur at a specified time point. The dataset is split into a subset used for training the neural network and another used for testing. As the ordering of the dataset is completely random, we do not have to extract random rows and can simply take the first x rows.

The neural network is fitted with 4 hidden nodes. There are several rules of thumb for selecting the number of hidden nodes. The threshold value of the output is set at 10%. The output can be visualized in Figure 1. The data for the first 15 sites (split from the full data set) are used as training data, and the test data for the remaining 15 sites are then used to predict their performance. The illustration uses risk scores for a total of 30 sites, simulated through the parameters A', B', C' and D' and given in Table 2. The simulated data for A', B', C' and D' are produced by random number generation and are shown in Figure 3. The risk score is obtained as A'+B'+C'+D'. As an arbitrary choice, a risk score of more than 200 is taken to mean that site risk is present; otherwise it is absent. The parameter Risk takes the value 1 if the risk score is more than 200 and 0 otherwise. This arbitrary choice is made to train the neural network so that it can provide predictions. The input data for the neural network comprise the parameters (I) Site, (II) A', (III) B', (IV) C', (V) D', (VI) Risk and (VII) Risk Score. The output of the neural network is the parameter "Prediction". The prediction for sites P to DD is obtained through the neural network and given in Table 2. A prediction value of "0" indicates that the site has no risk and does not require further close monitoring. A prediction value of "1" (e.g. site AA) indicates that the site is at risk and requires further close monitoring.
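The labelling rule described above (risk score as the sum of the four parameters, with Risk = 1 when the score exceeds 200) can be written out as a short Python sketch. The site labels and parameter values below are hypothetical and are not taken from Table 2.

```python
def risk_score(a, b, c, d):
    """Risk score as the sum of the four simulated parameters A', B', C' and D'."""
    return a + b + c + d

def risk_flag(score, cutoff=200):
    """Return 1 when the score is strictly greater than the cut-off, else 0."""
    return 1 if score > cutoff else 0

# Hypothetical parameter values for two illustrative sites.
example_sites = {
    "site_1": (40, 55, 30, 60),
    "site_2": (70, 65, 50, 45),
}
labels = {}
for name, params in example_sites.items():
    score = risk_score(*params)
    labels[name] = (score, risk_flag(score))
print(labels)
```

A score of exactly 200 is labelled 0 under this rule, since the text specifies "more than 200".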

Appendix-I

Discussion and Conclusion

In RBM [18-21], information about risk and risk management needs to be shared between decision makers [22,23]. Information about severity, probability, treatment, control, acceptability and detectability needs to be communicated. Communication between decision makers should build on risk control to reduce and/or accept the risks. During protocol development, past therapeutic experience and a literature review should be used to establish the acceptable risk, which can be adopted as a threshold to raise a trigger. Different domains, such as cost-benefit association, can be considered for prioritization and to finalize the optimum level of risk to be controlled. Acceptable levels of risk need to be defined in order to control the risk, and the steps to reduce or eliminate the risk need to be laid out. An appropriate balance between risk and resources is also important. The process of mitigating a quality risk when it exceeds an acceptable level is known as risk reduction. The intensity and mode of monitoring need to be defined, and proper action can be taken based on relevant protocol considerations and literature review. A decision to accept the risk is known as risk acceptance. Acceptance of the residual risk can be a formal decision.

It is difficult to provide accurate predictions for sites that have a large amount of missing data. Such sites can be ranked on a risk basis by isolating the presence of missing data in the sites. Handling missing data when classifying sites by risk is challenging, and the statistical methodology needs to be extended to rank sites by risk in the presence of missing data. The following points can be concluded from this analysis. Adopting RBM in routine work is not cumbersome; it can be adopted simply through the data-review process. Neural networks can considerably help in continually assessing the risk of sites based on monitors' feedback. A data-driven method such as a neural network can speed up centralized manual review. The illustrated procedure provides a path for adopting data-driven models to assess risk. Centralized manual data review can be complemented by a statistically driven model, which can help to distinguish a site by its risk profile. RBM approaches can be enabled through clinical technology solutions for planning, data visualization and data-driven methods.