Modelling of Dissolved Oxygen in Thi Vai River Water Incorporating Artificial Neural Network and Multivariable Regression
Tat Pham Van1*, Minh Phap Dao2 and Pham Nu Ngoc Han1
1*Department of Science and Engineering, Hoa Sen University, Vietnam
2Center of Environmental Engineering and Monitoring, Dong Nai Province, Vietnam
Submission: November 13, 2017; Published: November 30, 2017
*Corresponding author: Tat Pham Van, Department of Science and Engineering, Hoa Sen University, Vietnam, Email: firstname.lastname@example.org
How to cite this article: Tat P V, Minh P D, Pham N N H. Modelling of Dissolved Oxygen in Thi Vai River Water Incorporating Artificial Neural Network and Multivariable Regression. Int J Environ Sci Nat Res. 2017;7(1): 555703. DOI: 10.19080/IJESNR.2017.06.555703.
The water quality of a watershed is one of the major concerns in its operation and management. Dissolved oxygen (DO) is one of the most important indicators for water bodies: it is essential for micro-organisms and is a significant parameter of aquatic ecosystems. In this work, we predicted the DO concentration of the Thi Vai river, Viet Nam, from the relationships between dissolved oxygen and hydrological parameters: temperature, pH, turbidity, conductivity, chemical oxygen demand (COD), biological oxygen demand (BOD), nitrate and phosphate. Multiple linear regression (MLR) and a back-propagation neural network (BPNN) were used to establish those relationships. The results showed that the neural network BPNN I(8)-HL(7)-O(1) (R2train = 0.96, RMSE = 0.14, GAME = 0.12) is able to predict the DO concentration accurately, which makes it a useful tool for managing the environmental quality of Thi Vai river water. The DO concentration of the river was represented by maps interpolated with the Inverse Distance Weighted (IDW) function. These maps can support management measures to reduce the pollution level.
Water scarcity in the world has become more serious each year. Water resources in some areas are declining in both quantity and quality. Because water is directly linked with human welfare, through recreational activities (swimming and boating), municipal, industrial and private water supplies, and agricultural uses including irrigation and livestock watering, its quality is considered a vital concern for mankind. In addition, the assessment and management of water resources have become very complex with population growth. Today, decisions on water resources management are increasingly based on model studies. Therefore, the precise determination of the concentration of pollutants in water is an essential requirement to support effective management and legislation. Numerous computational and statistical approaches have been applied to predict the water quality in reservoirs (Liu et al. 2009; Cho et al. 2009; White et al. 2010; Lindim et al. 2011).
Dissolved oxygen (DO) is an important index for evaluating surface water quality because it reflects the pollution level and the state of the aquatic ecosystems of water bodies (Chen and Liu, 2014). However, because different factors influence different waters, it is difficult to simulate DO concentrations by traditional mathematical methods. In addition, limited water quality data and the high cost of water quality monitoring often pose serious problems for process-based modelling approaches. Moreover, the chemical, physical and biological components of aquatic ecosystems are very complex and nonlinear. Which model and which parameters should be used to model DO remains an open question. In this regard, some traditional models, such as regression models and Artificial Neural Network (ANN) models, have been investigated and compared. He et al. (2011) applied MLR and ANN models to forecast the daily DO variation based on water temperature (WT) and runoff in the Bow River, Canada; the results showed that the ANN model outperformed the MLR model. Another study suggested DO = -0.18WT + 0.591pH - 35.46, with a coefficient of determination (R2) of 0.4, for DO in the River Danube. ANNs have been described as a relatively new concept in environmental modeling. They are suitable for modeling nonlinear processes, such as the dynamics of DO in surface water. Many kinds of networks can be applied, such as the Multi-Layer Perceptron (MLP), the Adaptive Neuro-Fuzzy Inference System (ANFIS), Recurrent Neural Networks (RNN), Generalized Regression Neural Networks (GRNN) and the Radial Basis Function Network (RBFN). However, according to [13-15], the Backpropagation Neural Network (BPNN) is the most widely used neural network for forecasting and prediction purposes. A BPNN generally consists of three layers: an input layer, a hidden layer and an output layer.
Each layer consists of neurons which are connected to the neurons in the previous and following layers by connection weights. These weights are adjusted during training. Input vectors and the corresponding target vectors are used to train a BPNN until the model reaches a specified minimum error or a maximum number of epochs. BPNNs with weights, biases, a sigmoid layer and a linear output layer can approximate any function with a finite number of discontinuities.
In recent years, several studies on water quality simulation, including DO, have been conducted using ANN models [17-21]. This method can predict water quality data with high precision and is more robust. One study established an ANN model to predict total nitrogen, total phosphorus, total organic carbon, DO and Fe in the deep waters of Swedish lakes. The efficiency of regression models and ANNs in DO determination was reported in the aforementioned studies, each conducted for a specific area. However, to date, studies on predicting DO with regression and ANN models in Vietnam are limited. Thus, this study applies an ANN with the Backpropagation Neural Network (BPNN) algorithm and multiple linear regression analysis (MLR) to model DO based on temperature, pH, turbidity, conductivity, COD, BOD, NO3- and PO43-. We also compared the two models by their coefficient of determination (R2) and root mean square error (RMSE) to determine which is better for predicting DO in water bodies.
The Thi Vai river is a tributary of the Dong Nai river system in Viet Nam. The Thi Vai river basin covers an area of 625 square kilometers. The river starts from Nhon Tho village, Long Thanh district, Dong Nai province, and flows through Tan Thanh district, Ba Ria-Vung Tau province, and Can Gio district in Ho Chi Minh city. The river is 32 km long, 400 to 600 m wide and 12-20 m deep (about 60 m at the deepest point) (TTH Nguyen et al. 2017). The Thi Vai river is a tidal estuary with a tidal range higher than 400 cm and fast flow. Its salinity varies from 24% to 32% in the rainy season, so its water is saline. The water quality of the Thi Vai river is represented here by the DO parameter, which has shown a decreasing tendency. This can be explained by the fact that the river receives not only 34,000 m3 of untreated sewage from about 200 factories along the river basin, but also large amounts of untreated sewage from residential and cattle-farming areas. A few locations with high pollution include the areas near the VEDAN company and Go Dau port or Phu My port (Lorenz et al. 2014).
A monitoring program was established by the Centre of Environmental Technology and the Dong Nai Department of Natural Resources and Environment. The water samples used in this work were collected at 7 monitoring stations in the period from 2010 to 2016, covering the different areas of the Thi Vai river, as shown in Figure 1. Nine physico-chemical parameters, namely pH, temperature, dissolved oxygen (DO), chemical oxygen demand (COD), biological oxygen demand (BOD5), conductivity (EC), turbidity, nitrate and phosphate, were used as data for the multiple linear regression (MLR) and artificial neural network (ANN) models. The dataset was separated into a training set (80% of the total data) and a test set (20%).
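The 80/20 partition described above can be sketched as follows. This is a minimal illustration, assuming a random shuffle with a fixed seed; the source does not state how the samples were assigned to each set, and the function name `split_train_test` and the placeholder sample list are hypothetical.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into a training and a test set.

    Assumption: random assignment; the paper only gives the 80%/20% ratio.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder data: 50 sample indices standing in for water-quality records.
samples = list(range(50))
train, test = split_train_test(samples)
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible, so both models can be trained and tested on exactly the same records.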
The MLR model represents the relationship between dissolved oxygen DO and the physico-chemical parameters as a linear function of several predictors. It was applied in this work to assess their impact on DO as

$$Y = \beta_0 + \sum_{k=1}^{8} \beta_k x_k \qquad (1)$$
where Y is the dependent variable (dissolved oxygen DO); xk is the independent variable (the kth physico-chemical parameter); β0 is the regression constant; and βk is the coefficient of the kth physico-chemical parameter (Table 1).

Table 1: Abbreviations for the 8 physico-chemical parameters used to predict dissolved oxygen DO.

The DO parameter indicates the pollution level and self-cleaning capability of water bodies. Dissolved oxygen is also necessary for organisms such as fish, invertebrates, bacteria and plants, which use it in respiration. Biological and chemical oxygen-demanding processes, measured as BOD and COD, consume some of the dissolved oxygen in water and can therefore decrease DO. Oxygen is involved in the metabolism of various nitrogen-containing compounds. Nitrogen and phosphorus products generate free nitrate and phosphate in the water, and the various forms of nitrogen and phosphorus promote the growth of algae, which can change the concentration of dissolved oxygen. Dissolved solids in ionic form, such as nitrates and phosphates, also change the conductivity (EC), so EC also reflects changes in DO.

Other factors also affect DO. Biological and chemical processes can change as the pH changes, which affects the oxygen consumption of oxygen-demanding processes. DO is also influenced by natural factors such as temperature, pH, salinity and algae. The solubility of oxygen in water decreases as temperature increases, and dissolved oxygen decreases exponentially as salinity increases. Turbidity affects DO because it increases light absorption, which can raise the water temperature.
Dissolved oxygen therefore plays an important role in the pollution assessment of the Thi Vai river basin. The variables with the greatest effect on DO in the Thi Vai river basin were determined from the coefficient of contribution, which takes the form:
where MPxk,% is the average contribution percentage of the kth physico-chemical parameter to DO; βk is the regression coefficient of that parameter in the MLR model; and xk is the kth independent variable (Table 1).
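The regression step and the contribution percentage can be illustrated with a small sketch. It fits the linear model by ordinary least squares (normal equations solved by Gauss-Jordan elimination) and then computes a contribution percentage. The contribution formula used here, the sample average of 100·|βk·xk| / Σj |βj·xj|, is an assumed form built only from the symbols defined above and may differ from the paper's formula (2). The function names and the two-predictor data are made up for illustration.

```python
def fit_mlr(X, y):
    """Ordinary least squares; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def contribution_percent(beta, X):
    """ASSUMED form: mean over samples of 100*|b_k x_k| / sum_j |b_j x_j|.

    Rows where every term is zero would divide by zero; the toy data avoid this.
    """
    k = len(beta) - 1
    totals = [0.0] * k
    for row in X:
        parts = [abs(b * x) for b, x in zip(beta[1:], row)]
        s = sum(parts)
        for i in range(k):
            totals[i] += 100.0 * parts[i] / s
    return [t / len(X) for t in totals]


# Made-up data generated from y = 1 + 2*x1 - 0.5*x2 (not river measurements).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * a - 0.5 * b for a, b in X]
beta = fit_mlr(X, y)
shares = contribution_percent(beta, X)
print([round(b, 6) for b in beta], [round(s, 2) for s in shares])
```

Because the toy targets are generated exactly from a linear rule, the fit should recover the generating coefficients up to rounding, which makes the sketch easy to check before applying it to real measurements.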
The back-propagation neural network (BPNN) is a multi-layer feed-forward network. It is trained on the training dataset, tuning the network parameters with an error back-propagation mechanism. A BPNN can have several layers, but the most commonly accepted form is the three-layer architecture BPNN I(k)-HL(m)-O(n): the input layer I(k) consists of k = 8 input neurons, the physico-chemical parameters in Table 1; the hidden layer HL(m) has m = 7 neurons; and the output layer has one neuron (n = 1), dissolved oxygen DO. The BPNN was trained with the Levenberg-Marquardt algorithm. The transfer function of each hidden-layer neuron is tansig, and that of the output layer is purelin. The learning function is learngdm. An MSE of 2.5573 × 10^-5 was obtained after training for 10000 epochs. The neural network BPNN I(8)-HL(7)-O(1), presented in Figure 2, was constructed successfully for predicting DO values from the 8 hydrological parameters measured at various sampling locations from 2010 to 2016. The variables with the greatest effect on DO in BPNN I(8)-HL(7)-O(1) were evaluated from the weight coefficients, taking the form:
where RIx is the relative importance of input neuron x; Σ wxy wyz is the sum of the products of the final weights of the connections from the k input neurons to the m hidden neurons and from the m hidden neurons to the n output neurons; y runs over the m neurons in the hidden layer; and z is the output neuron (dissolved oxygen DO).
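The architecture and the weight-based importance measure can be sketched as below. The forward pass follows the stated layout (8 inputs, 7 tansig hidden neurons, 1 purelin output); tansig is mathematically identical to tanh. The weights here are random placeholders, not the trained weights, and training with Levenberg-Marquardt is not shown. The importance function is an assumed reading of the description above: the absolute value of Σy wxy·wyz for each input, normalised to 100%. It may differ from the paper's formula (3).

```python
import math
import random


def forward(x, W1, b1, W2, b2):
    """I(8)-HL(7)-O(1) forward pass: tanh (tansig) hidden layer, linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


def relative_importance(W1, W2):
    """ASSUMED form: |sum_y w_xy * w_yz| per input x, normalised to percent."""
    n_in = len(W1[0])
    raw = [
        abs(sum(W1[y][x] * W2[y] for y in range(len(W2))))
        for x in range(n_in)
    ]
    total = sum(raw)
    return [100.0 * r / total for r in raw]


# Placeholder weights (seeded random values, NOT the trained network).
rng = random.Random(1)
n_inputs, n_hidden = 8, 7
W1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b2 = rng.uniform(-1, 1)

x = [0.5] * n_inputs                      # one scaled input vector (made up)
do_pred = forward(x, W1, b1, W2, b2)
ri = relative_importance(W1, W2)
print(round(do_pred, 4), [round(v, 2) for v in ri])
```

In practice the inputs would be scaled before being passed to the network, and the importance values would be computed from the weights obtained after training.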
In this study, several statistical error measures were used to assess the performance of the applied models. The root mean square error (RMSE), coefficient of determination (R2) and mean absolute error (MAE) were used to provide an indication of goodness of fit between the observed and predicted values. Expressions of these error parameters are given as follows:
where n is the number of observations; Yi is the ith observed value of dissolved oxygen DO; Ȳ is the average observed value of DO; and Ŷi is the predicted value of DO for the ith observation. In addition, a paired two-sample t-test was applied to compare the predicted values (DOpred) from the MLR and BPNN I(8)-HL(7)-O(1) models (Table 2) with the observed values DOobs.
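The three error measures can be written compactly using the symbols defined above. This sketch assumes the common definitions RMSE = sqrt(Σ(Yi − Ŷi)² / n), MAE = Σ|Yi − Ŷi| / n and R2 = 1 − Σ(Yi − Ŷi)² / Σ(Yi − Ȳ)²; the paper may define R2 differently, for example as a squared correlation coefficient. The observed and predicted values are made up.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination, assumed here as 1 - SSE/SST."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Made-up DO values in mg/L, for illustration only.
do_obs = [4.0, 5.0, 6.0, 7.0]
do_pred = [4.5, 5.0, 5.5, 7.0]
print(rmse(do_obs, do_pred), mae(do_obs, do_pred), r_squared(do_obs, do_pred))
```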
In this work, the predicted values DOpred from the MLR and BPNN I(8)-HL(7)-O(1) models and the observed values DOobs were used to zone the water quality of the Thi Vai river with a GIS technique. The Inverse Distance Weighted (IDW) function was applied to interpolate the dissolved oxygen values of the river zones. The IDW function interpolates the DO value at any unmeasured location from the measured values surrounding the prediction location, taking the form:
where zi is the ith observed value and d is the distance from the unmeasured location to the ith observation location.
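A minimal sketch of inverse distance weighting, assuming the standard form z = Σ(zi / di^p) / Σ(1 / di^p) with power p = 2. The power used in the study is not stated, and the station coordinates and DO values below are made up.

```python
import math


def idw(query, stations, power=2.0):
    """Interpolate a value at `query` from (x, y, z) stations by IDW.

    Assumption: Euclidean distance and power p = 2 (not given in the source).
    """
    qx, qy = query
    num = 0.0
    den = 0.0
    for x, y, z in stations:
        d = math.hypot(x - qx, y - qy)
        if d == 0.0:
            return z                    # query coincides with a station
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den


# Made-up stations: (x, y, DO in mg/L).
stations = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
midpoint_value = idw((1.0, 0.0), stations)
print(midpoint_value)
```

A query point equidistant from two stations receives their plain average, and a query at a station returns that station's value, which are two quick sanity checks for any IDW implementation.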
The water quality of the Thi Vai river was surveyed from 2010 to 2016 and, in general, did not meet the standards for domestic water. The average dissolved oxygen DO results and their statistical evaluation at seven locations in the Thi Vai river basin are presented in Table 2. The mean, maximum and minimum values of DO concentration and of the other parameters indicate the range of variation, and hence the accuracy and reliability, of the data observed at the survey locations. Moreover, the coefficients of variation describe the limits within which water quality varies at different locations of the Thi Vai river basin. This variation depends on climate change and hydrometeorological conditions in the river basin. The DO concentration changed between survey locations according to the production and domestic activities nearby, as given in Table 3.
The SW-TV-01 and SW-TV-02 locations are far from domestic and industrial areas, so they exhibited the highest concentrations of dissolved oxygen DO in 2015 and 2016. For the five remaining locations the concentration of dissolved oxygen is lower because these areas are affected by waste sources from domestic, farming and factory areas. The lowest concentrations of DO are in the SW-TV-03 and SW-TV-04 areas, which are affected by domestic waste water and fish farming as well as by the 200 factories along the Thi Vai river basin, as given in Table 3. In general, the highest concentration of DO was found at the confluence of the Ba Ky canal and the Thi Vai river. The lowest concentrations of DO are in the VEDAN and Go Dau areas, because these areas are influenced by waste water from factories and ships. Even in these areas, however, the water quality of the river is still in an average range, and it can be improved by the closer monitoring that the environmental management units have carried out in recent years.
From the water quality data of the Thi Vai river collected from 2010 to 2016, as given in Table 2, we developed a multivariate linear relationship between DO and the physico-chemical parameters using multivariable regression techniques. The quality of the multivariable models was evaluated by regression statistics such as R2train, the standard error (SE), and the multiple correlation between the observed DOobs and predicted DOpred values.
The best linear relationship between DO and 8 hydrological parameters can be written in the following form:
The regression gave R2train = 0.81 and RMSE = 0.32, which is satisfactory by regression-statistics standards. The contribution percentage MPxk,% of each parameter xk in regression equation (8) was calculated from the coefficients βk. The important effects on the dissolved oxygen DO of the Thi Vai river basin were determined by formula (2), as depicted in Figures 3 and 4. We also used ANOVA statistics to test the significance of the regression model, obtaining Fsig = 0.000 at the significance level α = 0.05.
Furthermore, pH dominates all other parameters in its effect on dissolved oxygen DO in the Thi Vai river (MPxk,% > 70%). The parameters can be sorted in order of influence on DO: pH > temperature > conductivity > BOD > phosphate > COD > nitrate > turbidity.
The neural network BPNN with architecture I(8)-HL(7)-O(1) (Figure 2) was constructed with the Levenberg-Marquardt algorithm, with pH, temperature, COD, BOD, EC, turbidity, NO3- and PO43- as the neurons of the input layer. The network was trained for 10000 epochs. The prediction results of BPNN I(8)-HL(7)-O(1) are presented in Figure 5. The R2train value of 0.9624 between the observed DOobs and predicted DOpred values is extremely high, as shown in Figure 4a. It means that approximately 96.24% of the variation in DO is explained by the variation in the 8 physico-chemical parameters. The discrepancy between the blue line (observed DOobs) and the red line (DOpred predicted by the ANN model) is therefore insignificant, as the best-fit plot shows clearly (Figure 5b). In other words, the ANN model I(8)-HL(7)-O(1) shows high predictability and reliability. The effects of the input-layer neurons in the training of BPNN I(8)-HL(7)-O(1) are shown in Figure 5. The RI value of 18.65% was calculated by formula (3). The parameters can also be sorted in order of influence on DO: phosphate > conductivity > pH > COD > BOD > nitrate > turbidity.
The BPNN I(8)-HL(7)-O(1) and MLR (k = 8) models were tested using statistical values: R2train = 0.96 and R2test = 0.9211 for BPNN I(8)-HL(7)-O(1), and R2train = 0.811 and R2test = 0.4423 for the MLR model, as exhibited in Figure 6. The RMSE values of the BPNN and MLR models were also used to indicate predictability. In addition, the global absolute mean errors (GAME) of the MLR and BPNN I(8)-HL(7)-O(1) models were close to zero, as shown in Table 4. These results suggest that the BPNN I(8)-HL(7)-O(1) model produces less error and that its predictability is better than that of the MLR model. This finding is consistent with the studies in [19,26,27]. To assess the efficiency of each model, a paired two-sample t-test for means was also used to evaluate the difference between the observed DOobs and predicted DOpred values, as given in Table 4. The t-test results showed that the difference between the MLR and BPNN models is insignificant at the 95% confidence level.
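The paired comparison can be illustrated with the statistic t = mean(d) / (sd(d) / sqrt(n)), where d are the paired differences between observed and predicted values. The sketch below uses made-up DO values and compares |t| with the two-sided 5% critical value for 4 degrees of freedom (2.776); with real data the critical value must match the actual sample size.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic of a paired two-sample t-test for means."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(len(diffs)))


# Made-up observed and predicted DO values (mg/L), n = 5 pairs.
do_obs = [5.1, 4.8, 6.0, 5.5, 4.9]
do_pred = [5.0, 4.9, 5.8, 5.6, 4.7]

t_value = paired_t_statistic(do_obs, do_pred)
T_CRIT_DF4 = 2.776                          # two-sided, alpha = 0.05, df = 4
significant = abs(t_value) > T_CRIT_DF4
print(round(t_value, 3), significant)
```

When |t| stays below the critical value, the mean difference between the two series is not significant at the 95% confidence level, which is the kind of conclusion reported in Table 4.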
In this study, the IDW function of the GIS technique was used to interpolate DO values around the observed DOobs values and the DOpred values predicted by the MLR and BPNN I(8)-HL(7)-O(1) models. The interpolated values were used to map the water quality in the Thi Vai river basin. The water quality maps for 2015 and 2016 were used to compare the DO values from the models, as exhibited in Figure 7. In parts of the Thi Vai river, in particular the VEDAN area and the areas near Phu My and the thermal power plant, water quality is in decline, and appropriate measures should be taken to minimize pollution. In general, the water quality of the Thi Vai river is at an average level relative to the regulated standards. A few locations, such as SW-TV-05 and SW-TV-06 in 2015, are polluted by the discharge of effluents from residential, factory and farming areas. However, the water quality at SW-TV-05 and SW-TV-06 improved markedly in 2016, because the authorities carried out the environmental management of the Thi Vai river efficiently that year. The effluent from the factories along the Thi Vai river basin makes the water there unsafe for water supply. In addition, fish farming also affects the water quality, for example at locations SW-TV-03 and SW-TV-02. Furthermore, the water quality of the Thi Vai river is also affected by tidal change [26-29].
The MLR and BPNN I(8)-HL(7)-O(1) models were constructed successfully to predict the DO parameter in the Thi Vai river. The MLR model was also used to determine which physico-chemical parameters have an important effect in the Thi Vai river basin, which can help environmental managers develop environmental monitoring regulations. The DOpred values predicted by BPNN I(8)-HL(7)-O(1) are in good agreement with the observed DOobs values. The BPNN proved superior to the multiple linear regression (MLR) model, so the neural network I(8)-HL(7)-O(1) is more appropriate for predicting dissolved oxygen DO. GIS techniques are also a very efficient tool for making interpolated maps with the IDW function.
Lihua C, Shengquan M, Li L (2008) A model to evaluate DO of river based on artificial neural network and style book. J Hainan Normal Univ 21: 372-376.
Csabragi A, Molnar S, Tanos P, Kovacs J (2017) Application of artificial neural networks to the forecasting of dissolved oxygen content in the Hungarian section of the river Danube. Ecol Eng 100: 63-72.
Erzin Y, Rao BH, Patel A, Gumaste SD, Singh DN (2010) Artificial neural network models for predicting electrical resistivity of soils from their thermal resistivity. Int J Therm Sci 49: 118-130.