Assessment of Physicochemical Water Quality using Principal Component Analysis: A Case Study Wadi Hanifa, Riyadh
Mohab Amin Kamal* and Abdulaziz Ibrahim Almohana
Department of Civil Engineering, College of Engineering, King Saud University, P.O. Box 800, Riyadh 11421, Saudi Arabia
Submission: February 22, 2022; Published: March 08, 2022
*Corresponding Author: Mohab Amin Kamal, Department of Civil Engineering, College of Engineering, King Saud University, P.O. Box 800, Riyadh 11421, Saudi Arabia
How to cite this article: Mohab A K, Abdulaziz I A. Assessment of Physicochemical Water Quality using Principal Component Analysis: A Case Study Wadi Hanifa, Riyadh. Civil Eng Res J. 2022; 12(5): 555850. DOI 10.19080/CERJ.2022.12.555850
Abstract
This study applies multivariate statistical methods to interpret a large and complex data matrix obtained from monitoring Wadi Hanifa, Riyadh, Kingdom of Saudi Arabia. Twenty-three physicochemical parameters were measured in water samples collected monthly for one year from eight monitoring sites along the part of the wadi affected by anthropogenic influences. The physicochemical dataset was treated with Principal Component Analysis (PCA) to extract the parameters most important in assessing variation in water quality. Two principal components were identified as responsible for the data structure, together explaining 87.6% of the total variance of the dataset. The first, associated with the season of the year (temperature), accounts for 71.3% of the variance; the second, with a lesser impact, is associated with the discharge of wastewater treatment plants into the wadi (conductivity) and accounts for 16.3%. The study suggests that PCA is a helpful tool for identifying the most important surface water quality parameters.
Keywords: Anthropogenic influences; Water quality; Wadi Hanifa; Wastewater treatment
Introduction
Valleys (wadis) play an important role in receiving and transporting municipal and industrial wastewater and runoff from agricultural land. Municipal and industrial wastewater discharge is the main source of pollution, while runoff is a cyclic phenomenon that is heavily influenced by the weather conditions in the basin. Seasonal variations in precipitation, surface runoff, intermediate runoff, groundwater runoff, and pumped inflow and outflow have a strong impact on the wadi’s discharge and hence on the pollutant concentration in its water [1]. Long-term studies and water quality monitoring programs are an appropriate approach to a better understanding of river hydrochemistry and pollution, but they produce huge data sets that are often difficult to interpret [2]. The problem of data reduction and interpretation of multicomponent chemical and physical measurements can be addressed through the application of multivariate statistical analysis [3]. The number of published articles shows the importance of multivariate statistical tools in the treatment of analytical and environmental data [4, 5]. Principal Component Analysis can be used for dimensionality reduction in a data set by retaining those characteristics of the data set that contribute most to its variance, that is, by keeping lower-order principal components and ignoring higher-order ones. It is very useful in the analysis of data with a large number of variables, and it has been widely used because it is an unbiased method that can indicate associations between samples and variables [3]. It reduces the dimensionality of the data set by explaining the correlation among a large set of variables in terms of a small number of underlying factors or principal components without losing much information [6].
In recent years many studies have used principal component analysis in the interpretation of water quality parameters [7]. PCA has been successfully applied to sort out hydrogeological and hydrogeochemical processes from commonly collected ground water quality data [8-10]. Iyer et al. [11] constructed a PCA-based statistical model for coastal water quality data from the Cochin coast in Southwest India, which explains the relationship between the monitored physicochemical variables and the effect of environmental conditions on coastal water quality. The PCA technique has been used to estimate spatial and temporal patterns of heavy metal contamination [12] and to investigate nutrient gradients within a eutrophic reservoir [13]. Tauler et al. [14] used PCA to identify the major herbicide composition causing the observed data variations. Many researchers have applied these techniques to ground water quality [15, 16]. In the present study, a multivariate statistical technique (principal component analysis) was applied to evaluate the physicochemical variations in the water-quality data matrix of Wadi Hanifa, Riyadh, Saudi Arabia, which was generated under a 1-year monitoring program. The research aims to identify the most informative sampling parameters and to study the influence of natural and artificial sources on the variation of water quality parameters in Wadi Hanifa.
Methodology
Study Area: Wadi Hanifa, Riyadh
Located in the middle of the Najd Plateau, Wadi Hanifa is the most significant natural landmark of the region. Together with its basin and tributaries it forms a unique 120-kilometer-long ecological region stretching from the Tuwaiq Escarpment to the open desert southeast of Riyadh. The depth of the valley stream ranges between 10 and 100 meters, and its width ranges from approximately 100 to 1000 meters (Figure 1). Wadi Hanifa acts as a natural watershed for floods and rainwater over an area of 4000 m2 and has more than 40 tributaries [17]. The most important of the valley’s tributaries are AlObaitah, AlImariyah, Safar, AlMahdiyah, Beir, Laban, Namar, AlAwsat and Laha in the west, and AlAysan and AlBathaa in the east. The amount of water discharged into Wadi Hanifa is about 700,000 m3. The wadi is dry almost all year round, but it remains fertile thanks to the aquifers near the surface. Riyadh consumes about 1080 million cubic meters of water per year [18]. Because of the persistent drawdown of the water table to supply the city’s ever-growing population, Riyadh has had to find alternative sources of water. Most of the city’s (desalinated) water supply is now piped in from the coast 350 km away, a very expensive and unsustainable option [19].
Monitored Parameters and Analytical Methods
To represent the water quality of the Wadi system, accounting for the stream and for inputs from drains that affect downstream water quality, the monitoring and sampling plan was designed to cover a wide range of determinants at specific sites. Under the water-quality monitoring plan of Wadi Hanifa, samples were collected each month at three points across the Wadi width at several sites. Sampling, preservation and transportation of the water samples to the laboratory followed standard methods [20]. Eight sites were selected to cover the full length of the wet part of Wadi Hanifa, as shown in Figure 2. Their names are SW3C (site 1), SW12A (site 2), SW12C (site 3), SW14 (site 4), SW20 (site 5), SW8G (site 6), SW10B (site 7), and SW16 (site 8). The site locations were chosen so that four sites lie before the bioremediation facility and four sites after it. Site 5, however, lies before the point where the Batha Channel, which carries water from the Manfoha wastewater treatment plant, connects to the Wadi.
Data Treatment and Multivariate Statistical Methods
The Wadi Hanifa water quality data sets were subjected to one multivariate analysis technique, namely principal component analysis (PCA). PCA was applied to data standardized through z-scale transformation, which converts the measured values into dimensionless quantities so that the large differences in magnitude among the monitored chemical and physical parameters do not dominate the analysis. Standardization also increases the effects of parameters with low variance and reduces the effects of parameters with high variance [21-22]. PCA was carried out using SPSS software.
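The z-scale transformation described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study; the function name zscore, the use of the sample standard deviation (n - 1 denominator), and the readings are assumptions made for illustration only.

```python
from statistics import mean, stdev

def zscore(values):
    """Return (x - mean) / sample standard deviation for each value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [(x - m) / s for x in values]

# Hypothetical readings, not taken from the study
temperature_c = [18.0, 25.0, 32.0]
conductivity_us_cm = [2100.0, 2500.0, 2900.0]

z_temperature = zscore(temperature_c)
z_conductivity = zscore(conductivity_us_cm)

# Both series become dimensionless and directly comparable
print(z_temperature)
print(z_conductivity)
```

Although the two hypothetical parameters differ by two orders of magnitude in their raw units, they map onto the same standardized scale, which is why neither dominates the subsequent PCA.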
Principal Component Analysis
The details of PCA are discussed in readily accessible published literature. For example, Davis [23], Manly [24], and Shaw [25] present very understandable and practical introductions to the subject with minimal mathematical details, while Jackson [6], Jolliffe [7], Johnson and Wichern [26], and Legendre [27] present the mathematical details. Briefly, PCA transforms a dataset containing p variables (analytical constituents), interrelated or correlated to various degrees, to a new dataset containing p new orthogonal, uncorrelated variables called principal components (PCs). The PCs are linear functions of the original variables such that the sum of their variances is equal to that of the original variables. The PCs are ordered from largest variance (PC1) to next largest variance (PC2), etc. The variances of the PCs are the eigenvalues, and the coefficients or weights are the eigenvectors extracted from the covariance or correlation matrix. The goal is that the first few or k PCs (k << p) will retain most of the information in all of the p original variables, thus reducing the practical dimensionality of the dataset. In other words, if the correlations are high among many of the original variables, the first few PCs will tend to contain (or explain) a large percentage of the total variance and may be used to describe multivariate patterns or variation in water quality across the watershed almost as well as the complete set of p original variables. Often these patterns are related to specific sources of contamination [1, 28].
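The transformation summarized above can be sketched in Python with NumPy by taking the eigenvalues and eigenvectors of the correlation matrix. This is an illustrative sketch, not the SPSS workflow used in the study; the function name pca_from_correlation and the synthetic four-variable data set are assumptions.

```python
import numpy as np

def pca_from_correlation(data):
    """PCA on the correlation matrix of `data` (rows = samples, columns = variables)."""
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)            # p x p correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]             # reorder so PC1 has the largest variance
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    percent = 100.0 * eigenvalues / eigenvalues.sum() # share of total variance per PC
    return eigenvalues, eigenvectors, percent

# Synthetic data (hypothetical): three correlated variables and one independent variable
rng = np.random.default_rng(0)
shared = rng.normal(size=96)
data = np.column_stack([
    shared + 0.2 * rng.normal(size=96),
    shared + 0.2 * rng.normal(size=96),
    -shared + 0.2 * rng.normal(size=96),
    rng.normal(size=96),
])

eigenvalues, eigenvectors, percent = pca_from_correlation(data)
print(eigenvalues)   # variances of the PCs, largest first
print(percent)       # percentage of total variance explained by each PC
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables p, which is the property stated above that the PCs preserve the total variance of the standardized variables.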
Several techniques are used by researchers to select the meaningful PCs (i.e., those PCs that account for most of the variation within the studied data) [7, 29]. Probably the most used technique is the “percent explained variance”, in which enough PCs are retained to cover a large fraction of the total variance of the data. Another technique is to retain PCs with eigenvalues greater than a particular value (usually 1.0). A third technique uses the slope of the scree plot as the selection criterion: PCs are retained until the slope of the scree plot changes significantly. Another method that is rarely used [29] is to compare the scree plot against a “broken stick” model. This technique, adopted in this paper, was first introduced by Frontier. It is claimed to be more accurate in selecting the appropriate number of PCs than the other techniques mentioned [5, 29]. The technique assumes that, if the total variance is divided randomly among the components, the expected distribution of eigenvalues will resemble a broken stick model. The size of the random eigenvalue for the kth component under the broken stick model is calculated using Equation 1 [7] (where p is the number of variables):
Principal components with eigenvalues greater than or equal to the corresponding broken stick random eigenvalue are retained and considered interpretable. In this case study, two groups of data, for physical and chemical evaluations, were selected, and the number of analytical parameters used to assign a measurement from a monitoring site to a group (monitoring area) was taken as n. Water-quality monitoring of Wadi Hanifa was conducted regularly over a period of one year at eight different sites. All samples were analyzed for 23 parameters, and their site-wise mean values and standard deviations are summarized in Table 1.
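Assuming the usual form of the broken stick model for eigenvalues of a correlation matrix, b_k = 1/k + 1/(k+1) + ... + 1/p, the limits can be computed as in the following Python sketch. The function names broken_stick and count_retained are hypothetical, and only the two eigenvalues reported in this paper are used as input; the remaining eigenvalues are not assumed.

```python
def broken_stick(p):
    """Return [b_1, ..., b_p] with b_k = sum of 1/i for i = k..p (assumed form)."""
    limits = []
    running = 0.0
    for i in range(p, 0, -1):   # accumulate 1/p, 1/(p-1), ..., 1/1
        running += 1.0 / i
        limits.append(running)
    return limits[::-1]         # reverse so index 0 holds b_1

def count_retained(eigenvalues, p):
    """Count leading eigenvalues that are >= their broken stick limit."""
    count = 0
    for value, limit in zip(eigenvalues, broken_stick(p)):
        if value < limit:
            break
        count += 1
    return count

limits = broken_stick(23)       # 23 variables in this study
reported = [16.402, 3.784]      # the two eigenvalues quoted in the text

print(round(limits[0], 3), round(limits[1], 3))
print(count_retained(reported, 23))
```

Under this assumed form, the limits sum to p, matching the total variance of p standardized variables, and both reported eigenvalues exceed their limits, which agrees with the retention of two PCs.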
Results and Discussion
Principal Component Analysis
PCA of the 23 physicochemical parameters resulted in 23 components, with eigenvalues and explained variances as shown in Table 2. An eigenvector was calculated for each eigenvalue (Table 3). Figure 3 shows the scree plot of the components together with the broken stick limiting curve. As explained earlier, only components with eigenvalues higher than the broken stick limit are considered significant; these are the only nontrivial principal components in this case. PCA of the data set (Table 2) yielded two such PCs, with eigenvalues of 16.402 and 3.784, explaining about 71.314% and 16.294%, respectively, of the total variance in the water-quality data set. Only the first two PCs are considered because their eigenvalues are higher than the corresponding values from the broken stick model, as shown in Figure 3. In general, component loadings larger than 0.45 may be taken into consideration in the interpretation; in other words, the most significant variables in the components, represented by high loadings, have been taken into consideration in evaluating the components [30]. The terms ‘strong’, ‘moderate’ and ‘weak’ are applied to factor loadings and refer to absolute loading values of >0.75, 0.50 – 0.75 and 0.3 – 0.5, respectively [21]. Factor 1, which explains 71.314% of the total variance, has strong absolute loadings on Turbidity, SS, COD, Nitrite-N, Ammonia, TKN, TP, Ortho-P, TOC, VSS and BOD and is negatively correlated with Cond, DO, pH, TDS, Salinity and Nitrate-N, which indicates that the temperature (season of the year) has a great effect on the water quality of Wadi Hanifa. Factor 2, which explains 16.294% of the total variance, has strong absolute loadings on TDS, Salinity, Nitrate-N and TN and is negatively correlated with Alkalinity and bicarbonate [31], which indicates that conductivity is a dominant factor, since most of the water in the wadi comes from the discharge of wastewater treatment plants in Riyadh.
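The loading categories quoted above can be expressed as a small Python helper. This is a sketch only: the function name classify_loading, the “negligible” label for values below 0.3, the treatment of the boundary values, and the example loadings are assumptions and are not taken from Table 3.

```python
def classify_loading(loading):
    """Classify a factor loading by its absolute value (>0.75, 0.50-0.75, 0.3-0.5)."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"  # label for values below 0.3 is an assumption

# Hypothetical loadings for illustration only
example_loadings = {"Turbidity": 0.91, "DO": -0.88, "Alkalinity": 0.62, "Chloride": 0.12}

for parameter, loading in example_loadings.items():
    sign = "positive" if loading >= 0 else "negative"
    print(parameter, classify_loading(loading), sign)
```

The sign is reported separately because the categories refer to absolute values, so a parameter can load strongly on a factor while being negatively correlated with it.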
Conclusion
PCA is a powerful pattern recognition technique that attempts to explain the variance of a large dataset of intercorrelated variables with a smaller set of independent variables (principal components). The above study identified the principal physical and chemical parameters that are important in predicting surface water quality. The first factor is temperature (season of the year), which has a great effect on the water quality of Wadi Hanifa. Factor 2 (conductivity) is also a dominant factor, since most of the water in the wadi comes from the discharge of wastewater treatment plants in Riyadh.
References
- Vega M, Pardo R, Barrado E, Deban L (1998) Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res 32 (12): 3581-3592.
- Dixon W, Chiswell B (1996) Review of aquatic monitoring program design. Water Res 30: 1935-1948.
- Wenning RJ, Erickson GA (1994) Interpretation and analysis of complex environmental data using chemometric methods. TrAC Trends Anal Chem 13(10): 446-457
- Brown D, Blank TB, Sum ST, Weye LG (1994) Chemometrics. Anal. Chem. 66: 315-359.
- Brown SD, Sum ST, Despagne F (1996) Chemometrics. Anal. Chem. 68: 21-61.
- Jackson JE (2003) A User’s Guide to Principal Components. John Wiley & Sons, New York, USA pp: 569.
- Jolliffe IT (2002) Principal Component Analysis, Second ed. Springer-Verlag, New York, USA pp: 487.
- Jayakumar R, Siraz L (1997) Factor Analysis in Hydro geochemistry of Coastal Aquifers Preliminary Study. Environmental Geology 31: 174-177.
- Salman SR, Abu Ruka’h YH (1998) Multivariate and Principal Component Statistical Analysis of Contamination in Urban and Agricultural Soils from North Jordan. Environmental Geology 38(3): 265-270.
- Praus P (2005) Water Quality Assessment Using SVD Based Principal Component Analysis of Hydrological Data. Water SA 31(4): 417-422.
- Iyer CS, Sindhu M, Kulkarni SG, Tambe SS, Kulkarni BD (2003) Statistical Analysis of The Physicochemical Data on the Coastal Water of Cochin. J. Environ. Monit 5: 324-327.
- Shine P, Ika RV, Ford TE (1995) Multivariate Statistical Examination of Spatial and Temporal Patterns of Heavy Metal Contamination in New Bedford Harbor Marine Sediments. Environ. Sci. Technol. 29(7): 1781-1788.
- Perkins RG, Underwood GJC (2000) Gradients of Chlorophyll and Water Chemistry along an Eutrophic Reservoir with Determination of the Limiting Nutrient by in situ Nutrient Addition. Water Res 34: 713-724.
- Tauler R, Barcelo D, Thurman EM (2000) Multivariate Correlation Between Concentrations of Selected Herbicides and Derivatives in Outflows from Selected US Midwestern Reservoirs. Environ. Sci. Technol. 34: 3307-3314.
- Gangopadhyay S, Gupta AD, Nachabe MH (2001) Evaluation of Ground Water Monitoring Network by Principal Component Analysis. Ground Water pp: 181-191.
- Winter TC, Mallory SE, Allen TR, Rosenberry DO (2000) The use of principal component analysis for interpreting ground water hydrographs. Ground Water 38: 234-246.
- Royal Commission of Riyadh City (RCRC) (2021) Environmental Rehabilitation Program for Wadi Hanifa and its Tributaries.
- Ministry of Environment, Water and Agriculture (MEWA) (2019) Kingdom of Saudi Arabia Water Strategy.
- Al-Samhouri W, Al-Naim M (2007) On Site Review Report, Wadi Hanifa, Riyadh, Saudi Arabia, National Built Heritage Forum.
- APHA (1992) Standard Methods for the Examination of Water and Wastewater, 18th ed. APHA, Washington, DC, USA.
- Liu CW, Lin K-H, Kuo YM (2003) Application of factor analysis in the assessment of groundwater quality in a blackfoot disease area in Taiwan. Sci. Total Environ 313(1-3): 77-89.
- Simeonov V, Stratis JA, Samara C, Zachariadis G, Voutsa D, et al. (2003) Assessment of the surface water quality in Northern Greece. Water Res 37(17): 4119-4124.
- Davis JC (2002). Statistics and Data Analysis in Geology, Third ed. John Wiley & Sons, New York, USA pp: 638.
- Manly BFJ (2000) Multivariate Statistical Methods: a Primer, Second ed. Chapman and Hall/CRC. P: 25.
- Shaw PJA (2003) Multivariate Statistics for the Environmental Sciences. Arnold, London pp: 233.
- Johnson RA, Wichern DW (1998) Applied Multivariate Statistical Analysis. Fourth ed. Prentice Hall, New Jersey pp: 816.
- Legendre P, Legendre L (1998) Numerical Ecology. Second English edition. Elsevier, Amsterdam pp: 853.
- Helena B, Pardo R, Vega M, Barrado E, Fernandez JM, et al. (2000). Temporal evolution of ground water composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis. Water Res 34(3): 807-816.
- Olsen RL, Chappell RW, Loftis JC (2012) Water quality sample collection, data treatment and results presentation for principal components analysis: literature review and Illinois River watershed case study. Water Res 46(9): 3110-3122.
- Mazlum N, Ozer A, Mazlum S (1996) Interpretation of Water Quality by Principal Component Analysis. J. of Engineering and Environmental Science 23: 19-26.
- Alhamid, Abdulaziz A, Saleh A, Alfayzi, Mohamed Alfatih Hamadto (2007) A Sustainable Water Resources Management Plan for Wadi Hanifa in Saudi Arabia. King Saud Univ. Eng. Sci. 19(2): 209-222.