Biplots in Covariance Analysis



Introduction
In a situation where the relationships between different sets of variables are of interest, various statistical techniques can be useful tools for analysis. Among them is the covariance matrix. For two centred data matrices $X_0$ and $Y_0$, the covariance matrix between the two sets of variables, $\mathrm{cov}(X_0, Y_0)$, is given in equation (1), while the variance-covariance matrix of a single set of variables, $\mathrm{var}(X_0)$, is given in equation (2). The latter is also written as $\mathrm{cov}(X_0, X_0) = \mathrm{var}(X_0) = \mathrm{cov}(X_0)$. Here, the variances of $X_0$ are given on the diagonal of (2), while the covariances appear off the diagonal. The relationships between different sets of variables can be explored using some form of graphical display such as biplots [1,2]. Since biplots are useful graphical tools for exploring such relationships, the biplot is employed here in the form of the covariance biplot. In this paper, the general idea behind the covariance biplot is discussed. The paper further demonstrates, with graphical illustrations, how the covariance biplot can help to reveal the relationships within and between sets of variables. This paper extends the work of Oyedele & Gardner-Lubbe [3].
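As a concrete illustration of the two matrices referred to as equations (1) and (2), the following minimal sketch computes them for simulated data. The sizes N, P and M, the random data and the N − 1 divisor are assumptions made for this example only; the usual unbiased sample estimator is used here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, M = 50, 4, 3                      # hypothetical sample size and variable counts
X = rng.normal(size=(N, P))             # simulated X-variables
Y = rng.normal(size=(N, M))             # simulated Y-variables

# Centre each column to obtain X0 and Y0.
X0 = X - X.mean(axis=0)
Y0 = Y - Y.mean(axis=0)

# Covariance matrix between the two sets (P x M); the N - 1 divisor is an assumption.
S_XY = X0.T @ Y0 / (N - 1)

# Variance-covariance matrix of the X-variables (P x P).
S_XX = X0.T @ X0 / (N - 1)

# Variances sit on the diagonal, covariances off the diagonal.
variances = np.diag(S_XX)
```

With this divisor, `S_XX` agrees with `np.cov(X, rowvar=False)`, which gives a quick check on the hand-rolled computation.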
The remainder of this paper is organized as follows. Section 2 provides a brief overview of the biplot and its fundamental idea, before its employment in the form of the covariance biplot is discussed in Section 3. This is followed in Section 4 by an application to mineral sorting production data, which record the quality evaluation of five hundred and seventy-two processes used to produce a final product. Finally, some concluding remarks are offered in Section 5.
Since the biplot was first introduced by Gabriel [5], its theory has been significantly extended by Gower & Hand's [1] monograph, Yan & Kang's [6] description of various methods to visualize and interpret a biplot, Greenacre's [7] text on the use of biplots in practice, and Gower, Lubbe and Le Roux's [8] illustration of the construction of various forms of biplots. In the first biplots introduced by Gabriel, the rows and columns of a data matrix were represented by vectors, but to differentiate between these two sets of vectors, Gabriel [5] suggested that the rows of the data matrix be represented by points. Gower & Hand [1] went a step further by introducing the idea of representing the columns of the data matrix by axes, rather than vectors, while still representing the rows of the data matrix by points. This was done to support their theory that biplots are the multivariate version of scatter plots. Gower & Hand's [1] biplot representation is very useful when the data matrix under consideration is a matrix of samples by variables.
An example of the biplot display of the chemical-sensory data by Mevik & Wehrens [9] is shown in Figure 1. These data contain the sensory descriptors and the chemical quality measurements of sixteen olive oils. In this biplot display, the variance of each variable is represented by the thicker arrow (vector) on each axis. For example, the standard deviation of Acidity is small compared to the others, while DK has a large standard deviation. This is evident from the lengths of the vectors on these axes. Furthermore, several relationships can be deduced from this biplot, such as a relation between Syrup, K232 and Peroxide. Other deductions are the relation between K270, Transp and Glossy, and the relation between Green, Yellow and DK. These deductions are based on the angles between the axes.
Since 1971, biplots have been employed in a number of multivariate methods for the graphical representation of data, for pattern and data inspection, and for displaying the results of well-known statistical methods of analysis [10][11][12]. The most well-known methods are Principal Component Analysis (PCA), Correspondence Analysis (CA), Multiple Correspondence Analysis (MCA), Canonical Variate Analysis (CVA), Multi-Dimensional Scaling (MDS), Discriminant Analysis (DA), Generalized Linear Models (GLMs) and, more recently, Partial Least Squares (PLS) by Oyedele & Lubbe [13]. All these forms of biplots have been applied to diverse fields of specialization, according to different needs and requirements.

Fundamental idea of biplots
A biplot is a joint graphical display of the rows and columns of a data matrix D (of G rows and H columns) by means of markers $a_1, a_2, \ldots, a_G$ for its rows and markers $b_1, b_2, \ldots, b_H$ for its columns. Each marker is chosen in such a way that the inner product $a_g^{\top} b_h$ represents $d_{gh}$, the $(g,h)$th element of D [14]. Biplots are often constructed in two dimensions. This does not mean that they are limited to two dimensions, but this is the most convenient biplot display. However, with D being a (very) large matrix, the rank of D is almost always higher than two. As a result, some approximation is done on D to obtain a lower rank. Methods such as PCA can be used to perform this approximation. In PCA, the approximation is based on the method of least squares. To be precise, the sum-of-squares of the differences between D and its approximation $\hat{D}$ is minimized.
That is, minimize $\mathrm{trace}\{(D - \hat{D})(D - \hat{D})^{\top}\}$. Taking $\hat{D}$ as the rank two approximation of D, the biplot of a data matrix D relies on the decomposition of $\hat{D}$ into the product of two matrices, $\hat{D} = AB^{\top}$ (3), its row markers matrix (A) and its column markers matrix (B). Matrices A and B are defined as $A = [a_1, a_2, \ldots, a_G]^{\top}$ and $B = [b_1, b_2, \ldots, b_H]^{\top}$. Thus, the approximated rows and columns of a data matrix are represented in biplots. Generally, the number of columns in A and B is determined by the rank r approximation of D. In practice, r = 2 is usually preferred for a convenient biplot display.
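The least-squares approximation and its split into row and column markers can be sketched with a truncated singular value decomposition. This is a minimal illustration, not necessarily the construction used for the figures: the function name `rank_r_markers`, the simulated matrix D, and the choice to place the singular values entirely in the row markers are assumptions made for this example.

```python
import numpy as np

def rank_r_markers(D, r=2):
    """Return row markers A (G x r), column markers B (H x r) and D_hat = A B'."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    A = U[:, :r] * s[:r]      # singular values assigned to the rows (one possible split)
    B = Vt[:r].T              # column markers
    return A, B, A @ B.T

rng = np.random.default_rng(1)
D = rng.normal(size=(8, 5))   # hypothetical data matrix with G = 8, H = 5
D = D - D.mean(axis=0)        # column-centred, as in PCA

A, B, D_hat = rank_r_markers(D, r=2)

# Least-squares criterion: trace{(D - D_hat)(D - D_hat)'}.
loss = np.trace((D - D_hat) @ (D - D_hat).T)
```

For the truncated decomposition, `loss` equals the sum of the squared singular values that were discarded, which is the smallest value any rank two approximation can attain.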

Biplot points and axes
In asymmetric biplots, the rows of a data matrix are represented by points, while the columns are represented by vectors or axes. Traditionally, columns are represented by vectors, but Gower & Hand [1] introduced axes to make the biplot similar to a scatter plot. This was done by extending the vectors, which represent the columns, through the biplot space to become axes. Thus, the biplot points are defined by the row markers of the data matrix, whereas the biplot axes are defined by the column markers. More precisely, for the biplot of a data matrix D, from equation (3), the G rows of A serve as the biplot points, while the H rows of B are used in calculating the directions of the biplot axes. An example of the biplot display can be seen in Figure 2.
In general, there are two kinds of features displayed in biplots. These features can be specified as two sets of variables, or as a set of variables and samples, as in the case of the PCA biplot. However, depending on the data matrix and the choice of features to be analyzed, a display can also be constructed for only one kind of feature. Gower, Lubbe and Le Roux [8] termed such displays monoplots. In a monoplot, the kind of feature to be represented may be the samples only or one set of variables. Including an additional feature in the monoplot, say, another set of variables, would result in a biplot. As only variables are represented in the covariance and variance-covariance matrices, see equations (1) and (2), both monoplots and biplots can be used as graphical tools to explore their relationships. More precisely, a monoplot is suitable for representing a variance-covariance matrix (equation (2)) graphically, while a biplot is more appropriate for a covariance matrix (equation (1)).
An example of the resulting monoplot display can be seen in Figures 3 & 4, while an example of the resulting biplot display can be seen in Figure 2.
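The link between points, axes and inner products can be made concrete with a small numerical sketch. The marker values below are hypothetical and chosen only for illustration. The point of the example is that the orthogonal projection of a biplot point onto an axis, scaled by the length of the column marker defining that axis, recovers the inner product that approximates the corresponding element of the data matrix.

```python
import numpy as np

# Hypothetical two-dimensional markers: rows of A are points, rows of B define axes.
A = np.array([[2.0, 1.0], [-1.0, 0.5], [0.5, -2.0]])
B = np.array([[1.0, 0.0], [0.6, 0.8]])

def project_onto_axis(point, direction):
    """Orthogonal projection of a point onto the axis through the origin."""
    return (point @ direction) / (direction @ direction) * direction

D_hat = A @ B.T                                      # approximated data matrix
proj = project_onto_axis(A[0], B[1])                 # first point on second axis
signed_len = (A[0] @ B[1]) / np.linalg.norm(B[1])    # signed length of that projection

# Signed projection length times marker length equals the inner product.
recovered = signed_len * np.linalg.norm(B[1])
```

Here `recovered` matches `D_hat[0, 1]`, which is why reading a point against an axis in the display corresponds to an approximated data value.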

Biplot implementation into covariance analysis framework
Moreover, given that only one set of variables, in this case the X-variables, is under consideration, and as the focus is on revealing the relationships within these variables, only one set of axes is needed. From expression (4), the directions of these axes are calculated from the P rows of either $G_X$ or $H_X$.
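One way to obtain a single set of axis directions from a variance-covariance matrix is sketched below. Because the matrix is symmetric, an eigendecomposition with the square roots of the eigenvalues split evenly between the two factors gives identical factors, so either can supply the axis directions. The even split, the function name `monoplot_axes` and the simulated data are assumptions made for this example and are not taken from expression (4).

```python
import numpy as np

def monoplot_axes(S, r=2):
    """Rows give r-dimensional axis directions with S approximated by G_X G_X'."""
    evals, evecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:r]         # indices of the r largest eigenvalues
    lam = np.clip(evals[order], 0.0, None)      # guard against tiny negative values
    return evecs[:, order] * np.sqrt(lam)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))                    # hypothetical data with P = 5
X0 = X - X.mean(axis=0)
S_XX = X0.T @ X0 / (len(X0) - 1)

G_X = monoplot_axes(S_XX, r=2)                  # P x 2 axis directions
S_hat = G_X @ G_X.T                             # rank two approximation of S_XX
```

Each of the P rows of `G_X` gives the direction of one axis in the two-dimensional display.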

Covariance biplots
Consider both $X_0$ and $Y_0$. The P×M covariance matrix between $X_0$ and $Y_0$, denoted $S_{XY}$, is defined in equation (1). Expression (5) factorizes an approximation of this matrix into a matrix G and a matrix H, for any value of β in (0,1). In expression (5), the inner product between the rows of the matrix G and the rows of the matrix H approximates the covariances between the X-variables and the Y-variables. Here, the rows of G are associated with the X-variables, while the rows of H are associated with the Y-variables. As the focus is on revealing the relationships between two sets of variables, X and Y, only axes are present in the resulting biplot. However, two sets of axes are needed: one set for the X-variables and one set for the Y-variables. In other words, with the columns of $S_{XY}$ associated with the Y-variables, the rows of H approximate the covariance between these Y-variables.
Moreover, any β value between 0 and 1 will optimally approximate neither the covariance between the X-variables nor the covariance between the Y-variables, but will rather give an indication of both. Since choosing β closer to 0 better approximates the covariance between the Y-variables, the symmetric choice of β = 0.5 is used. It should be noted that since only variables are represented in the covariance monoplot/biplot, and there are no samples to (orthogonally) project onto the axes representing these variables, calibration markers are not necessary on these axes.
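The role of β can be illustrated with a sketch based on the singular value decomposition of the covariance matrix, in which the singular values are shared between the two factors according to β. This particular split, the function name `covariance_biplot_markers` and the simulated data are assumptions made for this example. The split is chosen because it is consistent with the remark that β closer to 0 places more of the scale in H.

```python
import numpy as np

def covariance_biplot_markers(S_XY, beta=0.5, r=2):
    """Return G (P x r) and H (M x r) whose product G H' approximates S_XY."""
    U, s, Vt = np.linalg.svd(S_XY, full_matrices=False)
    G = U[:, :r] * s[:r] ** beta             # axes for the X-variables
    H = Vt[:r].T * s[:r] ** (1.0 - beta)     # axes for the Y-variables
    return G, H

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 6))                 # hypothetical X-variables, P = 6
Y = rng.normal(size=(30, 4))                 # hypothetical Y-variables, M = 4
X0 = X - X.mean(axis=0)
Y0 = Y - Y.mean(axis=0)
S_XY = X0.T @ Y0 / (len(X0) - 1)

G, H = covariance_biplot_markers(S_XY, beta=0.5)
S_hat = G @ H.T                              # rank two approximation of S_XY
```

Under this split, the product `G @ H.T` does not depend on β. Only the relative lengths of the two sets of axes change, and β = 0.5 treats the two sets symmetrically.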

An Illustration
The following illustration of the covariance biplot is performed using the SOVR data from Umetrics MKS [15]. These mineral sorting production data record the quality evaluation of five hundred and seventy-two processes used to produce a final product. Twelve process factors were used in the evaluation, namely, total load (TON_IN), load of grinder 30 (KR30_IN), load of grinder 40 (KR40_IN), concentration mull (PARM), velocity of separator 1 (HS_1), velocity of separator 2 (HS_2), effect of grinder 30 (PKR_30), effect of grinder 40 (PKR_40), ore waste (GBA), load of separator 3 (TON_S3), waste from grinding (KRAV_F) and total waste (TOTAVF). The aim of this evaluation was to investigate the relationships between the process factors and the quality of the final product. Six output variables, amount of concentration type 1 (PAR), amount of concentration type 2 (FAR), distribution of concentration type 1 and 2 (r-FAR), iron in FAR (Percent_Fe_FAR), phosphor in FAR (Percent_P_FAR) and iron in raw ore (Percent_Fe_malm), were used to measure the quality of the final product. The processes are taken as the samples, while the process factors and output variables are the predictor and response variables respectively. Thus, the SOVR data can be viewed as a 572×18 data matrix, comprising an X:572×12 matrix and a Y:572×6 matrix. A copy of these data can be found on the dropbox link [reference] under the "Data Sets" folder [16].
As is customary for covariance analysis, both the X:572×12 and Y:572×6 matrices are centred before the analysis. In addition, each of these matrices is standardized by dividing each centred variable by its respective standard deviation, as this facilitates the direct comparison of correlation values. The (two-dimensional) covariance biplot for the data is shown in Figure 2, with β = 0.5. The respective G:5×2 and H:6×2 matrices are shown in Tables 1 & 2. The approximated covariance values are shown in Table 3. The variance of each variable, represented by the thicker arrow (vector) on each axis, is shown in the biplot displays (Figures 2 to 4). Observing the lengths of the thicker arrows (vectors) on the axes in Figure 2, output variables PAR and FAR can be said to have the largest standard deviations. However, variable Percent_Fe_malm has the smallest standard deviation compared to the others, followed by process factor HS_1. These deductions can also be seen (clearly) in Figures 3 & 4.
Furthermore, the positions of the biplot axes give an indication of the correlations between the variables. Axes forming small angles are said to be strongly correlated, either positively or negatively. Axes are positively correlated when they lie in the same direction, while negatively correlated axes lie in opposite directions. Also, axes that are close to forming right angles are said to be uncorrelated. From Figure 2, various inter-variable relationships can be deduced, such as the relation between output variables FAR and r-FAR and process factors TOTAVF, GBA, PKR_40, KR30_IN, TON_IN, KR40_IN, KRAV_F and PKR_30. Looking at the directions of these axes in the biplot, this relation is a positive one. Also, the relation between process factor HS_2 and output variables Percent_P_FAR and Percent_Fe_FAR can be seen. The relation between factor HS_2 and Percent_P_FAR is a negative one, while the relation between factor HS_2 and Percent_Fe_FAR is a positive one. However, output variable Percent_Fe_malm can be said to have no relation with the others.
Moreover, to illustrate how a covariance monoplot can help to reveal relationships within one set of variables, consider the monoplot of the process factors shown in Figure 3. The corresponding approximated values are shown in Table 4. Factors TON_S3, HS_1 and HS_2 can be said to be unrelated to each other. Likewise, from the covariance monoplot of the output variables (Figure 4), output variables r-FAR, PAR and FAR can be said to be related, with approximated (positive) correlation values of 0.675, 0.705 and 0.920 respectively, as shown in Table 5. Also, a (negative) relation between variables Percent_P_FAR and Percent_Fe_FAR (−0.864) can be noted.
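The preprocessing and the angle-based reading of the axes can be sketched as follows. The data are simulated and the helper names `standardize` and `cosine_between_axes` are hypothetical. The sketch only shows that, after centring and scaling, the covariance matrix coincides with the correlation matrix, and that the cosine of the angle between two axis directions is close to 1, close to −1 or close to 0 for axes lying in the same direction, in opposite directions or at right angles.

```python
import numpy as np

def standardize(Z):
    """Centre each column and divide by its sample standard deviation."""
    Z0 = Z - Z.mean(axis=0)
    return Z0 / Z0.std(axis=0, ddof=1)

def cosine_between_axes(u, v):
    """Cosine of the angle between two axis direction vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(4)
Z = rng.normal(size=(60, 3))            # hypothetical data, three variables
Zs = standardize(Z)

# After standardization the covariance matrix is the correlation matrix.
R = Zs.T @ Zs / (len(Zs) - 1)

same = cosine_between_axes(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
opposite = cosine_between_axes(np.array([1.0, 0.0]), np.array([-3.0, 0.0]))
right_angle = cosine_between_axes(np.array([1.0, 0.0]), np.array([0.0, 5.0]))
```

In a two-dimensional display the angles only approximate the correlations, so tables of approximated values remain a useful companion to the plot.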

Conclusion
The covariance matrix of two sets of variables can be visualized graphically using the biplot. The resulting biplot, termed the covariance biplot, reveals the relationships between variables graphically. If only one set of variables is considered in the covariance analysis, the resulting graphical representation is a covariance monoplot. Advantages of a covariance biplot include the revelation of the relationships between two sets of variables as well as within each set.