Detecting relationships between variables is of great interest to all scientists. Different correlation coefficients or dependence measures have been developed and proposed for this purpose. In this study, a comprehensive simulation study was conducted to compare the Spearman rank, Kendall’s tau, Distance, Percentage-Bend, and Hoeffding’s D measures alongside Pearson’s correlation under different types of relationships. The results showed that Spearman’s rho, Pearson’s r, and Percentage-Bend were the strongest correlations for measuring monotonic relationships. On the other hand, if the relationship between the variables is curvilinear but non-monotonic, the Pearson’s r, Spearman’s rho, Kendall’s tau, and Percentage-Bend correlations were found to be close to zero. However, when the relationship is curvilinear but monotonic, Spearman’s rho indicated a stronger relationship (rho has a higher absolute value) than the others. For non-monotonic nonlinear relationships, the Distance and Hoeffding’s D measures were found to be better suited.
Keywords: Linear Relation; Monotonic Relations; Correlation; Simulation
Since the aim of many studies is to evaluate the relationships between or among variables, correlation analysis has become one of the basic statistical analyses for researchers and scientists [3-6]. The correlation coefficient to be used depends on the type of relationship between the variables. Although linear relationships are very common, nonlinear and monotonic relationships can also exist between variables, and in some cases there is no relationship at all. Therefore, it is extremely important to determine how the variables are related, or what kinds of relations exist between them, before computing the correlation between two variables [7,8], because knowing the shape of the relationship enables us to choose the most appropriate correlation coefficient and so to calculate the degree of the relationship reliably. A scatter plot can easily be used for this purpose [7,9,10]. The main purpose of this study is to evaluate monotonic relationships in detail and to determine the most appropriate correlation coefficient(s) for monotonic and non-monotonic relationships. To this end, a comprehensive Monte Carlo simulation study was carried out to compare six different correlation coefficients under ten different relationship types or scenarios.
The material of this study consisted of random numbers generated via the Monte Carlo simulation technique for bivariate normal variables X ~ N(0,1) and Y ~ N(0,1), with a sample size of 100 and true population correlations (effect sizes) of ρ = 0.0, 0.20, 0.60, 0.80, and 0.90. After the random numbers were generated from the normal distribution, different definitions or transformations were applied to the generated data to construct different types of relationships (linear, non-linear, monotonic, and non-monotonic). The resulting types of relationships are presented in Figures 1-10. Then, the Pearson’s r, Spearman’s rho, Kendall’s tau, Percentage-Bend, Distance, and Hoeffding’s D coefficients were computed for the 10 different types of relationships. This process was repeated 10000 times, and the average value of each correlation coefficient was computed. All computations were performed using [11,12].
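The simulation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names are hypothetical, `transform` is a hypothetical stand-in for the relationship-generating functions (which are not reproduced here), and only Pearson’s r is averaged to keep the sketch short.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        xs.append(z1)
        # Mixing two independent N(0,1) draws gives Corr(X, Y) = rho.
        ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return xs, ys


def average_correlation(rho, transform, n=100, reps=10000, seed=1):
    """Average Pearson's r over `reps` simulated samples of size n.

    `transform` is a hypothetical function applied to Y to build a
    relationship type (identity keeps the relationship linear).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x, y = bivariate_normal_sample(n, rho, rng)
        total += pearson_r(x, [transform(v) for v in y])
    return total / reps


# Small run for illustration; the study used 10000 repetitions.
print(average_correlation(0.60, lambda v: v, reps=200))
```

Averaging over many repetitions smooths out sampling noise, so the reported value reflects the typical behavior of a coefficient under a scenario rather than one particular sample.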
The function Fi in this model represents either linear or nonlinear types of relationships, depending on the purpose. For example, Fi was defined as Y = F(X) = X to generate a linear relationship. Likewise, to generate the other types of relationships, Fi was defined as below:
Pearson’s correlation is a measure of the linear relationship between two interval-scaled variables. It is the most widely used coefficient in practice; a value of zero or close to zero indicates that there is no linear relationship between the two variables [5,6,9,13].
Since Spearman’s rank correlation does not require the assumptions of linearity and interval measurement level, it is considered a nonparametric alternative to Pearson’s correlation. Basically, it is a measure of the monotonic relationship between two variables. It can also be used for ordinal variables and is less sensitive to outliers. The assumptions of this coefficient are that the data must be at least ordinal and that the scores on one variable must be monotonically related to the other variable. The Spearman rank correlation can be computed by using the following formulas:
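As an illustration, the widely used rank-difference form, r_s = 1 − 6Σd_i² / (n(n² − 1)), can be sketched as below. This is a minimal sketch that assumes no tied values; the function names are hypothetical and it is not necessarily the exact formula set used in this study.

```python
def ranks(values):
    """Return 1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


x = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0]
y_cubic = [v ** 3 for v in x]       # monotonic but nonlinear
y_reversed = [-v for v in x]        # perfectly decreasing
print(spearman_rho(x, y_cubic))     # 1.0
print(spearman_rho(x, y_reversed))  # -1.0
```

Because only the ranks enter the formula, any strictly increasing transformation of Y leaves the coefficient unchanged, which is why it measures monotonic rather than linear association.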
Kendall’s tau, or simply Kendall’s correlation, is also a nonparametric alternative to Pearson’s correlation. Since it also identifies monotonic relationships between variables, it is commonly considered an alternative to Spearman’s correlation [6,7,9,18] reported that any two pairs of rank
It is well known that the relationships between variables are not always linear; in some cases they are nonlinear. Therefore, a correlation coefficient that can also detect nonlinear relationships between variables is needed. For this purpose, the distance correlation coefficient has been proposed [22,6,7,23-25].
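A minimal sketch of the sample distance correlation for two univariate samples is given below, assuming the usual definition based on double-centered pairwise distance matrices. The function names are hypothetical, and the O(n²) pure-Python implementation is meant for illustration rather than for large simulations.

```python
import math


def _centered_distances(values):
    """Pairwise |v_i - v_j| matrix, double-centered by row/column/grand means."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two univariate samples."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    if dvar_x * dvar_y == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


x = [i / 10 for i in range(-20, 21)]
print(distance_correlation(x, [2 * v + 1 for v in x]))  # linear relationship
print(distance_correlation(x, [v ** 2 for v in x]))     # U-shaped relationship
```

In this noise-free example the U-shaped relationship still yields a clearly positive value, whereas a coefficient that only tracks monotonic trends would be near zero on the same symmetric data.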
Hoeffding’s D can be used as a measure of linear, monotonic, and non-monotonic relationships. Although the values of this measure vary between –0.5 and 1, the sign of the coefficient is not interpreted, because Hoeffding’s D also identifies non-monotonic relationships between the variables [7,26].
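A minimal sketch of Hoeffding’s D is shown below, assuming the standard rank-based formula for samples without ties and at least five observations. The function name is hypothetical and tie handling is deliberately omitted.

```python
def hoeffding_d(x, y):
    """Hoeffding's D for samples without ties (needs n >= 5)."""
    n = len(x)
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 observations")
    d1 = d2 = d3 = 0.0
    for i in range(n):
        r = 1 + sum(1 for v in x if v < x[i])  # rank of x_i
        s = 1 + sum(1 for v in y if v < y[i])  # rank of y_i
        # Bivariate rank: 1 + number of points below (x_i, y_i) on both axes.
        q = 1 + sum(1 for a, b in zip(x, y) if a < x[i] and b < y[i])
        d1 += (q - 1) * (q - 2)
        d2 += (r - 1) * (r - 2) * (s - 1) * (s - 2)
        d3 += (r - 2) * (s - 2) * (q - 1)
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    return 30.0 * numerator / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(hoeffding_d(x, [v ** 3 for v in x]))  # 1.0 (increasing)
print(hoeffding_d(x, [-v for v in x]))      # 1.0 (decreasing)
```

Both a perfectly increasing and a perfectly decreasing relationship give D = 1, which illustrates why the sign of this measure carries no directional meaning.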
The percentage bend correlation (ρpb) is a robust alternative to Pearson’s correlation. The estimator of this correlation is both resistant and robust in terms of efficiency. When the underlying data are bivariate normal, ρpb gives essentially the same values as Pearson’s correlation, but it is more robust to slight changes in the data than Pearson’s correlation. ρpb belongs to the class of correlation measures that protect against outliers in the marginal distributions (X and Y). Therefore, this correlation is similar to the [1,2] biweight midcovariance correlation. The percentage bend correlation between variables X and Y is computed using the computational steps given below:
The value of β is selected between 0 and 0.5. Higher values of β result in a higher breakdown point at the expense of lower efficiency [6,29-31].
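A minimal sketch of the percentage bend computation is given below, assuming the commonly described formulation in which each variable is centered, scaled by the m-th smallest absolute deviation from the median, and bent to the interval [−1, 1]. The function names, the rounding rule m = floor((1 − β)n), and the default β = 0.2 are assumptions made for illustration and are not taken from this study.

```python
import math
import statistics


def _bend_scores(values, beta):
    """Return bent, standardized scores Psi((v - phi) / omega).

    Assumes omega > 0, i.e. not too many values equal to the median.
    """
    n = len(values)
    med = statistics.median(values)
    m = math.floor((1.0 - beta) * n)
    # omega: the m-th smallest absolute deviation from the median.
    omega = sorted(abs(v - med) for v in values)[m - 1]
    i1 = sum(1 for v in values if (v - med) / omega < -1.0)
    i2 = sum(1 for v in values if (v - med) / omega > 1.0)
    ordered = sorted(values)
    trimmed_sum = sum(ordered[i1:n - i2])
    phi = (omega * (i2 - i1) + trimmed_sum) / (n - i1 - i2)
    # Psi clips each standardized score to the interval [-1, 1].
    return [max(-1.0, min(1.0, (v - phi) / omega)) for v in values]


def percentage_bend_correlation(x, y, beta=0.2):
    """Percentage bend correlation between two equal-length samples."""
    a = _bend_scores(x, beta)
    b = _bend_scores(y, beta)
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den


x = [1.0, 2.5, 3.1, 4.7, 5.2, 6.9, 8.0, 9.4, 10.1, 50.0]
y = [2.0, 1.5, 4.0, 3.5, 6.0, 5.5, 9.0, 7.5, 8.0, 3.0]
print(percentage_bend_correlation(x, y))
```

Clipping the standardized scores limits the influence of extreme observations such as the value 50.0 in `x`, which is the sense in which larger β trades efficiency for resistance.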
Basically, three different types of relationships, namely linear, non-linear, and monotonic, can exist between variables. In some cases, there may also be no relationship between the variables. Although almost all scientists and researchers in all branches of science (except for statisticians) are aware of linear and nonlinear relationships in general, only a minority of them know enough about the similarities and dissimilarities among monotonic, linear, and nonlinear relationships. Therefore, this study focuses mainly on monotonic relationships alongside linear and non-linear ones.
The term linear relationship describes a straight-line relationship between two variables. When both variables increase or decrease concurrently and at a constant rate, a positive linear relationship exists; when one variable increases while the other decreases at a constant rate, a negative linear relationship exists [32,33].
If the relationship between two variables is not linear, the rate of increase or decrease can vary as one variable changes. This produces a curved pattern in the data (Figures 2, 3, 6-9). In such cases, the curved pattern might be better modeled by a nonlinear function.
A monotonic relationship indicates that as one variable increases, the other either also increases or decreases. However, this increase or decrease does not need to occur at the same rate. A monotonic relationship can be linear, in which case the rate of increase or decrease of both variables is the same. For instance, the relationships in Figures 4 and 5 are good examples of monotonic and linear relationships. The basic difference between monotonic and linear relationships is that in a monotonic relationship the variables tend to move in the same relative direction, but not necessarily at a constant rate, whereas in a linear relationship the variables move in the same direction at a constant rate. A monotonic relationship can also be non-linear, with the increase or decrease occurring at different rates in the two variables. For example, the relationships in Figures 2, 3, 6, and 7 show both variables increasing or decreasing concurrently, but not at the same rate. Therefore, these relationships are monotonic but not linear. An example may explain this situation better: a drug may not be effective for a few days at the beginning of the treatment, but it may start to show its effect after a certain point. This is an example of a nonlinear monotonic relationship. However, nonlinear relationships can also be non-monotonic. For example, a drug may become progressively more helpful for the first two or three weeks, but then it may become harmful to the patients. Therefore, as shown in Figures 1, 4, and 5, linear relationships are monotonic, but as shown in Figures 2, 3, 6, and 7, not all monotonic relationships are linear. There are two types of monotonic relationships: positive (Figure 4) and negative (Figure 5). When the value of one variable increases and the value of the other variable tends to increase as well, a positive monotonic relationship exists (Figure 4).
If the value of one variable increases while the value of the other tends to decrease, a negative monotonic relationship exists (Figure 5). If the two variables do not generally vary in the same direction, the relationship is non-monotonic (Figures 8 and 9). In non-monotonic relationships, therefore, as X increases, Y sometimes increases and sometimes decreases.
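The distinction between linear, monotonic, and non-monotonic relationships can be made concrete with a small check on noise-free data. The sketch below uses hypothetical helper names and is only meaningful for exact functional relationships with distinct X values; real data with noise would need a correlation-based assessment instead.

```python
import math


def is_monotonic(x, y):
    """True if y never changes direction when the pairs are ordered by x."""
    ordered_y = [b for _, b in sorted(zip(x, y))]
    steps = [b2 - b1 for b1, b2 in zip(ordered_y, ordered_y[1:])]
    return all(s >= 0 for s in steps) or all(s <= 0 for s in steps)


def is_linear(x, y, tol=1e-9):
    """True if successive slopes are all equal (noise-free data, distinct x)."""
    pairs = sorted(zip(x, y))
    slopes = [(b2 - b1) / (a2 - a1)
              for (a1, b1), (a2, b2) in zip(pairs, pairs[1:])]
    return all(abs(s - slopes[0]) <= tol for s in slopes)


x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
straight = [2 * v + 1 for v in x]    # linear, hence monotonic
curved = [math.exp(v) for v in x]    # monotonic but not linear
u_shaped = [v ** 2 for v in x]       # neither monotonic nor linear

print(is_monotonic(x, straight), is_linear(x, straight))  # True True
print(is_monotonic(x, curved), is_linear(x, curved))      # True False
print(is_monotonic(x, u_shaped), is_linear(x, u_shaped))  # False False
```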
Since many real-life scenarios show a monotonic relationship, it is important to understand and be aware of the similarities and dissimilarities between monotonic, linear, and nonlinear relationships. If such relationships are ignored, very strange decisions may be made, because many processes in nature or in our lifespan are compositions of monotonic functions. For example, changes in our weight, height, money, etc. over time show a monotonic pattern. These kinds of characteristics are therefore monotonic functions of time.
Different correlation coefficients, such as Pearson’s product-moment, Spearman’s rho, Kendall’s tau, Distance, Percentage-Bend, Hoeffding’s D, Maximal Correlation, Maximal Information Coefficient, and Mutual Information, can be used to evaluate the relationship between two variables, depending on the type of relationship. As is well known, Pearson’s correlation is a measure of the linear relationship between two interval-scaled variables. However, the relationship between variables is not always linear. In many cases, the relationship is nonlinear or monotonic, and monotonic relationships can be either linear or non-linear. Therefore, a correlation coefficient that enables us to evaluate both linear and nonlinear relationships is needed. This simulation study focuses mainly on monotonic relationships and emphasizes which correlation coefficients can be used to reveal monotonic relationships reliably.
In this study, random numbers were generated for 10 different types of relationships, and six different correlation coefficients were applied to these data sets. The results are presented in Table 1. When Table 1 is examined, it is easily seen that, except for the Distance and Hoeffding’s D measures, none of the correlation coefficients recognized the non-monotonic relationships (Sc8 and Sc9; Figures 8 and 9). Interestingly, all correlation coefficients recognized the non-linear monotonic relationships (Sc2, Sc3, Sc6, and Sc7; Figures 2, 3, 6, and 7) strongly in general. However, Spearman’s rho seems to be the best one under these conditions. As expected, Pearson’s r is the strongest correlation in recognizing linear relationships, followed by the Percentage-Bend and Distance correlations. Spearman’s rho and Kendall’s tau recognized linear relationships sufficiently as well. In terms of recognizing linear monotonic relationships (Sc4 and Sc5; Figures 4 and 5), Spearman’s rho is the strongest correlation, followed by the Percentage-Bend and Pearson’s r correlations. The results for the Distance and Kendall’s tau correlations are acceptable as well. Hoeffding’s D, on the other hand, did not recognize the linear monotonic relationships satisfactorily. In cases where there is no relationship between the variables, all correlations gave very satisfactory results. This means that, no matter which correlation coefficient is used under these conditions, it correctly recognizes that there is no relationship between the two variables (Table 1).
Investigating the association between two or more variables is often of interest to researchers in practice. For this purpose, different association measures or correlation coefficients have been developed. However, the type of relationship between the variables affects which correlation coefficient should be used to investigate the degree of the relationship. Knowing what kind of relationship exists between the variables is therefore very important when evaluating the relationship between two variables, because although linear relationships are very common, the relationship can also be non-linear and monotonic. Before computing the correlation between the variables, it is thus necessary to determine what kind of relationship exists between them. Creating a scatter plot to visualize the relationship is always a good idea for this purpose, because plotting all the raw data gives visual information about the type and direction of the relationship between the variables.
The results of this study, which was carried out with these issues in mind, suggest that if a positive or negative monotonic relationship exists between the variables, the Spearman’s rho, Pearson’s r, and Percentage-Bend correlations can be used effectively to determine the degree of the relationship. However, if a nonlinear (curvilinear) but monotonic relationship exists between the variables, Spearman’s rho indicated a stronger relationship than the other coefficients, including the Percentage-Bend and Distance correlations. These results can be taken as an indication that Spearman’s rho might be very useful, especially when there is a nonlinear monotonic relationship between the variables. On the other hand, if there is a curvilinear but non-monotonic relationship between the variables, the values of all correlation coefficients except the Distance and Hoeffding’s D measures were found to be close to zero. Based on these findings, the Distance and Hoeffding’s D measures are more appropriate than the other coefficients when the relationship is non-monotonic, and they should be preferred for investigating the degree of such relationships. In other words, the Spearman’s rho, Pearson’s r, Kendall’s tau, and Percentage-Bend correlations are not a good choice in such cases. In general, Spearman’s rho was designed to measure how monotonic a relationship is. That is why it is widely used to determine how strong a monotonic relationship is and in what direction it runs.
Although Spearman’s rho is one of the most used alternatives to Pearson’s r, it determines the strength and direction of the monotonic relationship between two variables rather than the strength and direction of the linear relationship. That is, if the scatter plot shows that the relationship between the variables looks monotonic, Spearman’s rho is a good choice because it measures the strength and direction of the monotonic relationship. However, if the scatter plot shows a linear relationship between the variables, Pearson’s r is a good choice because it measures the strength and direction of the linear relationship.
On the other hand, the Percentage-Bend and Distance correlations can also be performed efficiently for such cases.
However, since it is not always possible to check visually whether there is a monotonic relationship between the variables, Spearman’s rho might be computed anyway in such cases. Very low values (close to 0) of the Spearman’s rho, Kendall’s tau, Pearson’s r, and Percentage-Bend correlations together with very high values (close to 1) of the Hoeffding’s D and Distance correlations indicate that the relationship between the variables is non-monotonic. Very low values of all correlation coefficients indicate that the variables are randomly related, that is, there is no linear, monotonic, or non-monotonic relationship between them. Rizzio, in his simulation study, compared the performance of several different measures of dependence, including Pearson’s and Spearman’s correlation coefficients and the distance correlation, and reported that both Pearson’s and Spearman’s correlation coefficients can be used to recognize non-monotonic dependence also when it is non-linear, but they do not find non-monotonic dependence if it is symmetric. He also reported that for monotonic types of dependence, Pearson’s correlation coefficient is the most powerful measure of dependence, regardless of the number of observations. In the case of a non-monotonic relationship and n<50, the distance correlation will be a good choice for measuring the dependence. Although the findings of our study are generally consistent with those of Rizzio’s study, some differences between the two studies were also noticed, owing to differences in the experimental conditions.
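The reading rule described above (low monotonic coefficients combined with high Distance and Hoeffding’s D values suggesting a non-monotonic relationship) can be sketched as a simple helper. The function name and the cut-offs `low` and `high` are hypothetical choices made only for illustration; this study does not define numerical thresholds.

```python
def describe_relationship(monotone_coefs, dependence_coefs, low=0.1, high=0.5):
    """Rough label from coefficient magnitudes; thresholds are hypothetical.

    monotone_coefs: e.g. Pearson's r, Spearman's rho, Kendall's tau,
        Percentage-Bend.
    dependence_coefs: e.g. Distance correlation, Hoeffding's D.
    """
    monotone_weak = all(abs(c) < low for c in monotone_coefs)
    dependence_weak = all(abs(c) < low for c in dependence_coefs)
    dependence_strong = all(abs(c) > high for c in dependence_coefs)
    if monotone_weak and dependence_weak:
        return "no relationship"
    if monotone_weak and dependence_strong:
        return "non-monotonic relationship"
    return "other (inspect the scatter plot)"


print(describe_relationship([0.02, -0.01, 0.03, 0.00], [0.05, 0.04]))
print(describe_relationship([0.02, -0.01, 0.03, 0.00], [0.90, 0.80]))
```

A rule of this kind is only a screening aid; a scatter plot remains the more direct way to judge the shape of the relationship.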
The following conclusions can be drawn from the results of this simulation study. Although Pearson’s r was developed for measuring the linear relationship between two variables, it might also be used for all kinds of monotonic relationships. Spearman’s rho is one of the well-known nonparametric alternatives to Pearson’s correlation coefficient, and it is better suited especially when the variables are not normally distributed or when the relationship is non-linear but monotonic. That is why it is one of the most commonly used coefficients for measuring monotonic relationships. On the other hand, neither Pearson’s nor Spearman’s correlation coefficient was found to be appropriate for non-monotonic relationships. If the relationship between two variables is monotonic but non-linear (a nonlinear monotonic relationship), Spearman’s rho is the strongest one. However, the Percentage-Bend, Pearson’s r, and Distance correlations in particular might also be used for measuring the degree of the relationship, followed by the Kendall’s tau and Hoeffding’s D measures. If the data show non-monotonic, nonlinear relationships (i.e., a U or ∩-shaped pattern in the scatter plot), Pearson’s r, Spearman’s rho, Percentage-Bend, and Kendall’s tau cannot recognize these kinds of relationships, whereas the Distance and Hoeffding’s D measures can. Since Hoeffding’s D is a little better than the Distance correlation, using Hoeffding’s D will be the best choice in such cases. All findings show that a strong dependence measure or correlation coefficient is needed that can measure the degree of the relationship between two variables regardless of the type of relationship. Theoretical and comprehensive simulation studies are clearly needed for this.
Chen PY, Popovich PM (2002) Correlation: Parametric and Nonparametric Measures. Series: Quantitative Applications in the Social Sciences. Sage Publications Inc, California, USA.
Tuğran E, Kocak M, Mirtagioğlu H, Yiğit S, Mendes M (2015) A simulation-based comparison of correlation coefficients with regard to type I error rate and power. J Data Analysis Info Processing 3(03): 87-101.
Temizhan E, Mirtagioğlu H, Mendeş M (2022) Which Correlation Coefficient Should Be Used for Investigating Relations between Quantitative Variables? Ame Academic Scienti Res J Eng Techno Sci 85(1): 265-277.