Permutation Statistical Methods: Calculation Efficiencies

Permutation statistical methods were initially developed by RA Fisher [1], RC Geary [2], T Eden & F Yates [3], EJG Pitman [4-6] and other mathematicians and scientists in the 1920s and 1930s for validating the normality and homogeneity assumptions of classical statistical methods. Subsequently, permutation statistical methods have emerged as an approach to data analysis in their own right. Permutation statistical methods possess several advantages over classical statistical methods. First, permutation tests are entirely data-dependent in that all the information required for analysis is contained within the observed data. Second, permutation tests are appropriate for non-random samples, such as are common in many fields of research. Third, permutation tests are distribution-free in that they do not depend on the assumptions associated with traditional parametric tests. Fourth, permutation tests provide exact probability values based on the discrete permutation distribution of equally-likely test statistic values. Fifth, permutation tests are ideal for small data sets.


Introduction
Permutation statistical methods were initially developed by RA Fisher [1], RC Geary [2], T Eden & F Yates [3], EJG Pitman [4-6] and other mathematicians and scientists in the 1920s and 1930s for validating the normality and homogeneity assumptions of classical statistical methods. Subsequently, permutation statistical methods have emerged as an approach to data analysis in their own right. Permutation statistical methods possess several advantages over classical statistical methods. First, permutation tests are entirely data-dependent in that all the information required for analysis is contained within the observed data. Second, permutation tests are appropriate for non-random samples, such as are common in many fields of research. Third, permutation tests are distribution-free in that they do not depend on the assumptions associated with traditional parametric tests. Fourth, permutation tests provide exact probability values based on the discrete permutation distribution of equally-likely test statistic values. Fifth, permutation tests are ideal for small data sets.
Two types of permutation tests are common in the statistical literature: exact and Monte Carlo resampling tests. Although the two types of permutation statistical tests are methodologically quite different, both types are based on the same specified null hypothesis. In an exact permutation statistical test, the first step is to calculate a test statistic value for the observed data. Second, all possible, equally-likely arrangements of the observed data are enumerated. Third, the desired test statistic is calculated for each arrangement of the observed data. The probability of obtaining the observed value of the test statistic, or one more extreme, is the proportion of the enumerated test statistics with values equal to or more extreme than the value of the observed test statistic. For large samples the total number of possible arrangements can be considerable and exact permutation methods are quickly rendered impractical. In a Monte Carlo resampling test, a random sample of L arrangements is drawn from the reference set of all possible arrangements, and the probability value is the proportion of the randomly selected test statistics with values equal to or more extreme than the value of the observed test statistic. With a sufficient number of random samples, a probability value can be computed to any reasonable accuracy. The current recommended practice is to use L = 1,000,000 randomly selected arrangements of the observed data to ensure a probability value with three decimal places of accuracy [7].
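The two procedures just described can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: the data, the function names, and the mean-difference statistic are all hypothetical, chosen only to show the exact-enumeration and resampling steps side by side.

```python
import itertools
import random

def exact_p(pooled, n1, stat):
    """Proportion of all combinations whose statistic is >= the observed one."""
    observed = stat(pooled[:n1], pooled[n1:])
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), n1):
        chosen = set(idx)
        s1 = [pooled[i] for i in idx]
        s2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += stat(s1, s2) >= observed
        total += 1
    return hits / total

def resampling_p(pooled, n1, stat, L, seed=0):
    """Monte Carlo version: L random shuffles instead of full enumeration."""
    rng = random.Random(seed)
    observed = stat(pooled[:n1], pooled[n1:])
    work = list(pooled)
    hits = 0
    for _ in range(L):
        rng.shuffle(work)
        hits += stat(work[:n1], work[n1:]) >= observed
    return hits / L

mean_diff = lambda a, b: sum(a) / len(a) - sum(b) / len(b)
data = [14, 12, 15, 11, 8, 7, 9, 10]   # first four = sample 1, last four = sample 2
p_exact = exact_p(data, 4, mean_diff)            # enumerates all C(8,4) = 70 splits
p_mc = resampling_p(data, 4, mean_diff, L=20000)
```

With a sufficiently large L, the resampling value converges on the exact value; here both procedures target the same one-sided probability.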
While permutation statistical tests do not require random sampling, normality, homogeneity of variance, or large sample sizes, and are also completely data-dependent, a potential drawback is the amount of computation required, with exact permutation tests formerly being unrealistic for many statistical analyses. Even Monte Carlo resampling permutation tests often require the enumeration of millions of random arrangements of the observed data in order to guarantee sufficient accuracy. For many years, exact tests were considered to be impractical, but modern computers make it possible to generate hundreds of millions of permutations in just a few minutes. In addition, Monte Carlo permutation methods can be inefficient due to millions of calls to a pseudorandom number generator (PRNG).
Eight innovations mitigate this problem. Seven of the eight innovations pertain to both exact and Monte Carlo resampling permutation methods, while the eighth innovation is specific to Monte Carlo resampling methods. First, high-speed computing makes possible exact permutation statistical tests in which all possible arrangements of the observed data are generated and examined. Second, examination of all combinations of arrangements of the observed data, instead of all permutations, yields the identical exact probability values with considerable savings in computation time. Third, mathematical recursion with an arbitrary initial value greatly simplifies difficult computations, such as large factorial expressions. Fourth, calculation of only the variable components of the selected test statistic greatly simplifies calculation. Fifth, for certain classes of problems with a limited number of discrete values it is possible to substantially reduce the number of permutations necessary to accomplish an exact test. Sixth, for matched-pairs and other block designs the required number of arrangements of the observed data can be greatly reduced by holding one set of blocks constant. Seventh, the utilization of partition theory for goodness-of-fit tests can reduce the required number of permutations of the observed data. Eighth, Monte Carlo resampling methods can provide highly-accurate approximate probability values for samples of any size with substantial reduction in calculation. Combinations of any or all of the eight calculation efficiencies provide highly accurate permutation analyses that would otherwise be prohibitive.

High-speed computing
As Berry, Johnston, and Mielke observed in 2014, one has only to observe the hordes of the digitally distracted trying to navigate a crowded sidewalk with their various smart phones, pads, pods, and tablets to realize that computing power, speed, and accessibility have finally arrived [8]. As Martin Hilbert documented in 2012, in 1986 just one percent of the world's capacity to store information was in digital format, but by the year 2000 digital storage represented 25 percent of the world's total [9]. The year 2002 marked the start of the digital age, as 2002 was the year that humankind first stored more information in digital than in analog form. By 2007 over 97 percent of the world's storage capacity was digital. Moreover, it was estimated in 2012 that ninety percent of the data stored in the world had been created in just the previous two years. Prior to 2001, data storage was measured in bytes, kilobytes (10^3), and occasionally in megabytes (10^6); now data storage is measured in gigabytes (10^9), terabytes (10^12), petabytes (10^15), exabytes (10^18), and zettabytes (10^21).
In 2000, the Intel Pentium processor contained 42 million transistors and ran at 1.5GHz. In the spring of 2010, Intel released the Itanium processor, code-named Tukwila after a town in the state of Washington, containing 1.4 billion transistors and running at 2.53GHz. On 4 June 2013 Intel announced the Haswell processor, named after a small town of 65 people in southeastern Colorado, containing 1.4 billion 3-D transistors and running at 3.50GHz. The latest generation of Haswell processors, the i7-4790, currently executes at 4.00GHz with turbo-boost to 4.40GHz. In April of 2017, Intel introduced the Optane memory module which, when coupled with a seventh-generation Intel Core-based system, has the potential to increase desktop computer performance by 28 percent.

Permutation methods
While not widely available to researchers, by 2010 mainframe computers were measuring computing speeds in teraflops. To emphasize the progress of computing, in 1951 the Remington Rand Corporation introduced the UNIVAC computer running at 1,905 flops, which with ten mercury delay-line memory tanks could store 20,000 bytes of information; in 2008 the IBM Corporation supercomputer code-named Roadrunner reached a sustained performance of one petaflops, a quadrillion operations per second; in 2010 the Cray Jaguar was named the world's fastest computer, performing at a sustained speed of 1.75 petaflops; and IBM subsequently unveiled the Blue Gene/P and /Q supercomputing processing systems that can achieve 20 petaflops. At the same time, IBM filed a patent for a massive supercomputing system capable of 107 petaflops.
From a more general perspective, in 1977 the Tandy Corporation released the TRS-80, the first fully assembled personal computer, distributed through Radio Shack stores. The TRS-80 had 4KB of RAM and ran at 1.78MHz. By way of comparison, in 2010 the Apple iPhone had 131,072 times the memory of the TRS-80 and was about 2,000 times faster, running at one GHz. In 2012, Sequoia, an IBM Blue Gene/Q supercomputer, was installed at Lawrence Livermore National Laboratory (LLNL) in Livermore, California. In June of 2012 Sequoia officially became the most powerful supercomputer in the world. Sequoia is capable of 16.32 petaflops, more than 16 quadrillion calculations a second, which is 55 percent faster than Japan's K supercomputer, ranked number 2, and more than five times faster than China's Tianhe-1A, which was the fastest supercomputer in the world in 2010.
Finally, high-speed computers have dramatically changed the field of computational statistics. The future of high-speed computing appears very promising for exact and Monte Carlo resampling permutation statistical methods. Combined with other calculation efficiencies, it can safely be said that permutation methods have the potential to provide exact or resampling probability values in an efficient manner for a wide variety of statistical applications.

Analysis with combinations
Although permutation statistical methods are known by the attribution "permutation," they are, in fact, not based on all possible permutations of the observed data. Instead, exact permutation methods are typically based on all possible combinations of the observed data. Conversely, a combination lock is not based on combinations of numbers or letters, but is instead based on all possible permutations of the numbers or letters. A simple example will illustrate. Consider N = 8 objects that are to be divided into two groups, A and B, with four objects in each group. The purpose is to compare differences between the two groups, such as a mean or median difference. Let the eight objects be designated {a, b, c, d, e, f, g, h}. For Group A, the first object can be chosen in eight different ways, the second object in seven ways, the third in six ways, and the fourth object in five ways. Once these four members of Group A are chosen, the membership of Group B is fixed, since the remaining four objects are assigned to Group B.
Of the 8×7×6×5 = 1,680 ways in which the four objects can be arranged for Group A, each individual quartet of objects will appear in a series of permutations. For example, the quartet {a, b, c, d} can be ordered in 4! = 24 different ways. Thus, each distinctive quartet will appear 24 times among the 1,680 possible arrangements. Therefore, 1,680 divided by 24 yields 70 distinctive quartets that could be formed by dividing eight objects into two groups of four objects each. The number of quartets can conveniently be expressed as the binomial coefficient C(8,4) = 8!/(4! 4!) = 70. Because every arrangement has an equal and opposite counterpart, only 35 of the 70 arrangements need be listed. An observed mean difference of +9.00 or greater in favor of Group A occurs only twice in the 35 possible arrangements, i.e., row 1 with mean difference +11.50 and row 2 with mean difference +9.00. If Table 2 were to be completed to form all 70 arrangements, a mean difference of 9.00 or greater in favor of Group B would also occur twice. Thus, for a two-sided test the exact probability is P = 4/70 = 0.0571, and for a one-sided test the exact probability is P = 2/70 = 0.0286.
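The 1,680-versus-70 counting argument can be checked directly. The following sketch, using the eight objects a through h named in the text, verifies that each distinct quartet accounts for exactly 4! = 24 of the ordered selections:

```python
from itertools import combinations, permutations
from math import comb, factorial

objects = list("abcdefgh")
ordered = list(permutations(objects, 4))    # ordered selections for Group A
quartets = list(combinations(objects, 4))   # distinct Group A memberships

n_ordered = len(ordered)                 # 8 * 7 * 6 * 5 = 1,680
n_quartets = len(quartets)               # C(8,4) = 70
per_quartet = n_ordered // n_quartets    # each quartet appears 4! = 24 times
```

Only the 70 combinations carry distinct statistical information, which is why an exact test needs to examine 70 arrangements rather than 1,680.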

Mathematical recursion
Mathematical recursion, in a statistical context, is a process in which an initial probability value of a test statistic is calculated, then successive probability values are generated from the initial value by a recursive process. A recursive process is one in which items are defined in terms of items of similar kind. Using a recursive relation, a class of items can be constructed from one or a few initial values (a base) and a small number of relationships (rules). For example, given the base 0! = 1 and the rule n! = n(n − 1)! for n > 0, all factorial values can be generated recursively. The initial value need not be an actual probability value, but can be a completely arbitrary positive value by which the resultant relative probability values are adjusted for the initializing value at the conclusion of the recursion process.

A recursion example
Consider a 2×2 contingency table using the notation in Table 3, where n11, n12, n21, and n22 denote the four cell frequencies, the row totals are n11 + n12 and n21 + n22, the column totals are n11 + n21 and n12 + n22, and N = n11 + n12 + n21 + n22 denotes the frequency total. The probability value corresponding to any set of cell frequencies n11, n12, n21, n22 in a 2×2 contingency table is the hypergeometric point probability value given by

p(n11, n12, n21, n22) = (n11 + n12)! (n21 + n22)! (n11 + n21)! (n12 + n22)! / [N! n11! n12! n21! n22!].

Because the marginal frequency totals are fixed, the entire distribution is determined by the single cell n11, and successive point probability values satisfy the recurrence

p(n11 + 1) = p(n11) × (n12 n21) / [(n11 + 1)(n22 + 1)],

where n12, n21, and n22 are the cell frequencies corresponding to n11; thus each value can be obtained from its predecessor without evaluating any factorial expressions.

On 18 December 1934, R.A. Fisher presented an invited paper on statistical inference to the Royal Statistical Society, a paper that was subsequently published in the Journal of the Royal Statistical Society [10]. In this paper, Fisher described the logic of permutation statistical methods and provided an example based on 30 criminal same-sex twins from a study originally conducted by Dr. Johannes Lange, Chief Physician at the Munich-Schwabing Hospital in Schwabing, a northern suburb of Munich [11]. The Lange data analyzed by Fisher consisted of 13 pairs of monozygotic (identical) twins and 17 pairs of dizygotic (fraternal) twins. For each of the 30 pairs of twins, one twin was known to be a convict. The study considered whether the twin brother of the known convict was himself "convicted" or "not convicted," thus forming a 2×2 contingency table with 12 "convicted" and 18 "not convicted" twins cross-classified by the 13 "monozygotic" and 17 "dizygotic" twins. The 2×2 contingency table, as analyzed by Fisher, is presented in Table 4. Fisher determined the reference set of all possible arrangements of the four cell frequencies, given the observed marginal frequency totals. Since there is only one degree of freedom in Table 4, it was only necessary to determine the possible values for one cell. Consider cell n11 in the first row and first column of Table 4 with a frequency of 10. For the data in Table 4, cell n11 can range from 0 to 12, so there are M = 12 − 0 + 1 = 13 possible arrangements.
To begin a recursion procedure it is necessary to have an initial value, usually a beginning probability value. However, it is not necessary to provide an actual probability value to initialize a recursion procedure; any arbitrary positive value can serve as an initial value, with a compensatory adjustment made at the conclusion of the recursion process. The recursion technique can be traced back at least to Lambert Adolphe Jacques Quetelet, who used a recursion procedure to generate the binomial probability distribution [12]. To illustrate the use of an arbitrary origin in a recursion procedure, consider the data on monozygotic and dizygotic twins in Table 4 and set the relative probability value of the first arrangement to an arbitrary initial value, say 1. Applying the recurrence relation successively then yields a relative probability value for each of the M arrangements. The desired exact hypergeometric point probability values are obtained by dividing each relative probability value by the sum of all M relative probability values. In this manner, the entire analysis is conducted utilizing an arbitrary initial value and a recursion procedure, thereby eliminating all factorial expressions. When the number of potential contingency tables, given by M = max(n11) − min(n11) + 1, is large, the computational savings can be substantial.
Summing the hypergeometric point probability values for cell frequency configurations with discrepancies from proportionality equal to or greater than the observed configuration in Table 4, the one-tailed hypergeometric probability value for 10, 11, or 12 convicted monozygotic twins is P = 0.0004652. For the frequency data given in Table 4, a two-tailed hypergeometric probability value includes all hypergeometric point probability values equal to or less than the point probability value of the observed table.
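The recursion with an arbitrary initial value can be applied directly to the twin data of Table 4 (row totals 13 and 17, column totals 12 and 18). The sketch below never evaluates a factorial; the variable names are illustrative:

```python
# Table 4 margins: 13 monozygotic / 17 dizygotic; 12 convicted / 18 not.
R1, C1, N = 13, 12, 30
lo, hi = max(0, R1 + C1 - N), min(R1, C1)   # feasible range of cell n11: 0..12

rel = {lo: 1.0}                  # arbitrary positive initial value
for n11 in range(lo, hi):
    n12, n21, n22 = R1 - n11, C1 - n11, N - R1 - C1 + n11
    rel[n11 + 1] = rel[n11] * (n12 * n21) / ((n11 + 1) * (n22 + 1))

total = sum(rel.values())        # compensatory adjustment for the arbitrary start
p = {n11: v / total for n11, v in rel.items()}   # exact point probabilities
p_one_tailed = p[10] + p[11] + p[12]    # 10 or more convicted monozygotic twins
```

Dividing by the sum of the relative values recovers the exact hypergeometric distribution, and the one-tailed value agrees with Fisher's P = 0.0004652.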

Variable components of a test statistic
Under permutation, only the variable components of the test statistic need be calculated for each arrangement of the observed data. As this is often only a very small piece of the desired test statistic, calculations can often be reduced by several factors [13]. For example, in 1933 Thomas Eden and Frank Yates substantially reduced calculations in a randomized-block analysis of Yeoman II wheat shoots by recognizing that the block and total sums of squares would be constant for all of their 1,000 random samples and, consequently, the value of z for each sample would be uniquely defined by the treatment (between) sum of squares, i.e., the treatment sum of squares was sufficient for a permutation test of a randomized-block analysis of variance [5]. To illustrate an analysis using only the variable components of a test statistic, consider the conventional two-sample test

t = (x̄1 − x̄2) / [s_p √(1/n1 + 1/n2)],

where n1 and n2 denote the sample sizes, x̄1 and x̄2 denote the sample means for samples 1 and 2, respectively, and s_p denotes the pooled standard deviation. In computing the permutation probability value of Student's two-sample t test, the total of all response measurements is fixed under permutation; consequently, where x_1i and x_2i denote the response measurements in samples 1 and 2, respectively, only the sum of the response measurements in the smaller of the two samples need be calculated for each arrangement of the observed response measurements.
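The sufficiency of the sample sum can be verified by brute force. The sketch below uses hypothetical measurements and shows that the full t statistic and the sum of sample 1 alone produce identical one-sided permutation counts, since t is a monotone function of that sum when the pooled data are fixed:

```python
from itertools import combinations
from math import sqrt

data = [9.1, 7.4, 8.8, 10.2, 6.5, 7.9, 8.2, 5.6, 6.1, 7.0]
n1 = 4                      # the first four values form the smaller sample

def t_stat(s1, s2):
    a, b = len(s1), len(s2)
    m1, m2 = sum(s1) / a, sum(s2) / b
    ss = sum((x - m1) ** 2 for x in s1) + sum((x - m2) ** 2 for x in s2)
    sp = sqrt(ss / (a + b - 2))          # pooled standard deviation
    return (m1 - m2) / (sp * sqrt(1 / a + 1 / b))

obs_t = t_stat(data[:n1], data[n1:])
obs_sum = sum(data[:n1])

ge_t = ge_sum = total = 0
for idx in combinations(range(len(data)), n1):
    chosen = set(idx)
    s1 = [data[i] for i in idx]
    s2 = [data[i] for i in range(len(data)) if i not in chosen]
    ge_t += t_stat(s1, s2) >= obs_t      # full statistic
    ge_sum += sum(s1) >= obs_sum         # variable component only
    total += 1
```

Both counts agree over all C(10,4) = 210 arrangements, so the one-sided probability value is unchanged while each arrangement costs only a single sum.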
A second example involves Kendall's τb measure of ordinal association for an r×c ordered contingency table,

τb = (C − D) / √[(C + D + Tx)(C + D + Ty)],

where C denotes the number of pairs of objects that are ranked in the same order on both variable x and variable y, D denotes the number of pairs of objects that are ranked in one order on variable x and the reverse order on variable y, Tx denotes the number of pairs tied on variable x but not tied on variable y, and Ty denotes the number of pairs tied on variable y but not tied on variable x. To illustrate, consider the frequency data given in Table 5, where N = 41 bivariate observations have been cross-classified into a 3×3 ordered contingency table. For the frequency data given in Table 5, computing the pair counts C, D, Tx, and Ty directly from the cell frequencies, rather than from all N(N − 1)/2 = 820 pairs of raw observations, substantially reduces the calculation required for each arrangement.
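The four pair counts can be computed directly from cell frequencies. The following sketch uses a small hypothetical 2×2 table (Table 5 itself is not reproduced here), with each count written as a sum over pairs of cells:

```python
from math import sqrt

table = [[3, 1],
         [1, 3]]             # hypothetical r x c ordered contingency table
r, c = len(table), len(table[0])

# pairs ranked in the same order on both variables (concordant)
C = sum(table[i][j] * table[k][l]
        for i in range(r) for j in range(c)
        for k in range(i + 1, r) for l in range(j + 1, c))
# pairs ranked in opposite orders on the two variables (discordant)
D = sum(table[i][j] * table[k][l]
        for i in range(r) for j in range(c)
        for k in range(i + 1, r) for l in range(j))
# pairs tied on x (same row) but not on y
Tx = sum(table[i][j] * table[i][l]
         for i in range(r) for j in range(c) for l in range(j + 1, c))
# pairs tied on y (same column) but not on x
Ty = sum(table[i][j] * table[k][j]
         for j in range(c) for i in range(r) for k in range(i + 1, r))

tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty))
```

For this table C = 9, D = 1, Tx = Ty = 6, giving τb = 8/16 = 0.5; the same cell-level sums apply to tables of any size.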

Limited number of discrete values
Simple linear correlation with one predictor variable requires the enumeration of N! possible arrangements of the observed data when determining an exact probability value. When N is large, the number of possible arrangements can be excessive. For example, with only N = 15 observations there are 15! = 1,307,674,368,000 arrangements to be enumerated and analyzed. In such cases, Monte Carlo resampling permutation methods can be utilized, in which a random sample of size L is drawn from the reference set of all M possible arrangements. The probability value is simply the proportion of sampled correlation coefficients equal to or more extreme than the observed correlation coefficient value. While the resulting probability value is approximate, when L is large the probability value can be highly accurate. In general, a sample of L = 1,000,000 random arrangements is sufficient to ensure three decimal places of accuracy. However, L = 100,000,000 random arrangements is required to ensure four decimal places of accuracy.
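A resampling test for simple correlation can be sketched by shuffling one variable, which samples uniformly from the N! equally-likely pairings. The Pearson formula is standard; the function name, seed, and example data are illustrative only:

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def resampling_corr_p(x, y, L, seed=0):
    """Two-sided resampling probability: shuffle y to break the pairing."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y = list(y)
    hits = 0
    for _ in range(L):
        rng.shuffle(y)
        hits += abs(pearson_r(x, y)) >= observed
    return hits / L

# e.g., for a perfectly correlated hypothetical pairing the probability is small:
p = resampling_corr_p(list(range(10)), list(range(10)), L=2000)
```

Each of the L iterations costs one shuffle and one correlation, regardless of how astronomically large N! may be.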
Under special circumstances, there is an alternative for determining an exact probability value. Occasionally, correlation data are gathered in which one of the variables has only a limited number of unique values. The limiting case occurs when, for example, variable y is a continuous interval-level variable and variable x is a simple dichotomy, such as in point-biserial correlation. However, there are other cases in which only a few distinct values appear in variable x, e.g., three or four values. An exact analysis can then be conducted by treating the distinct values of variable x as groups and correlating the continuous variable with group membership, with a considerable increase in calculation efficiency.
Two examples illustrate the computational efficiency of analyzing group structures. First, consider the correlation data listed in Table 6. The data in Table 6 consist of N = 12 subjects divided into two groups with seven Republicans (R) in group 1 and five Democrats (D) in group 2. The raw data are listed on the left side of Table 6 and the dummy-coded (0, 1) values are listed on the right side. A dummy code of 1 indicates that the subject is in group 1, i.e., is a Republican, and a dummy code of 0 indicates that the subject is not in group 1, i.e., is a Democrat. For the paired data listed in Table 6, an exact analysis need consider only 12!/(7! 5!) = 792 distinct arrangements rather than 12! = 479,001,600. For a second example, consider the correlation data listed in Table 7. The data in Table 7 consist of N = 12 subjects divided into three groups with four Republicans (R) in group 1, four Democrats (D) in group 2, and four Independents (I) in group 3. The raw data are listed on the left side of Table 7 and the dummy-coded (0, 1) values are listed on the right side. A dummy code of (1, 0) indicates that the subject is in group 1, i.e., is a Republican, a dummy code of (0, 1) indicates that the subject is not in group 1 but is in group 2, i.e., is a Democrat, and a dummy code of (0, 0) indicates that the subject is not in groups 1 or 2 but is in group 3, i.e., is an Independent. For the paired data listed in Table 7, an exact analysis need consider only 12!/(4! 4! 4!) = 34,650 distinct arrangements. In this manner, when there is a limited number of distinct values in one or more variables, the number of possible arrangements of the observed data can be reduced substantially and an exact analysis conducted where an examination of all N! arrangements would be prohibitive.
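The reduction for the two group structures above can be computed with the multinomial coefficient N!/(n_1! ⋯ n_g!); the helper name below is illustrative:

```python
from math import factorial, prod

def distinct_arrangements(group_sizes):
    """N! / (n_1! n_2! ... n_g!): distinct assignments of N subjects to
    groups of the given fixed sizes."""
    N = sum(group_sizes)
    return factorial(N) // prod(factorial(n) for n in group_sizes)

two_groups = distinct_arrangements([7, 5])        # Table 6 structure: 792
three_groups = distinct_arrangements([4, 4, 4])   # Table 7 structure: 34,650
all_orderings = factorial(12)                     # versus 12! = 479,001,600
```

Both group structures shrink the reference set by several orders of magnitude relative to the 12! raw orderings.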

Fixed measurements
In matched-pairs and other block designs where the same, or matched, subjects are measured on two or more occasions, the number of possible arrangements of the observed data can become quite large. Holding one set of measurements constant, relative to the other set(s), can greatly reduce the number of arrangements to be analyzed. In 1939 Maurice Kendall and Bernard Babington Smith published an article in The Annals of Mathematical Statistics on "The problem of m rankings" in which they developed the well-known coefficient of concordance [16]. Let N and m denote the number of objects ranked and the number of judges, respectively; then Kendall and Babington Smith defined the coefficient of concordance as

W = 12S / [m²(N³ − N)],

where S denotes the sum of the squared deviations of the rank-score totals for the N objects from their mean value. An exact test based on all possible orderings of the rankings requires (N!)^m arrangements; however, because W depends only on the relative orderings, one set of rankings can be held fixed, leaving only (N!)^(m−1) arrangements to be considered.
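The coefficient and the reduction in arrangements can be sketched directly from the definition above. The three identical rankings below are a hypothetical perfect-agreement case, for which W = 1:

```python
from math import factorial

rankings = [[1, 2, 3, 4],    # judge 1
            [1, 2, 3, 4],    # judge 2
            [1, 2, 3, 4]]    # judge 3: perfect agreement, so W = 1
m, N = len(rankings), len(rankings[0])

col_sums = [sum(r[j] for r in rankings) for j in range(N)]
mean_sum = m * (N + 1) / 2                     # average rank-score total
S = sum((t - mean_sum) ** 2 for t in col_sums)
W = 12 * S / (m ** 2 * (N ** 3 - N))

all_arrangements = factorial(N) ** m           # unrestricted: (N!)^m
reduced = factorial(N) ** (m - 1)              # one set of rankings held fixed
```

Even in this tiny case the fixed-block device cuts (4!)³ = 13,824 arrangements to (4!)² = 576.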

Example
To illustrate Kendall and Babington Smith's coefficient of concordance, consider the rank data listed in Table 8 with N = 8 objects and m = 3 sets of rankings. For the rank scores listed in Table 8, the rank-score totals for the eight objects determine S and, hence, W; holding the first set of rankings fixed, an exact test requires only (8!)² = 1,625,702,400 rather than (8!)³ arrangements.

Partition theory
In this section goodness-of-fit tests for unordered equiprobable categories are described and compared. Included are Fisher's exact test and exact chi-squared, exact likelihood-ratio, exact Freeman-Tukey, and exact Cressie-Read goodness-of-fit tests for k disjoint, unordered categories with equal probabilities under the null hypothesis. Exact tests are free from any asymptotic assumptions; consequently, they are ideal for sparse tables where expected values may be small.
Consider the random assignment of N objects to k unordered, mutually-exclusive, exhaustive, equiprobable categories, i.e., the probability for each of the k categories is 1/k. Fisher's exact goodness-of-fit probability value is the sum of the multinomial point probability values of all distinct cell-frequency configurations that are equal to or less probable than the observed configuration. Pearson's chi-squared test statistic is given by

χ² = Σ (O_i − E_i)² / E_i,

and Wilks' likelihood-ratio test statistic is given by

G² = 2 Σ O_i ln(O_i / E_i),

where O_i denotes the observed frequency of the ith category, the sums run over i = 1, ..., k, and the expected frequency of the ith category under the null hypothesis of equal category probabilities is E_i = N/k. Two other tests that have received attention are the Freeman-Tukey [9] goodness-of-fit test given by

T² = Σ (√O_i + √(O_i + 1) − √(4E_i + 1))²,

and the Cressie-Read [4] power-divergence goodness-of-fit test given by

I(λ) = [2/(λ(λ + 1))] Σ O_i [(O_i/E_i)^λ − 1].

Cressie and Read showed that I(λ) with λ set to 2/3 was optimal both in terms of attained significance level and small sample properties. In general, for an exact goodness-of-fit test with N objects in k categories there are M = (N + k − 1)! / [N! (k − 1)!] cell-frequency configurations to be examined. Because the null hypothesis specifies that the k category probabilities are equal, every rearrangement of the same set of cell frequencies yields the same test statistic value and the same point probability value; consequently, only the partitions of N into at most k parts need be examined. The use of partitions based on p(N) with equal category probabilities is highly efficient when compared with the calculation of test statistic values based on all M possible configurations.
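The partition shortcut can be sketched for the exact chi-squared goodness-of-fit test: each partition of N is visited once, weighted by the number of ordered configurations it represents and by their common multinomial point probability. The helper names are illustrative, not from the text:

```python
from math import factorial
from collections import Counter

def partitions(n, k, largest=None):
    """Partitions of n into at most k positive parts, as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    if k == 0:
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest

def exact_chi2_p(observed):
    """Exact chi-squared goodness-of-fit probability for equiprobable
    categories, visiting only partitions rather than all configurations."""
    k, N = len(observed), sum(observed)
    E = N / k
    stat = lambda cells: sum((o - E) ** 2 / E for o in cells)
    obs_stat = stat(observed)
    p = 0.0
    for part in partitions(N, k):
        cells = list(part) + [0] * (k - len(part))
        if stat(cells) >= obs_stat - 1e-9:
            mult = factorial(k)                   # ordered tables per partition
            for count in Counter(cells).values():
                mult //= factorial(count)
            prob = factorial(N) / k ** N          # common multinomial probability
            for o in cells:
                prob /= factorial(o)
            p += mult * prob
    return p
```

For N = 6 and k = 3, only 7 partitions are enumerated in place of the 28 ordered configurations, and the weighted probabilities still sum to one over the full distribution.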

Monte Carlo resampling
When exact permutation procedures become unmanageable due to the large number of possible arrangements of the observed data, a random sample of the M possible arrangements can be substituted, providing approximate, but highly accurate, probability values. Resampling permutation tests generate and examine a Monte Carlo random sample of all possible, equally-likely arrangements of the observed response measurements. For each randomly selected arrangement of the observed data, the desired test statistic is calculated. The probability of obtaining the observed value of the test statistic, or one more extreme, is the proportion of the randomly selected test statistics with values equal to or more extreme than the value of the observed test statistic. With a sufficient number of random samples, a probability value can be computed to any reasonable accuracy. While exact permutation methods are restricted to relatively small samples, given the large number of possible arrangements of the observed data, Monte Carlo resampling permutation tests are not limited by the sizes of the samples.
Monte Carlo resampling permutation tests have been shown to provide good approximations to exact probability values as a function of the number of random arrangements considered. The number of random arrangements suggested in books and articles varies and is likely dated due to previous limitations of computer speed and memory. Some authors have proposed as few as 100 random arrangements; others as many as 10,000. For many years, 10,000 random arrangements was the most common number cited and the default value in many commercial resampling programs. More recently, 1,000,000 random arrangements appears to be the number most in use [13]. To illustrate the number of random arrangements required to yield a predetermined number of decimal places of accuracy, consider the data listed in Table 11. The data listed in Table 11 are adapted from Berry, Mielke, and Mielke and represent soil lead (Pb) quantities from two school districts in metropolitan New Orleans. There are M = 137,846,528,820 possible arrangements of the soil lead data listed in Table 11 to be considered. Table 12 summarizes the results for eight different resamplings (L) of the data listed in Table 11 and the associated two-sided probability values. Each of the probability values was generated using a common seed and the same pseudorandom number generator. The last row of Table 12 contains the exact probability value based on all M = 137,846,528,820 possible arrangements of the soil lead data listed in Table 11. Given the results of the Monte Carlo resampling probability analyses listed in Table 12, L = 1,000,000 appears adequate when three decimal places of accuracy are required; little improvement is achieved with increasing numbers of permutations. L = 1,000,000 provides a combination of accuracy and practicality: modern desktop computers can generate 1,000,000 random arrangements in just a few seconds.
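The design of Table 12 (a common seed, increasing L, and a comparison against the exact value) can be imitated on a small hypothetical data set for which the exact value is cheap to enumerate; the data and the seed value below are arbitrary illustrations:

```python
import random
from itertools import combinations

data = [1.8, 2.4, 3.1, 2.9, 3.5, 1.2, 1.9, 0.8, 1.5, 2.2]
n1 = 5                                   # first five values form sample 1
TOL = 1e-9                               # guards floating-point ties

def abs_mean_diff(s1, s2):
    return abs(sum(s1) / len(s1) - sum(s2) / len(s2))

observed = abs_mean_diff(data[:n1], data[n1:])

# exact two-sided probability over all C(10,5) = 252 splits
hits = total = 0
for idx in combinations(range(len(data)), n1):
    chosen = set(idx)
    s1 = [data[i] for i in idx]
    s2 = [data[i] for i in range(len(data)) if i not in chosen]
    hits += abs_mean_diff(s1, s2) >= observed - TOL
    total += 1
p_exact = hits / total

# Monte Carlo estimates with a common seed, as in the design of Table 12
estimates = {}
for L in (1_000, 10_000, 100_000):
    rng = random.Random(20140425)        # arbitrary fixed seed
    work = list(data)
    count = 0
    for _ in range(L):
        rng.shuffle(work)
        count += abs_mean_diff(work[:n1], work[n1:]) >= observed - TOL
    estimates[L] = count / L
```

Because each L restarts from the same seed, differences among the estimates reflect only the number of random arrangements drawn, mirroring the comparison reported in Table 12.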
For a second example of a Monte Carlo resampling analysis, consider the frequency data given in Table 13, where N = 72 bivariate observations have been cross-classified into a 3×5 ordered contingency table. To calculate an exact probability value for a contingency table it is necessary to exhaustively enumerate all possible arrangements of cell frequencies, given the observed marginal distributions, calculate the specified test statistic on each arrangement, compute the hypergeometric point probability value for each arrangement of the observed data, and sum the hypergeometric point probability values associated with those test statistic values that are equal to or more extreme than the test statistic value calculated on the observed contingency table. For the frequency data given in Table 13, there are M = 70,148,145 possible arrangements of cell frequencies to be enumerated. Using Kendall's τb as the test statistic, the Monte Carlo resampling probability value is the proportion of the randomly selected τb values equal to or more extreme than τ_O, where τ_O denotes the observed value of τb. While an exact permutation analysis is not practical for the frequency data given in Table 13, a Monte Carlo resampling analysis not only examines a small fraction of the M possible arrangements of the observed data but, as in this example analysis, eliminates the calculation of 70,148,145 hypergeometric point probability values for a substantial increase in calculation efficiency.

Discussion
Eight improvements to permutation statistical methods are described to improve calculation efficiency. First, high-speed computers can quickly generate a large number of arrangements of the observed data, be these all possible arrangements or a large number of random arrangements. Second, substitution of all possible combinations for permutations yields the same probability values with a great reduction in computing time. Third, mathematical recursion greatly simplifies difficult calculations. Fourth, analysis of only the variable components of a specified test statistic eliminates many non-essential calculations. Fifth, when one variable contains only a small number of different values, substantial savings can be achieved by treating the different values as groups, with no loss of precision. Sixth, for certain classes of problems, e.g., block designs, fixing one of the variables relative to the other variables can reduce the number of arrangements that need to be examined. Seventh, the use of partition theory in goodness-of-fit tests greatly reduces the number of required arrangements of the observed data. Eighth, Monte Carlo resampling methods can provide highly-accurate approximate probability values for samples of any size with substantial reduction in calculation. The first seven are methods to improve calculation of exact probability values, but may also be used with Monte Carlo resampling values. While any one of the eight improvements may be used independently of the others, when used in combination with one or more of the others, vastly improved computation times can be achieved.