A Review on the Permutation Tests
Hasan Önder1* and Zeynel Cebeci2
1Department of Animal Science, Ondokuz Mayis University Samsun, Turkey
2Department of Animal Science, Çukurova University Adana, Turkey
Submission: August 31, 2017; Published: October 12, 2017
*Corresponding author: Ondokuz Mayis University, Agricultural Faculty, Department of Animal Science, 55139, Samsun, Turkey; Email: hasanonder@gmail.com
How to cite this article: Hasan Ö, Zeynel C. A Review on the Permutation Tests. Biostat Biometrics Open Acc J. 2017;3(3): 555613. DOI: 10.19080/BBOAJ.2017.03.555613
Abstract
In this review we describe the permutation test, which succeeds in many cases where parametric tests do not because it does not depend on the distribution of the data. Some properties and fields of application of permutation tests are reviewed to help experimental researchers obtain more reliable statistical results in their studies.
Keywords: Permutation tests; Resampling methods; Exact test
Introduction
The first description of permutation tests, which are one of the resampling methods, can be traced back to the works of Fisher [1] and Pitman [2] in the first half of the 20th century. Because they are computationally intensive, permutation tests did not receive much attention until powerful computers became widely available [3]. In practice, parametric methods reflect a modeling approach and generally require a set of stringent assumptions, which are often quite unrealistic, unclear, and difficult to justify [4]. Because they do not depend on the distribution of the data, permutation tests are successful in many cases where parametric tests are not. The assumptions of permutation tests are exchangeability of the data for regression and relabelability of the data for comparison statistics [5]. One of the most important features of the permutation testing principle is that, in theory and under a set of mild conditions, conditional inferences can be extended unconditionally to all distributions [4].
The calculation of an exact P value can be described as follows. The F value obtained from the original data is compared with the distribution of the F* values (F* being the F value for the ith permutation) obtained from the permutation test. The empirical frequency distribution of F* can be enumerated completely because the number of possible relabelings of the data is finite. The type I error rate for the null hypothesis (the P value) is calculated by dividing the number of F* values equal to or greater than F by the total number of permutations: P = (number of F* ≥ F) / (total number of permutations). This provides the exact P value, which means that the type I error of the test is exactly equal to the a priori chosen significance level for the test [5-8].
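The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a two-group comparison, uses the one-way ANOVA F statistic, and enumerates every assignment of observations to the first group. The function names `f_statistic` and `exact_permutation_p` and the example data are hypothetical.

```python
from itertools import combinations


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    # Assumes ss_within > 0, i.e. no relabeling makes every group constant.
    return (ss_between / df_between) / (ss_within / df_within)


def exact_permutation_p(group_a, group_b):
    """Return (P, number of relabelings) for a two-group exact permutation test."""
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    f_obs = f_statistic([group_a, group_b])
    count = 0
    total = 0
    # Every way of choosing which observations receive the first group's label.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small relative tolerance so ties are not lost to rounding error.
        if f_statistic([a, b]) >= f_obs * (1 - 1e-9):
            count += 1
    return count / total, total


if __name__ == "__main__":
    p_value, n_relabelings = exact_permutation_p([1, 2, 3, 4], [5, 6, 7, 8])
    print(p_value, n_relabelings)
```

With larger samples the number of relabelings grows quickly, so full enumeration as written here is practical only for small designs.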
This formula shows that the total number of permutations should be at least 20; otherwise the null hypothesis cannot be rejected at the conventional 0.05 significance level, because the smallest attainable P value is 1/(total number of permutations). The number of possible permutations is therefore an important criterion. For regression the total number of permutations is n!; for a completely randomized design with t groups and n replications it is (tn)!/[t!(n!)^t]; for a randomized block design with b blocks and t treatments it is (t!)^b; and for a Latin square design it is t!(t-1)! [3]. It should be taken into account that half of the total number of possible permutations is enough to analyze the data because the permutation distribution is symmetric.
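As a quick check on whether a planned design offers enough permutations, the counts given above can be computed directly. The sketch below simply transcribes those formulas; the function names are hypothetical and chosen only for illustration.

```python
from math import factorial


def n_perm_regression(n):
    """Regression with n observations: n!"""
    return factorial(n)


def n_perm_crd(t, n):
    """Completely randomized design, t groups and n replications: (tn)!/[t!(n!)^t]"""
    return factorial(t * n) // (factorial(t) * factorial(n) ** t)


def n_perm_rbd(b, t):
    """Randomized block design, b blocks and t treatments: (t!)^b"""
    return factorial(t) ** b


def n_perm_latin_square(t):
    """Latin square design with t treatments: t!(t-1)!"""
    return factorial(t) * factorial(t - 1)


def smallest_p(total_permutations):
    """Smallest attainable P value, 1/(total number of permutations)."""
    return 1 / total_permutations


if __name__ == "__main__":
    print(n_perm_crd(2, 3), smallest_p(n_perm_crd(2, 3)))  # 10 permutations, P >= 0.1
    print(n_perm_crd(2, 4), smallest_p(n_perm_crd(2, 4)))  # 35 permutations
```

For example, two groups with three replications each give only 10 permutations, so the smallest attainable P value is 0.1 and the null hypothesis can never be rejected at the 0.05 level. Note that the completely randomized design formula counts relabelings that merely swap group names only once, which is why two groups of four give 35 rather than 70.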
Discussion
In many biological studies the number of replications is not adequate to reach the normal distribution. In that situation the permutation test can be applied easily and gives reliable results [5]. Permutation tests are also suitable for large data sets such as cDNA microarray data, where commercial software has offered the permutation test directly to users for about ten years. Permutation tests have gained popularity in genomic research over the last 20 years because they are a straightforward way to obtain reliable statistical inference without making strong distributional assumptions [9]. If the experiment is too small, permutation analysis can be conducted by shuffling residual values across genes [10]. This can be useful in many genomic studies.
In experimental planning, factorial experiments are of particular interest because they allow us to examine separately the effects of two or more factors in all their possible combinations. In the usual linear model for the analysis of variance, if the error components are not normally distributed, parametric analysis may not be appropriate [11]. Since the elements of the response are not exchangeable in factorial designs, a restricted kind of permutation can be used. Synchronized permutations allow testing for main effects and interactions together with the restricted permutation test statistics [3,8,11]. Another statistical problem is comparing means or medians with non-parametric post-hoc procedures such as the Dunnett test; many researchers have argued that pairwise permutation tests are more robust than Dunnett's procedure [12,13].
In simple or multiple regression analysis, both the significance of the model and the goodness of fit can be calculated exactly by using permutation tests [3,6,14,15]. Permutation tests are also used to calculate variance components for generalized linear mixed models applied to multilevel data, which are commonly collected in studies in the medical, social, and behavioral sciences [16]. The most important reason to recommend permutation tests is that they reduce the technical error, one of the components of the error term, to zero, so that only the treatment error remains in the error term. It is well known that data from biological studies generally do not satisfy the assumptions of parametric methods. If the data do not meet the assumptions, or the structure of the data is not known, permutation tests can be performed to obtain more reliable results. Otherwise, the statistical decision may lead to misinterpretation of the results because of a type I error. Permutation tests provide exact type I and II error rates and test power, whereas parametric tests provide only approximations of them. Permutation tests and parametric methods yield similar results when the data fulfill the necessary assumptions for the parametric method; in that case parametric methods can be preferred in terms of computing time and simplicity of calculation [3,15].
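For the simple regression case mentioned above, an exact test can be sketched by permuting the response values across the fixed predictor values and enumerating all n! orderings. This is an illustrative sketch only: it uses the coefficient of determination as the test statistic, which in simple regression orders the permutations the same way as the model F value, and the function names and data are hypothetical.

```python
from itertools import permutations


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither x nor y is constant, so sxx and syy are positive.
    return sxy * sxy / (sxx * syy)


def exact_regression_p(x, y):
    """Return (P, number of permutations) by enumerating all n! orderings of y."""
    observed = r_squared(x, y)
    count = 0
    total = 0
    for y_star in permutations(y):
        total += 1
        # Small relative tolerance so ties are not lost to rounding error.
        if r_squared(x, y_star) >= observed * (1 - 1e-9):
            count += 1
    return count / total, total


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(exact_regression_p(x, y))
```

Full enumeration is feasible only for small n, since n! grows very quickly.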
Conclusion
In conclusion, permutation tests can be used with great reliability in many situations such as comparison, clustering, discrimination, regression, correlation, estimation of variance components, heritability calculation, and so on. A lot of free software is available for this purpose, such as NPMANOVA, DISTLM, and R. Fisher's recognition of permutation tests, together with improved computing power, has made statistics more reliable for experimental data.
References
- Fisher RA (1935) The Design of Experiments. Oliver and Boyd, Edinburgh, UK.
- Pitman EJG (1937) Significance Tests Which May be Applied to Samples from Any Population. Royal Statistical Society Supplement Part I 4(1): 119-130.
- Onder H, Cebeci Z (2009) Use and Comparison of Permutation Tests in Linear Models. Anadolu J Agric Sci 24(2): 93-97.
- Pesarin F, Salmaso L (2010) Permutation Tests for Complex Data: Theory, Applications, and Software, John Wiley & Sons, Ltd., West Sussex, UK.
- Onder H (2007) Using Permutation Tests to Reduce Type I and II Errors for Small Ruminant Research. J Appl Anim Res 32(1): 69-72.
- Anderson MJ (2001) Permutation Tests for Univariate or Multivariate Analysis of Variance and Regression. Can J Fish Aquat Sci 58(3): 626-639.
- Anderson MJ, Robinson J (2001) Permutation Tests for Linear Models. Aust N Z J Stat 43(1): 75-88.
- Anderson MJ, Ter Braak CJF (2003) Permutation Tests for Multi-Factorial Analysis of Variance. Journal of Statistical Computation and Simulation 73(2): 85-113.
- Buzkova P, Lumley T, Rice K (2011) Permutation and Parametric Bootstrap Tests for Gene-Gene and Gene-Environment Interactions. Annals of Human Genetics 75(1): 36-45.
- Cui X, Churchill GA (2003) Statistical Tests for Differential Expression in cDNA Microarray Experiments. Genome Biol 4(4): 210-220.
- Basso D, Chiarandini M, Salmaso L (2007) Synchronized Permutation Tests in Replicated IxJ Designs. Journal of Statistical Planning and Inference 137: 2564-2578.
- Richter SJ, McCann MH (2013) Simultaneous Multiple Comparisons with a Control Using Median Differences and Permutation Tests. Statistics & Probability Letters 83(4): 1167-1173.
- Tirink C, Onder H (2015) Comparing the Nonparametric Permutation Test and Dunnett Multiple Comparison Test, The 8th Conference of Eastern Mediterranean Region of International Biometric Society, EMR 2015 Abstract Book, May 2015, Cappadocia, Nevsehir, Turkey, 103: 1115.
- Rose KA, Smith EP (1998) Statistical Assessment of Model Goodness-of-Fit Using Permutation Tests. Ecological Modelling 106(2-3): 129-139.
- Onder H (2008) A Comparative Study of Permutation Tests with Euclidean and Bray-Curtis Distances for Common Agricultural Distributions in Regression. J Appl Anim Res 34(2): 133-136.
- Fitzmaurice GM, Lipsitz SR, Ibrahim JG (2007) A Note on Permutation Tests for Variance Components in Multilevel Generalized Linear Mixed Models. Biometrics 63(1): 942-946.