Multilevel splitting for rare events estimation in permutation tests

Sukhov Vladimir D. , Korotkevich Gennady V., Sergushichev Alexey A.

2024 , VOLUME 24, NUMBER 4 ( may-june )

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

Publications

Editor-in-Chief

Nikiforov
Vladimir O.
D.Sc., Prof.

Partners

doi: 10.17586/2226-1494-2024-24-4-654-660

Multilevel splitting for rare events estimation in permutation tests

V. D. Sukhov, G. V. Korotkevich, A. A. Sergushichev

Read the full article

Article in Russian

For citation:

Sukhov V.D., Korotkevich G.V., Sergushichev A.A. Multilevel splitting for rare events estimation in permutation tests. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 4, pp. 654–660 (in Russian). doi: 10.17586/2226-1494-2024-24-4-654-660

Abstract

Permutation tests are widely employed in statistical analysis, especially when the assumptions of parametric tests are violated, or the data distribution is unknown. However, classical permutation tests encounter challenges when attempting to estimate the probabilities of rare events with high relative accuracy, leading to difficulties in applying corrections for multiple hypothesis testing. In this study, we propose an original method for estimating arbitrarily small P-values in permutation tests, which is based on multilevel splitting for Monte Carlo method. The proposed method involves splitting the original permutation space into non-overlapping levels based on the statistic values. This approach allows the problem of estimating the original probability of a rare event to be reduced to estimating ordinary conditional probabilities for each level. Utilizing such an approach enables efficient estimation of the desired P-values while maintaining a balance between computation time and the level of relative error. The efficacy of the method is demonstrated in its application to the task of estimating arbitrary P-values in the two-sample Kolmogorov-Smirnov test. Comparing the method results with true P-values has shown practical convergence of the method. Moreover, examples of the superiority of the proposed method over alternative asymptotic approaches have been provided. Thus, the proposed method shows significant potential for application across a broad spectrum of scientific fields, such as systems biology, immunology, and others. Furthermore, the method can be adapted for use in various statistical analysis scenarios that require handling probabilities of rare events in permutation tests.

Keywords: statistical hypothesis test, P-value, Monte Carlo method, permutation test, rare events

References

Good P. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer Science & Business Media, 2013.
Pesarin F., Salmaso L. Permutation Tests for Complex Data: Theory, Applications and Software. John Wiley & Sons, 2010, 448 p.
Hammersley J. Monte Carlo Methods. Springer Science & Business Media, 2013, 178 p.
Kalos M.H., Whitlock P.A. Monte Carlo Methods. John Wiley & Sons, 2009, 215 p.
Trendelkamp-Schroer B., Noé F. Efficient estimation of rare-event kinetics. Physical Review X, 2016, vol. 6, no. 1, pp. 011009. https://doi.org/10.1103/physrevx.6.011009
Lestang T., Ragone F., Bréhier C.-E., Herbert C., Bouchet F. Computing return times or return periods with rare event algorithms. Journal of Statistical Mechanics: Theory and Experiment, 2018, vol. 2018, no. 4, pp. 043213. https://doi.org/10.1088/1742-5468/aab856
Caron V., Guyader A., Zuniga M.M., Tuffin B. Some recent results in rare event estimation. ESAIM: Proceedings, 2014, vol. 44, pp. 239–259. https://doi.org/10.1051/proc/201444015
L'Ecuyer P., Demers V., Tuffin B. Splitting for rare-event simulation. Proc. of the 2006 Winter Simulation Conference, 2006, pp. 137–148. https://doi.org/10.1109/wsc.2006.323046
Glasserman P., Heidelberger P., Shahabuddin P., Zajic T. Multilevel splitting for estimating rare event probabilities. Operations Research, 1999, vol. 47, no. 4, pp. 585–600. https://doi.org/10.1287/opre.47.4.585
Botev Z.I., Kroese D.P. An efficient algorithm for rare-event probability estimation, combinatorial optimization, and counting. Methodology and Computing in Applied Probability, 2008, vol. 10, no. 4, pp. 471–505. https://doi.org/10.1007/s11009-008-9073-7
Metropolis N., Rosenbluth A.W., Rosenbluth M.N., Teller A.H., Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 1953, vol. 21, no. 6, pp. 1087–1092. https://doi.org/10.1063/1.1699114
Hastings W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 1970, vol. 57, no. 1, pp. 97–109. https://doi.org/10.2307/2334940
Kolmogorov A. Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano degli Attuari, 1933, vol. 4, pp. 83–91.
Smirnoff N. Sur les Écarts de la courbe de distribution empirique. Matematicheskii Sbornik, 1939, vol. 48, no. 1, pp. 3–26.
Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., Burovski E., Peterson P., Weckesser W., Bright J. et. al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 2020, vol. 17, no. 3, pp. 261–272. https://doi.org/10.1038/s41592-019-0686-2

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License