Measure of Strength of Evidence for Visually Observed Differences between Subpopulations
Abstract: The Population Difference Criterion is proposed to assess the statistical significance of visually observed differences between subpopulations. It addresses two challenges: in high-dimensional contexts, distributional models can be dubious; in high-signal contexts, conventional permutation tests give poor pairwise comparisons. We also make two further contributions. First, a careful analysis shows that a balanced permutation approach is more powerful in high-signal contexts than conventional permutations. Second, uncertainty due to permutation variation is quantified via a bootstrap confidence interval. The practical usefulness of these ideas is illustrated by comparing subpopulations in modern cancer data.
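The two ingredients named in the abstract, balanced permutations and a bootstrap interval for permutation variation, can be illustrated with a minimal sketch. This is not the paper's Population Difference Criterion: it assumes two groups of equal, even size, a one-dimensional score per observation, a simple mean-difference statistic, and a z-score summary of the observed statistic against the permutation null. All function names (`balanced_permutation_labels`, `permutation_null`, `bootstrap_z_interval`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def balanced_permutation_labels(n_per_group, rng):
    """Relabel 2 * n_per_group observations so that each pseudo-group takes
    exactly half of its members from each original group.

    Assumes original group 0 occupies indices [0, n) and group 1 occupies
    [n, 2n), and that n_per_group is even.
    """
    if n_per_group % 2 != 0:
        raise ValueError("n_per_group must be even for an exact balanced split")
    half = n_per_group // 2
    labels = np.zeros(2 * n_per_group, dtype=int)
    # Pick half of each original group to form pseudo-group 1.
    from_group0 = rng.choice(n_per_group, size=half, replace=False)
    from_group1 = n_per_group + rng.choice(n_per_group, size=half, replace=False)
    labels[from_group0] = 1
    labels[from_group1] = 1
    return labels


def mean_difference(x, labels):
    """Illustrative test statistic: difference of group means."""
    return x[labels == 1].mean() - x[labels == 0].mean()


def permutation_null(x, n_per_group, n_perm, rng):
    """Observed statistic plus n_perm statistics under balanced relabelling."""
    observed_labels = np.repeat([0, 1], n_per_group)
    observed = mean_difference(x, observed_labels)
    null = np.array([
        mean_difference(x, balanced_permutation_labels(n_per_group, rng))
        for _ in range(n_perm)
    ])
    return observed, null


def bootstrap_z_interval(observed, null, n_boot, rng, level=0.95):
    """Percentile bootstrap interval for the z-score of `observed` against
    the permutation null, obtained by resampling the permuted statistics."""
    z = np.empty(n_boot)
    for b in range(n_boot):
        resampled = rng.choice(null, size=null.size, replace=True)
        z[b] = (observed - resampled.mean()) / resampled.std(ddof=1)
    alpha = 1.0 - level
    lower, upper = np.quantile(z, [alpha / 2, 1 - alpha / 2])
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic scores: two groups of 10 with a strong mean shift.
    x = np.concatenate([rng.normal(0, 1, 10), rng.normal(5, 1, 10)])
    observed, null = permutation_null(x, n_per_group=10, n_perm=500, rng=rng)
    print(bootstrap_z_interval(observed, null, n_boot=500, rng=rng))
```

The interval here reflects only the Monte Carlo variation from using a finite number of random relabellings; it says nothing about sampling variation in the data themselves.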