Measure of Strength of Evidence for Visually Observed Differences between Subpopulations

Published 2 Jan 2021 in stat.ME and stat.ML | (2101.00362v3)

Abstract: The Population Difference Criterion is proposed to measure the strength of evidence for, and assess the statistical significance of, visually observed subpopulation differences. It addresses the following challenges: in high-dimensional contexts, distributional models can be dubious; in high-signal contexts, conventional permutation tests give poor pairwise comparisons. We also make two other contributions. First, based on a careful analysis, we find that a balanced permutation approach is more powerful in high-signal contexts than conventional permutations. Second, we quantify the uncertainty due to permutation variation via a bootstrap confidence interval. The practical usefulness of these ideas is illustrated in the comparison of subpopulations of modern cancer data.
