
Large-scale metric objects filtering for binary classification with application to abnormal brain connectivity detection (2403.12624v1)

Published 19 Mar 2024 in stat.ME and stat.AP

Abstract: The classification of random objects within metric spaces without a vector structure has attracted increasing attention. However, the complexity inherent in such non-Euclidean data often restricts existing models to handle only a limited number of features, leaving a gap in real-world applications. To address this, we propose a data-adaptive filtering procedure to identify informative features from large-scale random objects, leveraging a novel Kolmogorov-Smirnov-type statistic defined on the metric space. Our method, applicable to data in general metric spaces with binary labels, exhibits remarkable flexibility. It enjoys a model-free property, as its implementation does not rely on any specified classifier. Theoretically, it controls the false discovery rate while guaranteeing the sure screening property. Empirically, equipped with a Wasserstein metric, it demonstrates superior sample performance compared to Euclidean competitors. When applied to analyze a dataset on autism, our method identifies significant brain regions associated with the condition. Moreover, it reveals distinct interaction patterns among these regions between individuals with and without autism, achieved by filtering hundreds of thousands of covariance matrices representing various brain connectivities.
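To make the filtering idea concrete, the following is a minimal sketch of one plausible Kolmogorov-Smirnov-type screening statistic on a metric space. It is not the paper's exact definition. For a single feature it assumes only a pairwise distance matrix, and it compares, between the two classes, the fraction of observations falling inside each ball centred at one sample point with radius equal to the distance to another. The function names `ks_metric_statistic` and `screen_features` are hypothetical. The simple top-`keep` ranking stands in for the data-adaptive threshold with false discovery rate control described in the abstract, which is not reproduced here.

```python
import numpy as np


def ks_metric_statistic(dist, labels):
    """KS-type statistic for one feature, computed from pairwise distances.

    dist[i, j] is the distance between observations i and j for this feature;
    labels holds the binary class (0 or 1) of each observation.
    Assumed form (illustrative only): the largest gap, over all balls
    B(X_u, d(X_u, X_v)), between the class-1 and class-0 fractions in the ball.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    # in_ball[u, v, i] is True when X_i lies in the ball centred at X_u
    # whose radius is d(X_u, X_v). Memory is O(n^3), so keep n small.
    in_ball = dist[:, None, :] <= dist[:, :, None]
    frac_1 = in_ball[:, :, labels == 1].mean(axis=2)
    frac_0 = in_ball[:, :, labels == 0].mean(axis=2)
    return float(np.abs(frac_1 - frac_0).max())


def screen_features(dist_per_feature, labels, keep):
    """Rank features by the statistic and return the indices of the top `keep`."""
    stats = np.array([ks_metric_statistic(d, labels) for d in dist_per_feature])
    order = np.argsort(-stats)  # largest statistic first
    return order[:keep], stats


if __name__ == "__main__":
    labels = np.array([0, 0, 0, 1, 1, 1])
    # Toy scalar features with d(x, y) = |x - y|; any metric (for example a
    # Wasserstein distance between distributions) could supply `dist` instead.
    separated = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
    identical = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
    dists = [np.abs(x[:, None] - x[None, :]) for x in (identical, separated)]
    selected, stats = screen_features(dists, labels, keep=1)
    print(selected, stats)
```

Because the statistic depends on the data only through distances, swapping the Euclidean toy metric for another metric only changes how `dist` is built, which mirrors the model-free, metric-agnostic flavour the abstract describes.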

