
TriSig: Assessing the statistical significance of triclusters (2306.00643v2)

Published 1 Jun 2023 in cs.LG and stat.ME

Abstract: Tensor data analysis allows researchers to uncover novel patterns and relationships that cannot be obtained from matrix data alone. The information inferred from these patterns provides valuable insights into disease progression, bioproduction processes, weather fluctuations, and group dynamics. However, spurious and redundant patterns hamper this process. This work proposes a statistical framework to assess the probability that patterns in tensor data deviate from null expectations, extending well-established principles for assessing the statistical significance of patterns in matrix data. A comprehensive discussion of binomial testing for false positive discoveries is provided in light of variable dependencies, temporal dependencies and misalignments, and p-value corrections under the Benjamini-Hochberg procedure. Results gathered from applying state-of-the-art triclustering algorithms to distinct real-world case studies in biochemical and biotechnological domains support the validity of the proposed statistical framework while revealing vulnerabilities of some triclustering searches. The proposed assessment can be incorporated into existing triclustering algorithms to mitigate false positive (spurious) discoveries and further prune the search space, reducing their computational complexity. Availability: The code is freely available at https://github.com/JupitersMight/TriSig under the MIT license.
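
As a rough illustration of the two statistical ingredients named in the abstract, the following minimal Python sketch computes a binomial tail p-value for a pattern and then applies the Benjamini-Hochberg correction across several patterns. It is not taken from the TriSig repository. The function names, the example counts, and the assumption that each observation exhibits the pattern independently with a known null probability are all hypothetical simplifications. In particular, the sketch does not model the variable or temporal dependencies that the paper discusses.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    n: number of observations (assumed independent under the null)
    k: number of observations that support the pattern
    p: assumed null probability that one observation shows the pattern
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical patterns: (observations, supporting observations, null probability).
patterns = [(100, 10, 0.02), (100, 4, 0.02), (100, 3, 0.05)]
raw = [binomial_tail(n, k, p) for n, k, p in patterns]
adjusted = benjamini_hochberg(raw)
significant = [q <= 0.05 for q in adjusted]
```

Under these assumptions, a pattern is retained when its adjusted p-value falls at or below the chosen false discovery rate (0.05 here). The exact summation is adequate for small n; for large tensors a numerical library routine for the binomial survival function would be a more practical choice.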
