
mdendro: An R package for extended agglomerative hierarchical clustering (2309.13333v3)

Published 23 Sep 2023 in cs.IR, physics.data-an, and stat.CO

Abstract: "mdendro" is an R package that provides a comprehensive collection of linkage methods for agglomerative hierarchical clustering on a matrix of proximity data (distances or similarities), returning a multifurcated dendrogram, or multidendrogram. Multidendrograms can group more than two clusters at the same time, which solves the nonuniqueness problem that arises when there are ties in the data. With ties, different binary dendrograms are possible depending both on the order of the input data and on the criterion used to break ties. The package includes weighted and unweighted versions of the most common linkage methods and also implements two parametric linkage methods. In addition, "mdendro" provides five descriptive measures for analyzing the resulting dendrograms: cophenetic correlation coefficient, space distortion ratio, agglomeration coefficient, chaining coefficient, and tree balance.
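The tie problem described in the abstract can be illustrated with a small sketch. The following is plain Python and not the mdendro API; the function names `tied_minimum_pairs` and `groups_to_merge` and the toy distance matrix are hypothetical, chosen only for illustration. It shows a single agglomeration step in which two pairs share the minimum distance: a pair-group algorithm would have to pick one of them arbitrarily, whereas a multidendrogram-style step would merge every item connected through tied pairs at once.

```python
from itertools import combinations


def tied_minimum_pairs(dist, labels):
    """Return the minimum distance and every pair of items that attains it."""
    pairs = list(combinations(range(len(labels)), 2))
    d_min = min(dist[i][j] for i, j in pairs)
    # Exact equality is fine for this integer toy matrix; real-valued data
    # would need a tolerance or a fixed precision (an assumption of this sketch).
    tied = [(labels[i], labels[j]) for i, j in pairs if dist[i][j] == d_min]
    return d_min, tied


def groups_to_merge(tied):
    """Connected components of the tied pairs: each is merged in one step."""
    groups = []
    for a, b in tied:
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return groups


# Toy symmetric distance matrix (hypothetical data): d(A,B) = d(B,C) = 2.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 4, 7],
    [2, 0, 2, 5],
    [4, 2, 0, 6],
    [7, 5, 6, 0],
]

d_min, tied = tied_minimum_pairs(dist, labels)
groups = groups_to_merge(tied)
print(d_min, tied, groups)
```

Here the pairs (A, B) and (B, C) are tied, so a binary dendrogram depends on which one is merged first, while the grouped step joins A, B, and C together.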

