Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part II -- Clustering Extremely High-Dimensional Grid-Based Data (2307.01400v1)

Published 3 Jul 2023 in cs.LG, cs.NA, and math.NA

Abstract: Building an accurate surrogate model for the spatio-temporal outputs of a computer simulation is a challenging task. A simple approach to improve the accuracy of the surrogate is to cluster the outputs based on similarity and build a separate surrogate model for each cluster. This clustering is relatively straightforward when the output at each time step is of moderate size. However, when the spatial domain is represented by a large number of grid points, numbering in the millions, clustering the data becomes more challenging. In this report, we consider output data from simulations of a jet interacting with high explosives. These data are available on spatial domains of different sizes, at grid points that vary in their spatial coordinates, and in a format that distributes the output across multiple files at each time step of the simulation. We first describe how we bring these data into a consistent format prior to clustering. Borrowing the idea of random projections from data mining, we reduce the dimension of our data by a factor of a thousand, making it possible to use the iterative k-means method for clustering. We show how we can use the randomness of both the random projections and the choice of initial centroids in k-means clustering to determine the number of clusters in our data set. Our approach makes clustering of extremely high-dimensional data tractable, generating meaningful cluster assignments for our problem despite the approximation introduced by the random projections.
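To make the pipeline concrete, the sketch below walks through the two ideas the abstract describes: a random projection that shrinks each flattened snapshot before k-means, and repeated randomized runs whose mutual agreement suggests a number of clusters. This is a minimal illustration, not the authors' code: the array sizes, the projection width `dim`, the use of `scipy.cluster.vq.kmeans2`, and the Rand-index agreement score are all assumptions made for the example; the paper's actual consensus criterion may differ.

```python
# A minimal sketch of the pipeline described in the abstract; it is not
# the authors' code, and all sizes here are toy values.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)

# Stand-in for the simulation outputs: n snapshots, each flattened to
# d grid values (d is in the millions in the paper, tiny here).
n, d = 200, 5000
X = rng.standard_normal((n, d))

def random_project(X, k, rng):
    """Reduce the d columns of X to k with a dense Gaussian projection.

    The 1/sqrt(k) scaling approximately preserves pairwise distances
    (Johnson-Lindenstrauss); sparse {+1, 0, -1} projections are a
    cheaper drop-in replacement at very high dimension.
    """
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

def rand_agreement(a, b):
    """Rand index: fraction of point pairs on which two labelings agree
    (co-clustered in both, or separated in both)."""
    sa = a[:, None] == a[None, :]
    sb = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(sa[iu] == sb[iu]))

def stability(X, n_clusters, n_trials=8, dim=100):
    """Mean pairwise agreement over trials that each draw a fresh
    random projection and a fresh k-means initialization."""
    labelings = []
    for _ in range(n_trials):
        Y = random_project(X, dim, rng)
        _, labels = kmeans2(Y, n_clusters, minit='++',
                            seed=int(rng.integers(2**31)))
        labelings.append(labels)
    return float(np.mean([rand_agreement(labelings[i], labelings[j])
                          for i in range(n_trials)
                          for j in range(i + 1, n_trials)]))

# A value of k whose clusterings stay reproducible despite both sources
# of randomness is a plausible cluster count for the data set.
for k in (2, 3, 4, 5):
    print(k, round(stability(X, k), 3))
```

On real data, a k at which the agreement score stays high across trials is a reasonable candidate for the number of clusters, and the projection step keeps each k-means run cheap even when the original snapshots have millions of grid values.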
