An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator (2309.11488v1)

Published 20 Sep 2023 in cs.DC and cs.AR

Abstract: Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when the accuracy of the simulation is increased or the model grid size is enlarged. One way to address this issue is to parallelize the computation by dividing the model into several partitions and using multiple CPUs to compute the result with techniques such as MPI and multi-threading. Alternatively, GPUs are a good candidate to accelerate the computation due to their massively parallel architecture, which allows many floating-point operations per second to be performed. The numerical iterative solver takes the most computational time and is challenging to parallelize efficiently due to the dependencies that exist between cells in the model. In this work, we evaluate the OPM Flow simulator and compare several state-of-the-art GPU solver libraries, as well as custom-developed solutions, for a BiCGStab solver with an ILU0 preconditioner, and benchmark their performance against the default DUNE library implementation running on multiple CPU processors using MPI. The evaluated GPU software includes a manual linear solver written in OpenCL and the integration of several third-party sparse linear algebra libraries, such as cuSparse, rocSparse, and amgcl. For our benchmarking, we use small, medium, and large use cases, starting with the public NORNE test case with approximately 50k active cells and ending with a large model with approximately 1 million active cells. We find that a GPU can accelerate a single dual-threaded MPI process by up to 5.6 times, and that its performance is comparable to around 8 dual-threaded MPI processes.
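The BiCGStab-with-ILU0 combination benchmarked in the paper is a standard Krylov setup, and amgcl, one of the third-party libraries named above, exposes it directly through its template interface. The following is only a minimal sketch of how such a solve might be assembled with amgcl's built-in CPU backend; the matrix size n, the CRS arrays ptr/col/val, and the right-hand side rhs are hypothetical placeholders, not data or code from OPM Flow.

```cpp
// Minimal sketch of a BiCGStab + ILU0 solve with amgcl (assumption: amgcl headers
// are available). This illustrates the solver/preconditioner pairing from the
// paper, not OPM Flow's actual GPU integration.
#include <vector>
#include <tuple>

#include <amgcl/backend/builtin.hpp>           // serial CPU backend
#include <amgcl/adapter/crs_tuple.hpp>         // adapt (n, ptr, col, val) CRS arrays
#include <amgcl/make_solver.hpp>
#include <amgcl/solver/bicgstab.hpp>
#include <amgcl/relaxation/ilu0.hpp>
#include <amgcl/relaxation/as_preconditioner.hpp>

int main() {
    // Hypothetical small sparse system in CRS format, standing in for the
    // Jacobian that a reservoir simulator's Newton loop would hand to the
    // linear solver.
    ptrdiff_t n = 4;
    std::vector<ptrdiff_t> ptr = {0, 2, 5, 8, 10};
    std::vector<ptrdiff_t> col = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3};
    std::vector<double>    val = {2, -1, -1, 2, -1, -1, 2, -1, -1, 2};
    std::vector<double>    rhs = {1, 0, 0, 1};

    typedef amgcl::backend::builtin<double> Backend;
    typedef amgcl::make_solver<
        amgcl::relaxation::as_preconditioner<Backend, amgcl::relaxation::ilu0>,
        amgcl::solver::bicgstab<Backend>
    > Solver;

    // Setup phase: build the ILU0 factors and keep the system matrix.
    Solver solve(std::tie(n, ptr, col, val));

    // Solve phase: run BiCGStab against the stored matrix.
    std::vector<double> x(n, 0.0);
    int    iters;
    double error;
    std::tie(iters, error) = solve(rhs, x);

    return iters > 0 ? 0 : 1;
}
```

One reason libraries of this kind are attractive for such a comparison is that swapping the built-in backend for a GPU backend changes only the Backend typedef, leaving the setup-then-solve structure above unchanged.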

References (55)
  1. [Online]. Available: https://opm-project.org
  2. M. Blatt, A. Burchardt, A. Dedner, C. Engwer, J. Fahlke, B. Flemisch, C. Gersbacher, C. Gräser, F. Gruber, C. Grüninger, D. Kempf, R. Klöfkorn, T. Malkmus, S. Müthing, M. Nolte, M. Piatkowski, and O. Sander, “The Distributed and Unified Numerics Environment, Version 2.4,” Archive of Numerical Software, vol. 4, no. 100, pp. 13–29, 2016. [Online]. Available: http://dx.doi.org/10.11588/ans.2016.100.26526
  3. Y. Saad and H. A. van der Vorst, “Iterative solution of linear systems in the 20th century,” Journal of Computational and Applied Mathematics, vol. 123, no. 1, pp. 1–33, 2000. Numerical Analysis 2000, Vol. III: Linear Algebra. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S037704270000412X
  4. K. Wang, H. Liu, J. Luo, and Z. Chen, “Efficient CPR-type preconditioner and its adaptive strategies for large-scale parallel reservoir simulations,” Journal of Computational and Applied Mathematics, vol. 328, pp. 443–468, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0377042717303734
  5. J. Michalakes and M. Vachharajani, “GPU acceleration of numerical weather prediction,” Parallel Processing Letters, vol. 18, no. 04, pp. 531–548, 2008.
  6. S. Georgescu, P. Chow, and H. Okuda, “GPU acceleration for FEM-based structural analysis,” Archives of Computational Methods in Engineering, vol. 20, no. 2, pp. 111–121, 2013.
  7. L. Murray, “GPU acceleration of Runge-Kutta integrators,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 1, pp. 94–101, 2012.
  8. Y. Lin, “GPU acceleration in VLSI back-end design: Overview and case studies,” in 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2020, pp. 1–4.
  9. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
  10. A. F. Rasmussen, T. H. Sandve, K. Bao, A. Lauser, J. Hove, B. Skaflestad, R. Klöfkorn, M. Blatt, A. B. Rustad, O. Sævareid, K.-A. Lie, and A. Thune, “The open porous media flow reservoir simulator,” 2019. [Online]. Available: https://arxiv.org/abs/1910.06059
  11. [Online]. Available: https://petrowiki.spe.org/Reservoir_simulation
  12. J. A. Trangenstein and J. B. Bell, “Mathematical structure of the black-oil model for petroleum reservoir simulation,” SIAM Journal on Applied Mathematics, vol. 49, pp. 749–783, 1989. [Online]. Available: http://doi.org/10.2307/2101984
  13. [Online]. Available: https://www.software.slb.com/products/eclipse
  14. [Online]. Available: https://resinsight.org/
  15. M. Muskat and M. W. Meres, “The flow of heterogeneous fluids through porous media,” Physics, vol. 7, no. 9, pp. 346–363, 1936.
  16. M. Tek, “Development of a generalized Darcy equation,” Journal of Petroleum Technology, vol. 9, no. 06, pp. 45–47, 1957.
  17. [Online]. Available: https://opm-project.org/?page_id=955
  18. M. Blatt and P. Bastian, “The iterative solver template library,” in Applied Parallel Computing – State of the Art in Scientific Computing, B. Kagström, E. Elmroth, J. Dongarra, and J. Wasniewski, Eds.   Berlin/Heidelberg: Springer, 2007, pp. 666–675.
  19. J. R. Fanchi, K. J. Harpole, and S. W. Bujnowski, “BOAST: A three-dimensional, three-phase black oil applied simulation tool (version 1.1). Volume I: Technical description and Fortran code,” Sep. 1982. [Online]. Available: https://www.osti.gov/biblio/7069892
  20. [Online]. Available: https://netl.doe.gov/node/7530
  21. [Online]. Available: https://www.software.slb.com/products/intersect
  22. [Online]. Available: http://stoneridgetechnology.com/echelon/
  23. [Online]. Available: https://www.aramco.com/en/creating-value/technology-development/in-house-developed-technologies/terapowers
  24. [Online]. Available: https://www.hpc.kaust.edu.sa/content/shaheen-ii
  25. [Online]. Available: http://www.geosx.org/
  26. [Online]. Available: http://tnavigator.com/
  27. P. C. Lichtner, G. E. Hammond, C. Lu, S. Karra, G. Bisht, B. Andre, R. T. Mills, J. Kumar, and J. M. Frederick, “PFLOTRAN Web page,” 2020. [Online]. Available: http://www.pflotran.org
  28. [Online]. Available: https://docs.nvidia.com/cuda/cusparse/index.html
  29. [Online]. Available: https://icl.utk.edu/magma/index.html
  30. H. Anzt, W. Sawyer, S. Tomov, P. Luszczek, I. Yamazaki, and J. Dongarra, “Optimizing Krylov subspace solvers on graphics processing units,” in Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014. Phoenix, AZ: IEEE, May 2014.
  31. [Online]. Available: http://viennacl.sourceforge.net/
  32. H. Anzt, T. Cojean, G. Flegar, F. Göbel, T. Grützmacher, P. Nayak, T. Ribizel, Y. M. Tsai, and E. S. Quintana-Ortí, “Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,” ACM Transactions on Mathematical Software, vol. 48, no. 1, pp. 2:1–2:33, Feb. 2022. [Online]. Available: https://doi.org/10.1145/3480935
  33. D. Demidov, “AMGCL: An efficient, flexible, and extensible algebraic multigrid implementation,” Lobachevskii Journal of Mathematics, vol. 40, no. 5, pp. 535–546, May 2019. [Online]. Available: https://doi.org/10.1134/S1995080219050056
  34. [Online]. Available: https://github.com/ROCmSoftwarePlatform/rocALUTION
  35. [Online]. Available: https://github.com/ROCmSoftwarePlatform/rocSPARSE
  36. [Online]. Available: https://github.com/ROCm-Developer-Tools/HIP
  37. [Online]. Available: https://github.com/RadeonOpenCompute/ROCm
  38. E. Chow and A. Patel, “Fine-grained parallel incomplete LU factorization,” SIAM Journal on Scientific Computing, vol. 37, no. 2, pp. C169–C193, 2015.
  39. H. Anzt, E. Chow, and J. Dongarra, “Iterative sparse triangular solves for preconditioning,” in Euro-Par 2015: Parallel Processing, J. L. Träff, S. Hunold, and F. Versaci, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2015, pp. 650–661.
  40. R. Eberhardt and M. Hoemmen, “Optimization of block sparse matrix-vector multiplication on shared-memory parallel architectures,” in 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).   IEEE, 2016, pp. 663–672.
  41. A. Thune, X. Cai, and A. B. Rustad, “On the impact of heterogeneity-aware mesh partitioning and non-contributing computation removal on parallel reservoir simulations,” Journal of Mathematics in Industry, vol. 11, no. 1, Jun. 2021. [Online]. Available: https://doi.org/10.1186/s13362-021-00108-5
  42. [Online]. Available: https://github.com/andrthu
  43. T. A. Davis, “Algorithm 832: UMFPACK v4.3—an unsymmetric-pattern multifrontal method,” ACM Trans. Math. Softw., vol. 30, no. 2, pp. 196–199, Jun. 2004. [Online]. Available: https://doi.org/10.1145/992200.992206
  44. D. Demidov, “AMGCL – a C++ library for efficient solution of large sparse linear systems,” Software Impacts, vol. 6, p. 100037, 2020. [Online]. Available: https://doi.org/10.1016/j.simpa.2020.100037
  45. [Online]. Available: https://github.com/ddemidov/vexcl
  46. [Online]. Available: http://eigen.tuxfamily.org/
  47. [Online]. Available: https://bitbucket.org/blaze-lib/blaze
  48. [Online]. Available: https://docs.nvidia.com/cuda/cublas/index.html
  49. [Online]. Available: https://opm-project.org/?page_id=559
  50. [Online]. Available: https://www.simula.no/
  51. [Online]. Available: https://www.ex3.simula.no/
  52. [Online]. Available: https://www.simula.no/news/improving-norwegian-research-infrastructure-experimental-infrastructure-exploration-exascale
  53. [Online]. Available: http://wiki.ex3.simula.no/doku.php
  54. T. Hogervorst, R. Nane, G. Marchiori, T. D. Qiu, M. Blatt, and A. B. Rustad, “Hardware acceleration of high-performance computational flow dynamics using high-bandwidth memory-enabled field-programmable gate arrays,” ACM Trans. Reconfigurable Technol. Syst., vol. 15, no. 2, Dec. 2021. [Online]. Available: https://doi.org/10.1145/3476229
  55. OPM, “Open porous media project,” 2020. [Online]. Available: https://github.com/OPM
Authors (5)
  1. Tong Dong Qiu (2 papers)
  2. Andreas Thune (2 papers)
  3. Markus Blatt (5 papers)
  4. Alf Birger Rustad (3 papers)
  5. Razvan Nane (3 papers)
