Automated Calibration of Parallel and Distributed Computing Simulators: A Case Study (2403.13918v2)
Abstract: Many parallel and distributed computing research results are obtained in simulation, using simulators that mimic real-world executions on some target system. Each such simulator is configured by picking values for the parameters that define the behavior of the simulation models it implements. The main concern for a simulator is accuracy: simulated behaviors should be as close as possible to those observed in the real-world target system. This requires that the value of each simulator parameter be carefully picked, or "calibrated," based on ground-truth real-world executions. An examination of the current state of the art shows that simulator calibration, at least in the field of parallel and distributed computing, is often undocumented (and thus perhaps often not performed) and, when documented, is described as a labor-intensive, manual process. In this work we evaluate the benefit of automating simulation calibration using simple algorithms. Specifically, we use a real-world case study from the field of High Energy Physics and compare automated calibration to calibration performed by a domain scientist. Our main finding is that automated calibration is on par with, or significantly outperforms, the calibration performed by the domain scientist. Furthermore, automated calibration makes it straightforward to achieve desirable trade-offs between simulation accuracy and simulation speed.
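To make the notion of calibration in the abstract concrete, the following is a minimal sketch of automated calibration by random search. It is not the method or the simulator used in the paper: the toy `simulate` model, the parameter names `core_speed` and `bandwidth`, the job list, the search bounds, and the `budget` argument are all hypothetical, and the "ground truth" is synthetic data generated from known parameter values rather than real executions.

```python
import random


def simulate(job, params):
    """Toy simulator (hypothetical model): compute time plus data-transfer time."""
    work, data = job
    return work / params["core_speed"] + data / params["bandwidth"]


def mean_relative_error(params, jobs, ground_truth):
    """Average relative gap between simulated and ground-truth execution times."""
    errors = [abs(simulate(job, params) - t) / t for job, t in zip(jobs, ground_truth)]
    return sum(errors) / len(errors)


def calibrate(jobs, ground_truth, bounds, budget=2000, seed=0):
    """Random search: sample parameter values within bounds, keep the most accurate."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(budget):
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        error = mean_relative_error(candidate, jobs, ground_truth)
        if error < best_error:
            best_params, best_error = candidate, error
    return best_params, best_error


# Synthetic "ground truth" produced from known parameter values (assumption:
# a real study would use measured execution times instead).
true_params = {"core_speed": 50.0, "bandwidth": 12.0}
jobs = [(1000.0, 200.0), (4000.0, 50.0), (500.0, 900.0), (2500.0, 400.0)]
ground_truth = [simulate(job, true_params) for job in jobs]

# Search ranges for each parameter (hypothetical).
bounds = {"core_speed": (1.0, 100.0), "bandwidth": (1.0, 100.0)}

best_params, best_error = calibrate(jobs, ground_truth, bounds)
print(best_params, best_error)
```

In this sketch the loss function is the only link between the search and the simulator, so a different search strategy could be substituted without changing `simulate`.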
Authors: Jesse McDonald, Maximilian Horzela, Frédéric Suter, Henri Casanova