Comprehensive Resource Measurement and Analysis for HPC Systems with TACC_Stats (1302.4085v1)

Published 17 Feb 2013 in cs.DC and cs.PF

Abstract: High-performance computing (HPC) systems are a complex combination of software, processors, memory, networks, and storage systems characterized by frequent disruptive technological advances. Anomalous behavior has to be manually diagnosed and remedied with incomplete and sparse data. It also has been effort-intensive for users to assess the effectiveness with which they are using the available resources. The data available for system level analyses appear from multiple sources and in disparate formats (from Linux "sysstat" and accounting to scheduler/kernel logs). Sysstat does not resolve its measurements by job so that job-oriented analyses require individual measurements. There are many user-oriented performance instrumentation and profiling tools but they require extensive system knowledge, code changes and recompilation, and thus are not widely used. To address this issue, we develop TACC_Stats, a job-oriented and logically structured version of the conventional Linux "sysstat/sar" system-wide performance monitor. We use TACC_Stats-collected data from a supercomputer "Ranger" to demonstrate its effectiveness in two case studies.
