
Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors

Published 6 Jun 2022 in cs.AR | (2206.02874v3)

Abstract: Tensor Cores have been an important unit for accelerating fused matrix multiply-accumulate (MMA) operations in all NVIDIA GPUs since the Volta architecture. To program Tensor Cores, users must use either the legacy wmma APIs or the current mma APIs. The legacy wmma APIs are easier to use but can exploit only a limited subset of the features and performance of Tensor Cores. Specifically, the wmma APIs support fewer operand shapes and cannot leverage the new sparse matrix multiplication feature of the newest Ampere Tensor Cores. However, the performance of the current programming interface has not been well explored. Furthermore, the numeric behavior of the low-precision floating-point formats (TF32, BF16, and FP16) supported by the newest Ampere Tensor Cores is also poorly understood. In this paper, we explore the throughput and latency of the current programming APIs. We also study the numeric behavior of Tensor Core MMA and profile its intermediate operations, including multiplication, inner-product addition, and accumulation. All code used in this work can be found at https://github.com/sunlex0717/DissectingTensorCores.

Citations (36)

Summary

  • The paper dissects Tensor Cores by using microbenchmarks to measure latency, throughput, and numeric precision.
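  • It measures the throughput and latency of the current mma programming interface and contrasts it with the legacy wmma APIs, which support fewer operand shapes and cannot use Ampere's sparse matrix multiplication.
  • It profiles the intermediate operations of a Tensor Core MMA (multiplication, inner-product addition, and accumulation) for the TF32, BF16, and FP16 formats, which is relevant to deep learning and high-performance computing workloads that rely on these formats.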
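
To make the abstract's distinction between multiplication, inner-product addition, and accumulation more concrete, the following is a minimal software sketch in pure Python. It is not the paper's microbenchmark and makes no claim about how Tensor Cores round internally. It assumes round-to-nearest-even for every format, emulates BF16 and TF32 by dropping low mantissa bits of a 32-bit float, and uses hypothetical helper names (round_fp16, round_bf16, round_tf32, emulated_dot) chosen only for illustration.

```python
import struct


def round_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def _drop_mantissa_bits(x: float, dropped: int) -> float:
    """Round a binary32 value to nearest-even after dropping low mantissa bits.

    Assumption: finite inputs only; NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> dropped) & 1  # lowest bit that survives, used for tie-breaking
    bits = (bits + (1 << (dropped - 1)) - 1 + lsb) & 0xFFFFFFFF
    bits &= ~((1 << dropped) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_bf16(x: float) -> float:
    """Emulated BF16: 8 exponent bits, 7 explicit mantissa bits."""
    return _drop_mantissa_bits(x, 16)


def round_tf32(x: float) -> float:
    """Emulated TF32: 8 exponent bits, 10 explicit mantissa bits."""
    return _drop_mantissa_bits(x, 13)


def emulated_dot(a, b, round_input, round_acc):
    """Inner product with inputs rounded by round_input and the running
    sum rounded by round_acc after every addition (an assumed model)."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = round_input(x) * round_input(y)  # computed in double precision
        acc = round_acc(acc + prod)
    return acc


# One large term followed by 64 small products of 2**-12 each.
a = [1.0] + [2.0 ** -6] * 64
b = [1.0] + [2.0 ** -6] * 64

fp16_acc = emulated_dot(a, b, round_fp16, round_fp16)
fp32_acc = emulated_dot(a, b, round_fp16, round_fp32)
print(fp16_acc, fp32_acc)
```

Under these assumptions, the FP16 accumulator returns 1.0, because each 2**-12 product is smaller than half the FP16 spacing near 1.0 and is rounded away, while the FP32 accumulator returns 1.015625. This is the kind of difference that an accumulation-precision probe is meant to expose; measuring what real hardware does requires running the paper's benchmarks on a GPU.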

Insights into the IEEEtran.cls for IEEE Computer Society Journals

The paper "Bare Advanced Demo of IEEEtran.cls for IEEE Computer Society Journals" by Michael Shell, John Doe, and Jane Doe serves as a practical guide for authors preparing manuscripts for submission to IEEE Computer Society journals. It demonstrates the use of IEEEtran.cls, a LaTeX class that streamlines typesetting and ensures compliance with IEEE formatting standards.

Purpose and Scope

At its core, the document functions as a template file, providing authors with a structured starting point when preparing their papers. This is particularly significant because it alleviates common typesetting challenges by embedding IEEE's specific formatting rules into a reusable LaTeX class. The paper does not present empirical research; instead, it offers a pragmatic approach to using IEEEtran.cls version 1.8b, highlighting its features through the inclusion of various sections typical of IEEE papers.

Key Features

The paper focuses on several foundational aspects of the IEEEtran.cls:

  • Document Template: The document exemplifies the standard layout expected for IEEE Computer Society journals, including title, author affiliations, abstract, keywords, introduction, body content with various sections, conclusions, references, acknowledgments, and author biographies.
  • Automation of Formatting: The IEEEtran.cls automates several intricate formatting requirements, such as font size adjustments, hyphenation settings, and specific section headings, which are integral for maintaining consistency across submissions to IEEE journals.
  • Versatility: While this paper provides a static template, the IEEEtran.cls is versatile, allowing for customization to accommodate the varying structures of different types of journal articles, such as research papers, review articles, or commentaries.

Practical Implications

For authors, especially those unfamiliar with IEEE's formatting conventions or LaTeX typesetting, the availability of a comprehensive template like IEEEtran.cls significantly reduces the time and effort required to format a manuscript. This lets authors concentrate on the research content rather than the formatting, potentially accelerating the peer review process.

Theoretical Implications and Future Directions

From a broader perspective, IEEEtran.cls represents an evolution in academic publishing tools, where standardization is met with the flexibility necessary to adapt to unique document requirements. As LaTeX and typesetting standards evolve, future developments may include enhanced package compatibility, integration with automated submission systems, or tools leveraging AI for further automation in formatting.

In conclusion, while the paper does not yield traditional research findings, its contribution to the academic community lies in its facilitation of efficient manuscript preparation. The IEEEtran.cls is indicative of ongoing efforts to standardize and simplify the complex requirements of academic publishing, ultimately supporting researchers in focusing on their core work of advancing knowledge within their fields.
