NeCTAr: A Heterogeneous RISC-V SoC for Language Model Inference in Intel 16

Published 18 Mar 2025 in cs.AR (2503.14708v1)

Abstract: This paper introduces NeCTAr (Near-Cache Transformer Accelerator), a 16 nm heterogeneous multicore RISC-V SoC with both near-core and near-memory accelerators for sparse and dense machine learning kernels. A prototype chip runs at 400 MHz at 0.85 V and performs matrix-vector multiplications at 109 GOPs/W. The effectiveness of the design is demonstrated by running inference on a sparse LLM, ReLU-Llama.