YANA Deploy: Neuromorphic FPGA Framework

Updated 24 May 2026

YANA Deploy is an integrated framework that deploys spiking neural networks on FPGA-based systems, specifically targeting the AMD Kria KR260 Robotics Starter Kit.
It offers an end-to-end workflow from training and model conversion to bitstream synthesis and real-time inference, with demonstrated performance on benchmarks like SHD.
The deployment pipeline leverages temporal and spatial sparsity optimizations and established toolchains such as Vivado and PYNQ to facilitate reproducible neuromorphic experiments.

YANA Deploy refers to the integrated hardware and software framework for deploying Spiking Neural Networks (SNNs) on the FPGA-based Yet Another Neuromorphic Accelerator (YANA), as described in "YANA: Bridging the Neuromorphic Simulation-to-Hardware Gap" (Pachideh et al., 3 Apr 2026). Designed specifically to address the simulation-to-hardware gap in neuromorphic computing, YANA Deploy enables end-to-end workflows from training and model conversion to real-time inference and performance optimization—primarily targeting the AMD Kria KR260 Robotics Starter Kit. The deployment pipeline supports direct mapping of arbitrary SNN topologies through point-to-point connectivity, and comprehensive exploitation of both temporal and spatial sparsity in event-driven computation, with empirical latency scaling demonstrated on benchmarks such as the Spiking Heidelberg Digits (SHD) dataset.

1. Target Platform and Hardware Setup

YANA Deploy is optimized for the AMD Kria KR260 Robotics Starter Kit, which integrates the Zynq UltraScale+ MPSoC (Processing System + Programmable Logic, PS + PL). Hardware preparation requires connecting the KR260 for programming (micro-USB to the USB-JTAG port) and console access (USB-UART or USB-C to UART), then supplying 12 V DC. After power-up, board status can be confirmed through a serial console at 115200 baud, which exposes the U-Boot prompt.

The YANA core’s resource utilization is characterized as follows:

Deployment Form	LUTs	Registers	BRAM	URAM	DSP	$N_\text{max}$	$S_\text{max}$
YANA core (single)	740	918	7	24	2	$2^{10}$	$2^{17}$
Full deployment	1687	1817	13.5	48	4	$2^{10}$	$2\times 2^{17}$

I/O is realized through AXI4-Lite (for control, via PYNQ), AXI4-Stream (input spike event buffer, output result buffer), and optional event I/O via FMC-LPC connectors. The default single-core configuration accommodates 1024 neurons ($N_\text{max}=2^{10}$) and 131,072 synapses ($S_\text{max}=2^{17}$) (Pachideh et al., 3 Apr 2026).
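As a rough illustration of the single-core limits above, the following sketch checks whether a densely connected feedforward network fits within $N_\text{max}$ and $S_\text{max}$. The helper `fits_single_core` is hypothetical and not part of YANA Deploy, and whether input channels consume neuron slots is an assumption exposed as a flag, since the text does not say how inputs are counted. The example network uses the 700 input channels and 20 classes of SHD with an illustrative hidden width of 128.

```python
N_MAX = 2 ** 10  # neurons per core, from the resource table
S_MAX = 2 ** 17  # synapses per core, from the resource table


def fits_single_core(layer_sizes, count_inputs=True):
    """Return (fits, neurons, synapses) for a dense feedforward network.

    layer_sizes: widths of input, hidden, and output layers, e.g. [700, 128, 20].
    count_inputs: whether input channels occupy neuron slots (an assumption).
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input and an output layer")
    counted = layer_sizes if count_inputs else layer_sizes[1:]
    neurons = sum(counted)
    # Dense connectivity: each neuron in layer i projects to every neuron in layer i+1.
    synapses = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return neurons <= N_MAX and synapses <= S_MAX, neurons, synapses


if __name__ == "__main__":
    # Illustrative SHD-sized network; the hidden width is not taken from the paper.
    print(fits_single_core([700, 128, 20]))
    print(fits_single_core([700, 256, 20]))
```

With these numbers the first network stays under both limits, while doubling the hidden width exceeds the synapse budget even though the neuron count still fits.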

2. Software Environment and Toolchain

Deployment requires a dual-component toolchain on the host:

FPGA development: Vivado 2023.1+ for programmable logic synthesis; Vitis 2023.1+ for PS application builds
System software and model integration: The PYNQ-Kria image boots Linux on the PS. Python 3.8+ is required, together with packages such as pynq, torch, norse, tonic, pytorch-lightning, and nir (Neuromorphic Intermediate Representation).

Repository access is achieved via:

[code listing not preserved in source]

Vivado environment variables are set via source /opt/Xilinx/Vivado/2023.1/settings64.sh; Python environments are established using venv and pip install -r requirements.txt. Bitstream build occurs in fpga/vivado using TCL scripts (e.g., vivado -mode batch -source build_yana.tcl). Python runtime interfaces for YANA are installed via:

[code listing not preserved in source]

3. SNN Model Preparation and Conversion

SNNs are developed and trained via Norse and PyTorch Lightning. Export to the YANA-compatible intermediate format employs the Neuromorphic Intermediate Representation (NIR):

[code listing not preserved in source]

Conversion of exported models to YANA deployment configuration utilizes the toolchain:

[code listing not preserved in source]

YANA utilizes a forward-Euler scheme with leak integration performed via a precomputed lookup table (LUT):

$\tilde{u}(t+n) = u(t)\cdot(1 - 1/\tau_\text{mem})^n + \tau_\text{mem}^{-1} I(t)$

where the term $(1 - 1/\tau_\text{mem})^n$ is tabulated for $n \le n_\text{max}$; for $n > n_\text{max}$ the value is rounded to zero. Default configuration parameters $\tau_\text{mem}$ and $n_\text{max}$ are specified in config/params.yaml.
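The lookup-based leak can be sketched in a few lines of Python. This is a floating-point illustration of the update rule above, not the fixed-point hardware implementation; the function names are hypothetical, `n_max` stands for the LUT depth, and the inclusive table range 0..`n_max` is an assumption.

```python
def build_decay_lut(tau_mem, n_max):
    """Tabulate (1 - 1/tau_mem)**n for n = 0 .. n_max (inclusive range assumed)."""
    if tau_mem <= 1.0:
        raise ValueError("tau_mem must exceed 1 so the decay factor lies in (0, 1)")
    decay = 1.0 - 1.0 / tau_mem
    return [decay ** n for n in range(n_max + 1)]


def advance_membrane(u, current, n, lut, tau_mem):
    """Decay potential u across n time steps, then add the scaled input current.

    Gaps longer than the table are treated as fully decayed (factor 0),
    matching the rounding-to-zero behaviour described in the text.
    """
    factor = lut[n] if n < len(lut) else 0.0
    return u * factor + current / tau_mem


if __name__ == "__main__":
    lut = build_decay_lut(tau_mem=20.0, n_max=64)  # illustrative values
    print(advance_membrane(u=1.0, current=0.5, n=10, lut=lut, tau_mem=20.0))
    print(advance_membrane(u=1.0, current=0.0, n=200, lut=lut, tau_mem=20.0))
```

The point of the table is that a neuron which receives no events for n steps needs a single lookup and multiply when the next event arrives, rather than n sequential Euler updates.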

4. Bitstream Generation and Hardware Implementation

Vivado project scripts (e.g., build_yana.tcl) handle project creation, block design instantiation (including the custom YANA IP core), clock and DDR configuration, and AXI stream/Lite interfacing. Sample TCL steps:

[code listing not preserved in source]

Constraints for PS-PL clocks, AXI4-Lite, and AXI4-Stream are defined in an .xdc file. Hardware is exported for Vitis integration via write_hw_platform. The resulting .bit and hardware handoff files (.xsa in Vivado 2019.2 and later, formerly .hdf) are transferred to the boot partition, enabling subsequent accelerator initialization.

5. Runtime Deployment and Inference

After booting the Kria KR260 with the configured PYNQ image and hardware overlays, the PS loads the bitstream and initializes the core as follows:

[code listing not preserved in source]

For benchmarking, input events (such as those from the Spiking Heidelberg Digits dataset) are formatted as 64-bit packets:

[code listing not preserved in source]
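Since the actual field layout is not available here, the following is only a sketch of how spike events might be packed into 64-bit words. The layout (32-bit timestamp in the upper half, 16-bit neuron address, 16 reserved bits) and all function names are hypothetical and should be replaced by the format defined in the YANA repository.

```python
import struct

# Hypothetical field widths; the real YANA packet layout is not given in this text.
TS_BITS = 32
ADDR_BITS = 16
ADDR_SHIFT = 16  # low 16 bits left reserved in this sketch


def pack_event(timestamp, address):
    """Pack one spike event into a 64-bit integer word."""
    if not 0 <= timestamp < (1 << TS_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    return (timestamp << 32) | (address << ADDR_SHIFT)


def unpack_event(word):
    """Recover (timestamp, address) from a packed 64-bit word."""
    return (word >> 32) & 0xFFFFFFFF, (word >> ADDR_SHIFT) & 0xFFFF


def to_bytes(words):
    """Serialize words as little-endian unsigned 64-bit values for a DMA buffer."""
    return struct.pack(f"<{len(words)}Q", *words)


if __name__ == "__main__":
    # SHD has 700 input channels, so address 699 is the last channel.
    word = pack_event(timestamp=12345, address=699)
    print(hex(word), unpack_event(word), len(to_bytes([word])))
```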

These are streamed into the accelerator via AXI-DMA. Event processing is initiated and timed using:

[code listing not preserved in source]

Performance metrics can be accessed via the system shell or the Python API, providing detailed statistics (cycle count, event throughput, latency in ms).
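The relationship between these statistics can be made concrete with a short helper. This is a sketch, not the YANA Python API: `summarize_run` is a hypothetical name, and the 100 MHz clock in the example is an assumed value that should be replaced by the actual PL clock of the build.

```python
def summarize_run(cycles, events, clock_hz):
    """Convert raw counters into latency (ms) and throughput (events/s)."""
    if cycles <= 0 or clock_hz <= 0:
        raise ValueError("cycles and clock_hz must be positive")
    seconds = cycles / clock_hz
    return {
        "latency_ms": seconds * 1e3,
        "events_per_s": events / seconds,
        # Average cycles spent per input event, including synaptic fan-out work.
        "cycles_per_event": cycles / events if events else float("inf"),
    }


if __name__ == "__main__":
    # Illustrative counters only; not measured results.
    print(summarize_run(cycles=200_000, events=50_000, clock_hz=100e6))
```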

6. Performance Tuning and Sparsity Optimization

Fine-tuning of YANA parameters is performed in config/params.yaml, including the membrane time constant (tau_mem), LUT depth (n_max), max synapses per core (s_max), and buffer depths.
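Before rebuilding or redeploying, it can help to sanity-check such parameters. The sketch below validates the three named keys in plain Python; the class `YanaParams`, the specific bounds, and the example values are assumptions for illustration and do not reflect the actual schema or defaults of config/params.yaml.

```python
from dataclasses import dataclass

S_MAX_HW = 2 ** 17  # per-core synapse capacity reported for the default build


@dataclass(frozen=True)
class YanaParams:
    tau_mem: float  # membrane time constant, in time steps
    n_max: int      # depth of the decay lookup table
    s_max: int      # maximum synapses per core

    def __post_init__(self):
        if self.tau_mem <= 1.0:
            raise ValueError("tau_mem must exceed 1 so that 0 < 1 - 1/tau_mem < 1")
        if self.n_max < 1:
            raise ValueError("n_max must be at least 1")
        if not 1 <= self.s_max <= S_MAX_HW:
            raise ValueError(f"s_max must lie in [1, {S_MAX_HW}]")


if __name__ == "__main__":
    # Illustrative values, not the defaults shipped with YANA.
    print(YanaParams(tau_mem=20.0, n_max=64, s_max=2 ** 17))
```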

Temporal sparsity is exploited by probabilistic event dropping, parameterized by an event dropout probability; latency scales as

[equation not recoverable from source]
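Probabilistic event dropping can be emulated in software before streaming events to the accelerator. The sketch below assumes independent Bernoulli dropping per event, which is one plausible reading of the description; `drop_events` and `p_drop` are hypothetical names.

```python
import random


def drop_events(events, p_drop, seed=None):
    """Keep each event independently with probability 1 - p_drop."""
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must lie in [0, 1]")
    rng = random.Random(seed)  # seeded generator for reproducible experiments
    return [e for e in events if rng.random() >= p_drop]


if __name__ == "__main__":
    # Synthetic (timestamp, channel) events for illustration only.
    events = [(t, t % 700) for t in range(10_000)]
    kept = drop_events(events, p_drop=0.5, seed=0)
    print(len(kept) / len(events))
```

Because the surviving events keep their original order, the filtered list can be packed and streamed exactly like the full event list.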

Spatial sparsity is leveraged via synaptic pruning (zeroing weak weights):

[content not recoverable from source]
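Synaptic pruning by zeroing weak weights can be sketched as simple magnitude thresholding. The threshold rule, the function name `prune_weights`, and the nested-list weight layout are assumptions; the criterion actually used by YANA Deploy may differ.

```python
def prune_weights(weights, threshold):
    """Zero every weight with magnitude below threshold.

    Returns the pruned matrix and its density (fraction of nonzero synapses).
    """
    pruned = [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]
    total = sum(len(row) for row in pruned)
    nonzero = sum(1 for row in pruned for w in row if w != 0.0)
    return pruned, (nonzero / total if total else 0.0)


if __name__ == "__main__":
    # Toy 2x2 weight matrix for illustration.
    w = [[0.5, -0.01], [0.02, -0.8]]
    print(prune_weights(w, threshold=0.1))
```

The returned density gives the fraction of synapses that remain to be stored and processed, which is the quantity spatial sparsity reduces.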

Empirically, joint scaling of temporal and spatial sparsity on SHD demonstrates near-linearity:

[equation and its validity range not recoverable from source]

Performance curves may be visualized using matplotlib. The near-linear scaling suggests systematic cost advantages for highly sparse, temporally structured SNN workloads.

7. Workflow Summary and Significance

The YANA Deploy pipeline provides an accessible, reproducible workflow from model conception (PyTorch/Norse) through NIR-based configuration, bitstream synthesis, and execution on the Kria KR260 platform (Pachideh et al., 3 Apr 2026). Its event-driven pipeline achieves one-event-per-cycle throughput, mitigates buffer overflow via balanced input preprocessing, and removes constraints on SNN topology through point-to-point connectivity. Integration with the Neuromorphic Intermediate Representation (NIR) facilitates toolchain interoperability and extends the ecosystem for neuromorphic research on general-purpose FPGA hardware.

A plausible implication is that YANA Deploy substantially reduces the simulation-to-hardware barrier, enabling algorithmic innovation, co-design, and empirical benchmarks in neuromorphic computing on non-proprietary accelerator platforms.

References

Pachideh et al. (2026). YANA: Bridging the Neuromorphic Simulation-to-Hardware Gap.
