
Auto-SNL: FPGA Deployment for Neural Networks

Updated 1 September 2025
  • Auto-SNL is a Python-based toolchain that automates the conversion of trained neural network models into FPGA-compatible HLS code within the SNL ecosystem.
  • It leverages dynamic weight updates via AXI-Lite registers to enable adaptive, low-latency real-time inference for high-throughput applications.
  • Benchmark results show that Auto-SNL achieves competitive latency and resource optimization compared to traditional tools like hls4ml.

Auto-SNL refers to a Python-based toolchain and methodology for the rapid deployment of neural network (NN) models on Field-Programmable Gate Arrays (FPGAs) within the SLAC Neural Network Library (SNL) ecosystem. It automates the conversion of trained NNs defined in Python (using frameworks such as Keras or TensorFlow) into high-level synthesis (HLS) code suitable for FPGA deployment, with particular attention to the low-latency, resource-constrained real-time inference environments found in large-scale scientific experiments and other high-throughput domains.

1. Motivation and System Context

Auto-SNL has been developed in response to the computational and latency bottlenecks encountered in real-time data reduction at facilities such as the LCLS-II Free Electron Laser (FEL), where detector data rates may exceed 1 TB/s with experimental repetition rates up to 1 MHz (Rahali et al., 29 Aug 2025). The transmission and storage infrastructure required for such datasets is cost-prohibitive; hence, edge computation and data reduction are imperative. FPGAs are widely used for such tasks, but conventional approaches to deploying ML models on FPGAs are hindered by the need for expert knowledge of hardware description languages, extended design cycles, and rigid model instantiation.

The SLAC Neural Network Library (SNL) introduces support for dynamically updating NN weights without resynthesizing FPGA logic, facilitating adaptive and continually learning applications. Auto-SNL complements this by providing a Python-based, user-friendly interface to define, configure, and deploy trained NN models directly to SNL-supported FPGA targets.

2. Technical Workflow and Architecture

The operational pipeline of Auto-SNL can be summarized as follows:

  • Model Input and Configuration: Users provide a trained neural network model (e.g., from Keras) and specify FPGA synthesis parameters, including data precision (fixed-point modes), clock period, and the target device (e.g., Xilinx ZCU102).
  • Automated Conversion: Auto-SNL parses the NN architecture and generates the corresponding HLS code, mapping layers, weights, and biases to SNL’s hardware runtime model. Precision is controlled via a tuple notation ⟨X, Y⟩, where X is the total bit-width and Y is the number of bits above the binary point.
  • Integration with SNL: The converted model is embedded within SNL’s infrastructure, which handles weight and bias management via AXI-Lite interfaces—enabling run-time updates without FPGA resynthesis.
  • Deployment and Inference: The generated project files are synthesized and deployed to the FPGA using standard hardware EDA toolchains. The SNL runtime decouples weight updates from bitstream synthesis, allowing for fast iteration.

A prototypical usage pattern is:

import autosnl
from tensorflow.keras.models import load_model

# Load a trained Keras model and describe the FPGA synthesis target.
model = load_model('my_model.h5')
params = {
    'precision': (32, 16),          # fixed-point <total bits, bits above binary point>
    'clock_period': '10 ns',
    'target_device': 'Xilinx ZCU102'
}

# Generate SNL-compatible HLS code for the model.
autosnl.convert(model, params)

This pipeline eliminates the need for explicit HDL or HLS code development by the user and abstracts FPGA configuration particulars behind a high-level Pythonic interface (Rahali et al., 29 Aug 2025).
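To make the ⟨X, Y⟩ precision notation concrete, the following is a minimal sketch of how a weight value could be quantized into an X-bit fixed-point word with Y bits above the binary point. The helper function and its treatment of the sign bit are illustrative assumptions, not part of the Auto-SNL API.

import numpy as np

def quantize_fixed_point(values, total_bits=32, int_bits=16):
    """Quantize floats to a signed fixed-point format <total_bits, int_bits>.

    int_bits counts the bits above the binary point (assumed here to include
    the sign bit); the remaining total_bits - int_bits bits are fractional.
    """
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    # Representable range of a signed word with this layout.
    lo = -(2 ** (total_bits - 1)) / scale
    hi = (2 ** (total_bits - 1) - 1) / scale
    quantized = np.round(np.asarray(values, dtype=float) * scale) / scale
    return np.clip(quantized, lo, hi)

# Example: in <16, 6> there are 10 fractional bits, so 0.1 maps to ~0.0996
# and 200.0 saturates near the top of the representable range (~32).
print(quantize_fixed_point([0.1, -3.75, 200.0], total_bits=16, int_bits=6))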

3. Benchmarking and Performance Characteristics

The comparative benchmark in (Rahali et al., 29 Aug 2025) evaluates Auto-SNL (and its underlying SNL framework) against hls4ml, a widely adopted toolchain for translating CNN/DNN models to FPGA firmware in the high-energy physics (HEP) and scientific machine learning communities. The evaluation considers a range of NNs:

  • Particle jet classifiers (physics)
  • Fully connected autoencoders (anomaly detection)
  • Convolutional architectures for keyword spotting (KWS)
  • Binary image classifiers for the visual wake words (VWW) task

Models are synthesized using fixed-point representations ⟨32,16⟩, ⟨16,6⟩, and ⟨8,3⟩ (a sweep of this kind is sketched below); in hls4ml, reuse factors and optimization strategies (latency- versus resource-oriented) are also varied.
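A minimal sketch of how such a precision sweep might be scripted on top of the conversion interface shown earlier is given below; the model filename is hypothetical, and the parameter names simply mirror the earlier usage example rather than a documented benchmark script.

import autosnl
from tensorflow.keras.models import load_model

# Fixed-point configurations evaluated in the benchmark: <32,16>, <16,6>, <8,3>.
precisions = [(32, 16), (16, 6), (8, 3)]

model = load_model('jet_classifier.h5')  # hypothetical trained benchmark model

for precision in precisions:
    params = {
        'precision': precision,
        'clock_period': '10 ns',
        'target_device': 'Xilinx ZCU102',
    }
    # Each call generates an HLS project for one precision configuration;
    # synthesis, place-and-route, and reporting then proceed with the vendor tools.
    autosnl.convert(model, params)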

Key empirical findings:

  • SNL achieves equal or lower inference latency in three of the four tested NN architectures, with the advantage most pronounced for convolutional networks under high reuse-factor configurations.
  • SNL sometimes increases BRAM or FF resource usage, but can reduce DSP and LUT count for certain NN types versus hls4ml. This suggests a nuanced trade-off that can be tuned based on application constraints.
  • The ability to update weights at runtime, unique to SNL, provides extra flexibility for adaptive or streaming scenarios, which is not generally supported in static hls4ml flows.
| Model type | Latency (SNL vs. hls4ml) | Notable resource comparison |
|---|---|---|
| Jet classifier | Superior or competitive | Lower DSP/LUT in SNL (some cases) |
| Autoencoder | Comparable | SNL advantageous at high precision |
| ConvNet (KWS/VWW) | Lower in SNL | SNL uses more BRAM, less DSP/LUT |

These results indicate that for many real-time experimental and scientific computation settings, Auto-SNL provides robust, low-latency NN inference on FPGA hardware with configuration flexibility (Rahali et al., 29 Aug 2025).

4. Resource Management and Adaptivity

A central technological feature of SNL, leveraged by Auto-SNL, is weight and bias management via AXI-Lite registers, which permits in-application updates without re-running full FPGA bitstream synthesis. This capability is critical for:

  • Adaptive experiments: Parameters updated as new data arrives.
  • Rapid ML iteration: Deployment cycles measured in seconds/minutes, not hours/days.
  • Edge and streaming applications: Where model adaptation to data drift or new regimes is required.

Auto-SNL’s approach abstracts this capability, providing users with direct access to dynamic model updates in Python-based workflows.
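As a rough illustration of what a runtime weight update could look like on a Zynq-class board, the sketch below re-quantizes new weights and writes them into AXI-Lite-mapped registers using PYNQ's MMIO interface. The base address, register layout, and quantization settings are assumptions made for illustration and do not describe SNL's actual register map or Auto-SNL's Python API.

import numpy as np
from pynq import MMIO  # available on PYNQ-enabled Zynq/ZynqMP boards such as the ZCU102

# Hypothetical AXI-Lite window exposing weight registers (illustrative values).
WEIGHT_BASE_ADDR = 0xA0000000
WEIGHT_SPAN = 0x1000  # address span in bytes

def write_weights(weights, total_bits=16, int_bits=6):
    """Quantize updated weights and push them over AXI-Lite without resynthesis."""
    frac_bits = total_bits - int_bits
    fixed = np.round(np.asarray(weights).ravel() * (1 << frac_bits)).astype(np.int64)
    mmio = MMIO(WEIGHT_BASE_ADDR, WEIGHT_SPAN)
    for i, word in enumerate(fixed):
        # Assume one 32-bit register per weight at word-aligned offsets.
        mmio.write(4 * i, int(word) & 0xFFFFFFFF)

# Example: push retrained first-layer weights to the running design.
write_weights(np.random.randn(64).astype(np.float32))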

5. Application Domains and Use Cases

Auto-SNL is designed for domains where tight latency constraints and resource efficiency are paramount:

  • High-energy physics: Real-time data reduction at beamlines with high-throughput detectors, e.g., FEL facilities (Rahali et al., 29 Aug 2025).
  • Medical imaging: Low-latency inference in embedded diagnostics.
  • Robotics: On-board decision making requiring rapid responses with minimal resource overhead.

The dynamic update feature broadens applicability to time-varying, feedback-driven systems.

6. Future Developments

Planned extensions for Auto-SNL and SNL include:

  • Support for a wider range of NN architectures and data types beyond current convolutional and fully-connected models.
  • Expansion to additional FPGA hardware targets beyond Xilinx ZCU102.
  • Enhancements to the configuration interface, including increased exposure of synthesis parameters and possible integration of a graphical user interface (GUI).
  • Further optimization for resource use as well as exploration of synthesis strategies that optimize for energy, throughput, or memory footprint.

A plausible implication is that the convergence of automated Python model deployment, dynamic model reconfiguration, and hardware-adaptive synthesis will continue to make SNL and Auto-SNL prominent in real-time embedded ML applications (Rahali et al., 29 Aug 2025).
