BrainScaleS-2: Accelerated Neuromorphic Platform
- BrainScaleS-2 is an accelerated mixed-signal neuromorphic platform that uses analog circuits and digital plasticity processing units (PPUs) to emulate continuous-time spiking neural dynamics at approximately 1000× biological speed.
- It integrates over 500 analog neurons and 131K synapses on a 65 nm ASIC, supporting advanced plasticity, calibration, and scalable multi-chip interconnects for complex neural models.
- The system is designed for rapid neuroscience experimentation and efficient machine learning inference, combining strong time acceleration with low energy consumption.
The BrainScaleS-2 system is an accelerated mixed-signal neuromorphic computing platform that implements continuous-time analog circuits for neurons and synapses, tightly coupled to embedded digital processors with SIMD extensions. Developed to explore large-scale, energy-efficient implementations of spiking neural networks (SNNs) and in-memory artificial neural networks (ANNs), BrainScaleS-2 is architected to support both fast neuroscience experimentation and novel machine learning workloads. Its core is a 65 nm CMOS ASIC (the HICANN-X or "HX" chip), which physically embodies network dynamics at acceleration factors of approximately 1000× over biological timescales. The platform uniquely blends analog acceleration, customizable hybrid plasticity, and flexible digital event routing, providing a scalable substrate for both single-chip and multi-chip systems.
1. Core Architecture and Physical Modeling
At the core of BrainScaleS-2 is a mixed-signal analog neural network implemented on a monolithic 65 nm CMOS ASIC. Each chip contains:
- Neuron circuits: 512 analog AdEx (Adaptive Exponential Integrate-and-Fire) neuron compartments, arranged in either two hemispheres of 256 or four quadrants of 128, depending on the ASIC revision. Each compartment flexibly realizes LIF (Leaky Integrate-and-Fire) and AdEx behaviors through configurable on-chip analog parameters stored on per-neuron capacitor arrays. Key AdEx model terms are implemented physically, including the exponential spike-initiation current and adaptation (Billaudelle et al., 2022, Pehle et al., 2022).
- Synapse arrays: 131 072 plastic synapses (256 per neuron or 128×256 per quadrant), each with local digital SRAM storing a 6-bit weight, 6-bit presynaptic address, and in-synapse analog circuitry for local spike-timing–dependent plasticity (STDP) measurements or short-term plasticity (STP) (Arnold et al., 2024, Grübl et al., 2020).
- Programmable analog core: Analog parameters for each neuron (e.g., leak conductance, adaptation strength, threshold, time constants) are supplied by integrated DACs and are tuneable for per-neuron calibration. Synapses implement both current-mode and conductance-mode postsynaptic currents.
- Embedded digital processors: Two 32-bit RISC PPUs per chip (PowerISA v2.06 subset) equipped with wide SIMD vector units. These processors handle weight updates, experiment control, environment simulation, and plasticity algorithms, enabling true hybrid analog-digital operation (Schemmel et al., 2020, Spilger et al., 2022).
The physical modeling approach (as opposed to time-stepped simulation) enables continuous-time, highly accelerated network evolution and direct emulation of membrane and synaptic dynamics, including support for compartmental neuron models and analog STDP correlators (Billaudelle et al., 2022, Billaudelle et al., 2019).
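The AdEx dynamics that the analog circuits emulate can be sketched with a simple forward-Euler integration. This is an illustrative software model only — the parameter values below are plausible placeholders in the accelerated microsecond regime, not calibrated hardware values:

```python
import math

def adex_step(v, w, i_ext, dt=1e-6,
              c=1e-12, g_l=10e-9, e_l=-0.07, d_t=0.002, v_t=-0.05,
              a=2e-9, tau_w=30e-6, b=5e-12, v_peak=0.0, v_reset=-0.07):
    """One forward-Euler step of the AdEx model (illustrative parameters,
    time constants in the accelerated microsecond regime)."""
    dv = (-g_l * (v - e_l)
          + g_l * d_t * math.exp((v - v_t) / d_t)  # exponential spike initiation
          - w + i_ext) / c
    dw = (a * (v - e_l) - w) / tau_w               # adaptation current
    v, w = v + dt * dv, w + dt * dw
    spiked = v >= v_peak
    if spiked:
        v, w = v_reset, w + b                      # reset and spike-triggered adaptation
    return v, w, spiked

# Drive the neuron with a constant current and count spikes over 1 ms of model time.
v, w, n_spikes = -0.07, 0.0, 0
for _ in range(1000):                              # 1000 steps of 1 us
    v, w, spiked = adex_step(v, w, i_ext=0.5e-9)
    n_spikes += spiked
```

Setting `a = 0` and `d_t` very small degenerates the model toward plain LIF behavior, mirroring how the hardware realizes both models with one configurable circuit.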
2. Software Stack, Experiment Workflow, and Configuration
The BrainScaleS-2 operating system adopts a layered approach for hardware control, configuration, and user access:
- Low-level communication: Abstracts reliable access over multiple protocols (Ethernet, SPI, JTAG) to registers and memory-mapped configuration structures on FPGA or chip (Müller et al., 2020).
- Hardware abstraction and coordinate system: Type-safe containers and coordinate abstractions represent neurons, synapses, analog/digital parameters, and entire chip topologies, exposing configuration via both C++ and Python (pybind11) APIs (Müller et al., 2022, Müller et al., 2020).
- Calibration and tuning: Automated Python-based routines compensate for analog mismatch across neurons and synapses by adjusting on-chip DAC settings while monitoring via on-chip ADCs (CADC/MADC). Calibrations cover time constants, gains, and spike thresholds, achieving residual variability typically under 10% (Billaudelle et al., 2022, Weis et al., 2020).
- Experiment workflow: Configuration, execution, and data collection are managed in timed program sections. Batch, hardware-in-the-loop, and fully closed-loop sessions support rapid prototyping and experiment acceleration. PyNN (for SNNs) and hxtorch/hxtorch.snn (for both ANNs and SNNs) provide high-level programmatic interfaces (Spilger et al., 2022, Spilger et al., 2020, Müller et al., 2022).
- Partitioned emulation: For models exceeding single-chip capacity, partitioned/sequential emulation allows large SNNs to be mapped and emulated layer-wise or subnetwork-wise. Spikes are recorded after each subnetwork and replayed for downstream stages, supporting hardware-in-the-loop training of deep models (Arnold et al., 2024).
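The record-and-replay scheme behind partitioned emulation can be sketched as a host-side driver loop. Here `run_subnetwork` is a hypothetical stand-in for the hardware execution call, not part of the actual BrainScaleS-2 API:

```python
def run_partitioned(subnetworks, input_spikes, run_subnetwork):
    """Emulate a deep SNN on a substrate that fits only one subnetwork at a
    time: record the output spikes of each stage and replay them as input
    to the next (hypothetical host-side driver).

    subnetworks    -- list of per-stage configurations
    input_spikes   -- list of (time, source_id) events for the first stage
    run_subnetwork -- callable(config, spikes) -> recorded output spikes
    """
    spikes = input_spikes
    for config in subnetworks:
        spikes = run_subnetwork(config, spikes)  # one hardware run per stage
    return spikes

# Toy stand-in for hardware: each "stage" delays every event by 10 us.
delay_stage = lambda config, spikes: [(t + 10.0, s) for t, s in spikes]
out = run_partitioned([None, None, None], [(0.0, 7)], delay_stage)  # -> [(30.0, 7)]
```

The recording/replay boundary is also where the overhead of partitioned emulation accrues: each stage costs one full configure-run-readout cycle.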
3. Plasticity, Learning, and On-Chip Adaptation
BrainScaleS-2 implements a hybrid plasticity architecture:
- Local analog correlation sensors: Each synapse accumulates STDP eligibility traces in dedicated analog circuitry. CADCs digitize these traces for software or processor-based learning rule evaluation (Pehle et al., 2022, Grübl et al., 2020).
- Programmable on-chip PPUs: Classic STDP, reward-modulated STDP (R-STDP), structural plasticity (sparse rewiring), and homeostatic rules are all implemented in PPU firmware, which reads digitized traces, computes updates, rewrites synapse weights, and can rewire connections locally and efficiently (Billaudelle et al., 2019, Pehle et al., 2022, Atoui et al., 2024).
- Continuous calibration and adaptation: On-the-fly tuning of analog parameters compensates for device mismatch and parameter drift. Reinforcement learning and activity-driven self-calibration exploit the plasticity pipeline to ensure consistent network operation despite physical variability (Billaudelle et al., 2019).
- Hardware-in-the-loop training: Both spiking (using surrogate gradient BPTT) and non-spiking networks employ hardware-observed membrane traces and spike events in gradient calculations, closing the loop between forward analog emulation and host-side or on-chip learning (Spilger et al., 2022, Weis et al., 2020, Atoui et al., 2024).
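The hybrid plasticity loop — analog correlation traces digitized by the CADC, then consumed by a PPU-resident rule — can be sketched numerically. The three-factor update below is an illustrative R-STDP-style rule, not actual PPU firmware; the learning rate and trace values are arbitrary:

```python
def rstdp_update(weights, traces, reward, lr=0.1, w_min=0, w_max=63):
    """Reward-modulated STDP step as it might run on a PPU: scale each
    synapse's digitized eligibility trace by a global reward signal and
    clip to the 6-bit weight range (illustrative rule, not PPU firmware)."""
    new_w = []
    for w, e in zip(weights, traces):
        w = w + lr * reward * e                     # three-factor update
        new_w.append(max(w_min, min(w_max, round(w))))  # clip to 6-bit grid
    return new_w

# Positive reward potentiates synapses with positive correlation traces;
# a weight already at the ceiling saturates at 63.
w = rstdp_update([10, 10, 63], traces=[+20.0, -20.0, +20.0], reward=1.0)
# -> [12, 8, 63]
```

On hardware, the analogous loop runs fully on-chip: the PPU reads the correlation sensors, applies the rule with its SIMD unit, and writes the updated weights back into the synapse SRAM.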
4. Performance, Real-Time Acceleration, and Scalability
- Time constants & acceleration: Analog parameters are tuned to yield membrane and synaptic time constants in the 1–1000 μs range, corresponding to acceleration factors of ~10³ relative to biological dynamics. Full experiments such as closed-loop navigation or SNN inference tasks complete in milliseconds to sub-second wall-clock time, corresponding to seconds to minutes of biological time (Pehle et al., 2022, Billaudelle et al., 2019, Schreiber et al., 2023).
- Throughput and latency: Single-chip inference and learning tasks achieve throughputs of 20–80 k runs/s on standard benchmarks (e.g., ≈48 μs/sample for MNIST TTFS classification at 96.9% accuracy), with power dissipation of ~100–200 mW in inference mode. Analog matrix–vector multiplication for ANN workloads achieves 3–15 GOp/s per chip at ~10–15 pJ/op (Göltz et al., 2019, Stradmann et al., 2021, Spilger et al., 2020).
- Multi-chip, wafer-scale scaling: Physical network models larger than the fixed substrate (~512 neurons per chip) are realized via multi-chip assemblies. The system supports both inter-chip event routing (via custom EXTOLL networking infrastructure or aggregator-based MGT crossbars) and sequential partitioned emulation (Thommes et al., 2022, Ilmberger et al., 3 Dec 2025, Arnold et al., 2024).
| Platform   | Link bandwidth       | Latency (per hop)   | Aggregate capacity                |
|------------|----------------------|---------------------|-----------------------------------|
| EXTOLL     | 12 × 8.4 Gb/s        | 100–200 ns          | up to 100.8 Gb/s per link         |
| Aggregator | 12 × 5 Gb/s (8b/10b) | 0.3–0.6 μs per FPGA | 3 Gspikes/s per 12-chip backplane |

- Inter-chip latency: Measured end-to-end single-spike transfer between chips in multi-chip configurations is sub-10 μs with EXTOLL (8 μs, with 0.5 μs jitter per link), and below 1.3 μs per hop using backplane aggregator units. The event network is fully deterministic and supports scalable, low-latency, high-throughput SNN emulation up to rack scale (hundreds of chips) (Thommes et al., 2022, Ilmberger et al., 3 Dec 2025).
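The relation between biological time and wall-clock time at the platform's ~1000× acceleration is simple arithmetic; a small helper makes the scaling concrete:

```python
def wall_clock_seconds(bio_seconds, acceleration=1000.0):
    """Wall-clock duration of an emulation spanning `bio_seconds` of
    biological time at a given acceleration factor (default ~1000x)."""
    return bio_seconds / acceleration

# One biological minute of network dynamics runs in 60 ms of wall-clock time.
t = wall_clock_seconds(60.0)  # -> 0.06 s
```

This is what makes long-timescale protocols (e.g., multi-hour synaptic consolidation experiments) practical: hours of biological time compress into seconds of hardware time.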
5. System-Level Applications and Demonstrations
The BrainScaleS-2 substrate enables a spectrum of advanced use cases:
- Neuromorphic deep learning: Time-to-first-spike coding and TTFS backpropagation have been demonstrated with near-software accuracy for MNIST classification (96.9%), leveraging both large-scale analog networks and hardware-in-the-loop gradient calculations. Direct, energy-efficient analog matrix multiplication supports rapid ANN inference with minimal accuracy degradation (e.g., 98.0% on hardware-trained MNIST, 477 MOPS throughput on mobile systems) (Göltz et al., 2019, Weis et al., 2020, Stradmann et al., 2021).
- Embodied cognition and robotics: Closed-loop experiments include on-chip reinforcement learning for Pong, real-time bee-inspired path integration with spike-based neural networks emulated at 1 000× biological speed, and direct analog sound localization tasks with continuous sensor injection—demonstrating the platform’s ability to emulate closed-loop agents with μs-level latencies (Schreiber et al., 2020, Schreiber et al., 2023, Stradmann et al., 4 Feb 2026).
- Flexible plasticity and adaptation: Calcium-based and multi-timescale synaptic tagging-and-capture (STC) plasticity models have been split between analog and digital domains, validating long-term plasticity protocol emulation across extended time horizons. Structure-aware rewiring, activity-driven resource allocation, and three-factor learning protocols are implemented efficiently on the on-chip processors (Atoui et al., 2024, Billaudelle et al., 2019, Pehle et al., 2022).
- Scalable software integration: PyTorch and PyNN front-ends (hxtorch, hxtorch.snn) facilitate hardware-in-the-loop SNN and ANN training, including automatic network partitioning, batch scheduling, and seamless switching between hardware and software emulation (Spilger et al., 2022, Spilger et al., 2020).
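Time-to-first-spike classification, as used in the MNIST demonstrations above, decodes the label as the output neuron that fires earliest. A minimal decoder, assuming one recorded first-spike time per output neuron (with `inf` marking neurons that never fired):

```python
def ttfs_decode(first_spike_times):
    """Return the index of the earliest-firing output neuron.
    `first_spike_times` holds one first-spike time per class; neurons
    that never fired are marked with float('inf')."""
    return min(range(len(first_spike_times)),
               key=lambda i: first_spike_times[i])

# Neuron 2 fires first, so the sample is classified as class 2.
label = ttfs_decode([120.0, float('inf'), 48.0, 97.5])  # -> 2
```

Because the decision is available at the first output spike, inference latency equals the earliest spike time rather than a fixed integration window — a key ingredient of the ≈48 μs/sample figure.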
6. Comparative Analysis and Limitations
- Analog vs. digital neuromorphic systems: BrainScaleS-2 distinguishes itself through continuous-time, sub-ms analog neural dynamics, dense in-memory analog computation, and true on-chip, custom-programmable plasticity. Benchmarks report energy-per-operation in the 10–100 pJ/MAC regime, and classification energy of 8.4 μJ/sample (MNIST TTFS) (Göltz et al., 2019, Weis et al., 2020). In contrast, fully digital systems (Loihi, TrueNorth, SpiNNaker) support larger networks but run at lower acceleration factors and rely on time-discrete simulation, sacrificing fine-grained biological mimicry and continuous dynamic ranges (Pehle et al., 2022).
- Substrate capacity and quantization: The primary architectural bottlenecks are fixed neuron and fan-in counts per ASIC, 6-bit weight quantization, input/output bandwidth, and analog mismatch (addressed by calibration and hardware-in-the-loop techniques). Sequential partitioning allows larger logical networks, but with overhead due to recording/replay interface (Arnold et al., 2024, Spilger et al., 2022).
- Variability and calibration demands: Device mismatch, static parameter spread, and analog thermal drift necessitate explicit, routine calibration. The platform provides per-neuron, per-synapse correction routines to ensure consistent mapping across experiments and chips (Billaudelle et al., 2022, Weis et al., 2020).
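The effect of the 6-bit weight resolution can be illustrated by quantizing trained floating-point weights onto the hardware grid. The single linear scale below is a simplified model of the mapping; the real per-array translation is calibrated and sign handling depends on the synapse row configuration:

```python
def quantize_weights(weights, levels=64):
    """Map floating-point weights onto a 6-bit grid (0..63) with a single
    linear scale, as a simplified model of the per-synapse digital weight
    storage (illustrative; the real mapping is calibrated per array)."""
    w_max = max(abs(w) for w in weights) or 1.0
    scale = (levels - 1) / (2 * w_max)          # span [-w_max, w_max] -> 0..63
    q = [round((w + w_max) * scale) for w in weights]
    return q, scale

q, scale = quantize_weights([-1.0, 0.0, 0.5, 1.0])  # -> [0, 32, 47, 63]
```

Hardware-in-the-loop training largely absorbs this discretization: gradients are computed against the quantized forward pass the chip actually performs, so the network adapts to the grid rather than being rounded onto it after the fact.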
7. Future Directions and Scalability Outlook
Expansion to full wafer-scale (hundreds of chips per wafer) and rack-scale assemblies with hierarchical interconnects is planned, leveraging EXTOLL and multi-backplane architectures to enable experiments at the scale of 10⁶–10⁷ neurons and 10⁹ synapses under accelerated-time conditions. Ongoing developments target dynamic on-FPGA event routing, further reductions in inter-chip latency, real-time multicast support, expanded plasticity primitives, and direct integration with event-based sensory front-ends (Thommes et al., 2022, Ilmberger et al., 3 Dec 2025, Stradmann et al., 4 Feb 2026).
Planned software advances cover improved graph-based model description, automatic experiment serialization, sustainable hardware/software co-development, and enhanced calibration pipelines. Integration into distributed computation frameworks (EBRAINS Collaboratory, BindsNET, norse) provides broad accessibility for computational neuroscience and machine learning practitioners (Müller et al., 2022, Müller et al., 2020, Spilger et al., 2022).
BrainScaleS-2’s confluence of analog acceleration, neuromorphic plasticity, embedded digital programmability, and scalable software infrastructure positions it as a general-purpose substrate for both exploratory neuroscience and efficient real-world machine learning experimentation (Schemmel et al., 2020, Pehle et al., 2022, Spilger et al., 2022, Ilmberger et al., 3 Dec 2025).