SteganoSNN: Neuromorphic Audio-in-Image Steganography

Updated 16 November 2025

SteganoSNN is a neuromorphic steganographic framework that converts 16-bit PCM audio into LIF neuron–generated spike trains for secure multimedia data hiding.
It employs a modulo-based encryption scheme and LSB embedding with dithering, achieving high capacity (8 bpp) and superior fidelity compared to deep-learning approaches.
Implemented in Python and on FPGA, SteganoSNN offers real-time performance in Edge-AI and IoT applications with robust resistance to steganalysis.

SteganoSNN is a neuromorphic steganographic framework that applies spiking neural networks (SNNs) and lightweight encryption to achieve secure, efficient, and high-capacity multimedia data hiding. Designed for audio-in-image steganography, SteganoSNN converts digitized audio into LIF neuron–derived spike trains, encrypts the resulting features via a modulo mapping scheme, and robustly embeds these at high bit rates into least significant bits (LSBs) of RGBA images using dithering for perceptual fidelity. Implemented in Python with NEST for neuron simulation and realized in hardware on the PYNQ-Z2 FPGA, SteganoSNN demonstrates real-time operation and quantitative superiority over deep-learning-based steganographic approaches in both embedding capacity and image quality (Sahoo et al., 9 Nov 2025).

1. Audio-to-Spike Conversion with Leaky Integrate-and-Fire Neurons

SteganoSNN initiates encoding by transforming 16-bit signed PCM audio samples ( $s\in[-32768,32767]$ ) into spike train patterns generated by a Leaky Integrate-and-Fire (LIF) neuron model. The neuron membrane evolution is described by

$C_m\,\frac{dV(t)}{dt} = -g_L\left(V(t)-E_L\right) + I_{\text{inj}}(t)$

with parameters:

$C_m = 250\,\mathrm{pF}$ , $\tau_m = 10\,\mathrm{ms}\ \Rightarrow\ g_L = \frac{C_m}{\tau_m}$
$E_L = 0\,\mathrm{mV}$
Spike threshold $V_{\mathrm{th}} = 20\,\mathrm{mV}$
Reset $V_{\text{reset}} = 10\,\mathrm{mV}$ , initial $V(0) = -70\,\mathrm{mV}$
Absolute refractory $t_{\text{ref}} = 2\,\mathrm{ms}$

Audio sample encoding proceeds as follows:

For each sample, $I_{\text{inj}}$ is swept from 370 pA upward in 1 pA steps for a 60 ms window.
The total spike count (0–9) identifies the represented decimal digit.
The precise spike times $\{t_1, t_2, \dots, t_N\}$ within this window encode additional information.

This mapping is established via NEST 3.9 simulations, where each distinct spike count and pattern uniquely represent a digit for further secure embedding. The approach exploits the deterministic mapping from input current to spike timing, allowing a reproducible transform from audio digits to temporal SNN patterns.

2. Modulo-Based Encryption Scheme

After conversion, each 16-bit audio sample is decomposed into a sign bit $\sigma \in \{0,1\}$ and five decimal digits $d_1, \dots, d_5$ . For each digit $d$ , a characteristic spike time $t_{d,k} \in \{17, 23, 29, \dots, 59\}$ is chosen, and encryption is performed using

$r_d = t_{d,k} \bmod 16\,, \qquad \text{KEY}_d = k-1$

where $r_d$ is the 4-bit ciphertext of digit $d$ , and $\text{KEY}_d$ is the corresponding 4-bit key indexing the spike time.

For example, for digit 6 with available spike times $\{20,28,36,44,52,59\}$ , selecting $t=44$ results in $(r_6, \text{KEY}_6) = (12,3)$ .

Each audio sample yields six pairs $(r, \text{KEY})$ ; only the $r$ values (steganographic payload) are embedded, while key values are transmitted or stored separately for decryption. This scheme achieves one-time pad–like security without computationally intensive cryptography, relying on secure key exchange. Digit 0 (no spike) trivially maps to $(0,0)$ .

3. LSB Embedding with Dithering in RGBA Images

SteganoSNN encodes the $r$ values into cover images stored as PNGs with 8-bit RGBA channels. By altering the two least significant bits in each channel per pixel, an embedding capacity of 8 bits per pixel (bpp) is achieved. To prevent perceptually detectable artifacts (blockiness), dithering noise $\delta\in\{0,1,2\}$ is introduced before embedding:

$p' = \mathrm{clip}(p + \delta,\ 0,255)$

$p_{\text{stego}} = \left\lfloor \frac{p'}{4} \right\rfloor \times 4 + b_1\cdot2 + b_0$

where $p$ is the original pixel intensity and $(b_1, b_0)$ are the payload bits. The constraint $\delta \leq 2$ ensures visually imperceptible distortion, while effective dithering suppresses quantization error and maintains gradient smoothness.

This LSB substitution method with dithering operates losslessly for cover image content while enabling near-theoretical channel capacity, outperforming many GAN-based architectures in both efficiency and imperceptibility.

4. System Implementation: Software and Hardware

SteganoSNN is realized both in software and FPGA hardware, enabling application in Edge-AI and low-power scenarios:

Software Pipeline:

Implemented in Python using NEST 3.9.
SNN_patterns.py simulates LIF neurons and extracts spike-time patterns.
Discretelevel.py calibrates current intervals per digit.
Key_and_Map.py establishes the lookup between remainder values and digits.
Image_analysis.py pre-processes RGBA compliance.
encrypt.py and decrypt.py manage end-to-end digitization, encryption, dithering, embedding, and extraction.

FPGA Co-Design:

Target hardware: PYNQ-Z2 Board (Artix-7 PL, dual-core ARM Cortex-A9 PS).
The PS runs NEST simulation and manages data streaming over AXI-DMA.
The PL hosts two custom Verilog IP cores:
- Encryptor: BCD conversion, pseudo-noise dithering, modulo-16 mapping, two-bit LSB embedding.
- Decryptor: LSB extraction, inverse mapping, KEY-guided digit reconstruction.
Both encryptor and decryptor operate on 32-bit AXI-Stream data; overlays are dynamically loaded.
Full real-time support for images up to 1920×1080 and stereo 48 kHz audio.

FPGA Resource Utilization (Artix-7, 100 MHz PL)

Resource	Encryptor	%	Decryptor	%
LUTs	5,185	9.8	34,768	65
LUTRAM	185	1.1	185	1.1
FFs	4,266	4.0	3,909	3.7
BRAM18K	3	2.1	3	2.1
BUFG	2	6.3	1	3.1

5. Quantitative Evaluation and Comparative Analysis

Evaluations on DIV2K 2017 (train/valid, various upsampling factors) demonstrate the following performance:

Embedding Capacity: 8 bpp ( $\text{BPP} = 8$ ), outperforming SteganoGAN’s 2–5 bpp.
Fidelity Metrics:
- PSNR $_{\text{RGB}}$ : 40.42–41.35 dB
- SSIM $_{\text{RGB}}$ : $>$ 0.9693 (up to 0.9801)
- PSNR $_{\text{RGBA}}$ : 40.48–41.10 dB, SSIM $_{\text{RGBA}} > 0.97$
Compared to SteganoGAN: Dense models at 2.44–2.63 bpp yield PSNR $\sim$ 36.5–41.6 dB, SSIM $\sim$ 0.85–0.95.

Thus, SteganoSNN achieves higher capacity (8 bpp) and preserves superior or comparable fidelity (PSNR $>$ 40 dB, SSIM $>$ 0.97) even at peak rates.

Steganalysis Resistance (Aletheia tool):
- $\text{SPA}_\text{G}$ : [0.3362, 0.7603]
- $\text{Triples}_\text{G}$ : 0.7368
- $\text{WS}_\text{G}$ : [1.1176, 2.0519]

These values indicate statistical imperceptibility of payload even at maximum embedding rate, confirming robustness to standard detection techniques.

6. Applications and Significance

SteganoSNN’s biologically inspired approach enables secure, energy-efficient audio-in-image steganography with high channel capacity, real-time throughput, and minimal resource consumption. By operating efficiently on both conventional (Python, NEST) and low-power embedded (FPGA) platforms, SteganoSNN is well-suited for applications in Edge-AI, IoT, and biomedical domains where on-device privacy, low latency, and resilience to steganalysis are critical.

This paradigm establishes a foundation for neuromorphic steganography, combining temporal neural coding, lightweight encryption, and constrained device compatibility, distinguishing it from GAN-based steganography in capacity, computational efficiency, and implementation versatility. SteganoSNN’s modular structure also facilitates future exploration of alternative neural coding schemes and cryptographic primitives.

PDF Markdown Chat (Pro)

References (1)

SteganoSNN: SNN-Based Audio-in-Image Steganography with Encryption (2025)

Whiteboard

Generate a whiteboard explanation of this topic.

Follow Topic

Get notified by email when new papers are published related to SteganoSNN.