Spiking Music: Audio Compression with Event Based Auto-encoders (2402.01571v1)

Published 2 Feb 2024 in cs.SD, cs.LG, cs.NE, and eess.AS

Abstract: Neurons in the brain communicate information via punctual events called spikes. The timing of spikes is thought to carry rich information, but it is not clear how to leverage this in digital systems. We demonstrate that event-based encoding is efficient for audio compression. To build this event-based representation we use a deep binary auto-encoder, and under high sparsity pressure, the model enters a regime where the binary event matrix is stored more efficiently with sparse matrix storage algorithms. We test this on the large MAESTRO dataset of piano recordings against vector quantized auto-encoders. Not only does our "Spiking Music compression" algorithm achieve a competitive compression/reconstruction trade-off, but selectivity and synchrony between encoded events and piano key strikes emerge without supervision in the sparse regime.


Summary

  • The paper introduces a novel method using a binary spiking auto-encoder that leverages high sparsity for efficient audio encoding.
  • It replaces conventional VQ mechanisms with an end-to-end trainable model that achieves competitive reconstruction quality on the MAESTRO dataset.
  • The study reveals emergent piano note selectivity, highlighting potential for energy-efficient neuromorphic computing in audio compression.

Event-based Audio Compression Exploiting Sparsity for Efficient Encoding

Introduction to Spiking Music Compression

Recent advances in deep learning have revitalized the exploration of neural network architectures for audio compression. The predominant approach, however, relies on vector quantized variational auto-encoders (VQ-VAE), which do not leverage event-based encoding, a principle deeply rooted in biological neural systems. This paper introduces "Spiking Music compression," an algorithm that applies an event-based auto-encoder to audio compression. By replacing the VQ mechanism with a binary spiking representation and imposing strong sparsity pressure, the model achieves competitive audio reconstruction quality while encoding, storing, and transmitting musical data efficiently. Tested on the MAESTRO dataset of piano recordings, the approach achieves a competitive compression/reconstruction trade-off and offers a fresh perspective on digital compression techniques.
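
To make the storage claim concrete: a dense binary event matrix costs one bit per unit per time step, while an event-list format pays roughly log2(N) + log2(T) bits per event for an N-unit, T-step matrix, so the sparse format wins once the event density falls below about 1/log2(N·T). The back-of-envelope comparison below illustrates this; the matrix dimensions and densities are illustrative assumptions, not figures from the paper.

```python
import math

def dense_bits(n_units: int, n_steps: int) -> int:
    """Dense bitmap: one bit for every (unit, time-step) cell."""
    return n_units * n_steps

def sparse_bits(n_units: int, n_steps: int, n_events: int) -> float:
    """Event list: each spike stored as a (unit, time-step) index pair."""
    return n_events * (math.log2(n_units) + math.log2(n_steps))

# Illustrative sizes (not from the paper): 512 latent units, 75 steps/second.
n_units, n_steps = 512, 75
for density in (0.20, 0.05, 0.01):
    n_events = round(density * n_units * n_steps)
    print(f"density {density:5.1%}: dense {dense_bits(n_units, n_steps)} bits, "
          f"sparse {sparse_bits(n_units, n_steps, n_events):.0f} bits")
```

A production codec would add entropy coding on top of either format, but the crossover logic is the same.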

Novel Concept and Implementation

The cornerstone of the method is a deep binary auto-encoder that transforms audio signals into a sparse binary event matrix. Two variants depart from conventional approaches:

  • Free Model: Forgoes a pre-defined codebook, relying instead on an end-to-end trainable model that outputs binary representations directly. Without an auxiliary sparsity loss, this model competes with existing VQ-VAE techniques in audio fidelity.
  • Sparse Model: Adds a sparsity-inducing loss that pushes the representation toward fewer active bits, making the resulting binary matrix suitable for sparse matrix storage. This model not only demonstrates the viability of extremely low bit-rate compression but also reveals emergent unit selectivity for specific piano notes without explicit supervision; a sketch of the shared binarization mechanism follows this list.
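
The paper's exact architecture is not reproduced here, but the generic mechanism shared by both variants, hard binarization trained with a straight-through estimator plus an optional sparsity penalty, can be sketched in a few lines. The following is a minimal PyTorch sketch; the module name, threshold, and target rate are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class BinaryBottleneck(nn.Module):
    """Binarize encoder activations: hard threshold on the forward pass,
    identity gradient on the backward pass (a straight-through estimator)."""
    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        hard = (logits > 0).float()
        # Forward value is `hard`; the gradient flows to `logits` unchanged.
        return logits + (hard - logits).detach()

def sparsity_penalty(events: torch.Tensor, target_rate: float = 0.01) -> torch.Tensor:
    """Auxiliary loss pushing the mean firing rate below `target_rate`,
    i.e. into the regime where sparse storage pays off."""
    return torch.relu(events.mean() - target_rate)

# Training objective (sketch): the free model uses lambda_sparse = 0,
# the sparse model sets lambda_sparse > 0:
#   loss = reconstruction_loss(decoder(events), audio) \
#          + lambda_sparse * sparsity_penalty(events)
```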

Theoretical Contributions and Practical Outcomes

A key finding is that, in the sparse regime, individual encoder units become selective for and synchronized with piano key strikes, suggesting that the model uncovers underlying musical events without supervision; this property is not observed in the absence of sparsity pressure. The result supports the potential of spiking neural networks (SNNs) for encoding and compressing complex auditory signals, and offers a bridge to understanding how event-based representations can be harnessed effectively in computational models.
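
One simple way to quantify this kind of spike-to-keystroke synchrony is the fraction of a unit's events that land within a small window of a note onset. The metric below is our illustration of the idea, not necessarily the analysis performed in the paper.

```python
import numpy as np

def coincidence_rate(event_times: np.ndarray, onset_times: np.ndarray,
                     tol: float = 0.02) -> float:
    """Fraction of a unit's events within +/- `tol` seconds of a key strike.
    A selective, synchronous unit scores near 1; an unrelated unit scores
    near the chance level set by the onset density."""
    if event_times.size == 0 or onset_times.size == 0:
        return 0.0
    onsets = np.concatenate(([-np.inf], np.sort(onset_times), [np.inf]))
    idx = np.searchsorted(onsets, event_times)           # first onset >= event
    nearest = np.minimum(event_times - onsets[idx - 1],  # gap to previous onset
                         onsets[idx] - event_times)      # gap to next onset
    return float(np.mean(nearest <= tol))
```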

Forward-Looking Perspectives

The implications of this paper extend beyond the domain of audio compression, touching on broader aspects of efficient computing and neural network architecture design:

  • Energy Efficiency: Sparse, event-based encoding mirrors the energy-saving strategies observed in biological systems. As energy consumption becomes a primary constraint on computation, spiking models offer a promising avenue for hardware and algorithms designed for sustainability.
  • Neuromorphic Computing: The investigation highlights the untapped potential of SNNs in practical applications. The demonstrated advantages in audio compression set a precedent for applying spiking models to other tasks, potentially catalyzing advances in neuromorphic hardware and software.
  • Future Research Pathways: While the paper relies on discrete-time models, extending the approach to continuous-time or event-driven frameworks for audio and other data types appears promising. Such approaches could further improve the efficiency and effectiveness of neural encoding systems.

Concluding Remarks

This research charts a new direction in neural audio compression, advocating sparsity and event-based encoding as a route to both efficiency and performance. By demonstrating the practical utility of spiking models in compressing musical audio, and the emergent properties of such models, the paper lays the groundwork for future work on energy-efficient, neuromorphic computing technologies and beyond.