Spiking Music: Audio Compression with Event Based Auto-encoders (2402.01571v1)
Abstract: Neurons in the brain communicate information via discrete events called spikes. The timing of spikes is thought to carry rich information, but it is not clear how to leverage this in digital systems. We demonstrate that event-based encoding is efficient for audio compression. To build this event-based representation we use a deep binary auto-encoder, and under high sparsity pressure, the model enters a regime where the binary event matrix is stored more efficiently with sparse matrix storage algorithms. We test this on the large MAESTRO dataset of piano recordings against vector quantized auto-encoders. Not only does our "Spiking Music compression" algorithm achieve a competitive compression/reconstruction trade-off, but selectivity and synchrony between encoded events and piano key strikes emerge without supervision in the sparse regime.
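The claim that a sparse binary event matrix becomes cheaper to store with sparse matrix storage can be illustrated with a small bit-counting sketch. It assumes a simple flat-index list, in which each event is stored as the position of its cell in the matrix. This is not necessarily the storage format used by the authors, and the function names and matrix sizes are hypothetical.

```python
def dense_bits(n_channels: int, n_steps: int) -> int:
    """Bits needed to store the binary matrix densely: one bit per cell."""
    return n_channels * n_steps


def index_bits(n_channels: int, n_steps: int) -> int:
    """Bits needed to address a single cell with a flat index (assumed format)."""
    n_cells = n_channels * n_steps
    return max(1, (n_cells - 1).bit_length())


def sparse_bits(n_events: int, n_channels: int, n_steps: int) -> int:
    """Bits needed to store only the positions of the events."""
    return n_events * index_bits(n_channels, n_steps)


def break_even_density(n_channels: int, n_steps: int) -> float:
    """Fraction of active cells below which the index list is smaller."""
    return 1.0 / index_bits(n_channels, n_steps)


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration only.
    channels, steps = 64, 1024
    for events in (100, 4096, 20000):
        print(events, dense_bits(channels, steps), sparse_bits(events, channels, steps))
    print("break-even density:", break_even_density(channels, steps))
```

Under these assumptions the dense cost is fixed by the matrix size, while the sparse cost grows linearly with the number of events, so the sparse form is smaller only when the event density is below roughly one over the index width in bits.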