Projected Belief Networks With Discriminative Alignment for Acoustic Event Classification: Rivaling State of the Art CNNs (2401.11199v1)
Abstract: The projected belief network (PBN) is a generative stochastic network with a tractable likelihood function, based on a feed-forward neural network (FFNN). The generative function operates by "backing up" through the FFNN. The PBN is two networks in one: an FFNN that operates in the forward direction, and a generative network that operates in the backward direction. Both networks coexist on the same parameter set, have their own cost functions, and can be trained separately or jointly. The PBN therefore has the potential to combine the best qualities of discriminative and generative classifiers. To realize this potential, a separate PBN is trained on each class, maximizing the generative likelihood function for the given class while minimizing the discriminative cost of the FFNN against "all other classes". This technique, called discriminative alignment (PBN-DA), aligns the contours of the likelihood function with the decision boundaries and attains vastly improved classification performance, rivaling that of state-of-the-art discriminative networks. The method may be further improved by using a hidden Markov model (HMM) as a component of the PBN, a variant called PBN-DA-HMM. This paper provides a comprehensive treatment of PBN, PBN-DA, and PBN-DA-HMM. In addition, the results of two new classification experiments are reported. The first uses air-acoustic events; the second uses underwater acoustic data consisting of marine mammal calls. In both experiments, PBN-DA-HMM attains performance comparable to or better than a state-of-the-art CNN, and achieves a factor-of-two error reduction when combined with the CNN.
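To make the discriminative-alignment idea concrete, the sketch below shows one-model-per-class training in which a single parameter set serves both a generative cost (maximized on in-class data) and a one-vs-rest discriminative cost (minimized against all other classes). This is a minimal illustration, not the paper's method: the true PBN likelihood is computed by PDF projection through the FFNN, and here a standard-normal log-density on the hidden features stands in as a surrogate. All names (`ClassModel`, `train_class_model`, the weight `lam`) are hypothetical.

```python
# Hedged sketch of per-class training with discriminative alignment.
# Assumption: a Gaussian surrogate replaces the PBN's projected likelihood.
import torch
import torch.nn as nn

class ClassModel(nn.Module):
    """One FFNN per class. The same parameters feed both the hidden
    features (used by the generative cost) and a one-vs-rest logit."""
    def __init__(self, dim_in: int, dim_h: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(dim_in, dim_h), nn.Tanh(),
            nn.Linear(dim_h, dim_h), nn.Tanh(),
        )
        self.logit = nn.Linear(dim_h, 1)  # "this class" vs "all other classes"

    def forward(self, x):
        z = self.encoder(x)
        return z, self.logit(z).squeeze(-1)

def surrogate_log_likelihood(z):
    # Standard-normal log-density of the hidden features (up to a constant):
    # a crude stand-in for the PBN generative likelihood.
    return -0.5 * (z ** 2).sum(dim=1).mean()

def train_class_model(model, x_in, x_out, steps=200, lam=1.0):
    """Maximize the (surrogate) generative likelihood on in-class data
    while minimizing the one-vs-rest discriminative cost, so the
    likelihood contours are pulled toward the decision boundary."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    x = torch.cat([x_in, x_out])
    y = torch.cat([torch.ones(len(x_in)), torch.zeros(len(x_out))])
    for _ in range(steps):
        opt.zero_grad()
        z_in, _ = model(x_in)          # generative term: in-class only
        _, logits = model(x)           # discriminative term: all data
        loss = -surrogate_log_likelihood(z_in) + lam * bce(logits, y)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    torch.manual_seed(0)
    d = 16
    x_in = torch.randn(128, d) + 2.0   # toy "class k" samples
    x_out = torch.randn(128, d) - 2.0  # toy "all other classes" samples
    model = train_class_model(ClassModel(d, 8), x_in, x_out)
```

At test time, under this scheme, each sample would be scored by every class model and assigned to the class whose generative score is highest; the discriminative term exists only to shape the likelihood during training.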