Sound Source Localization for a Source inside a Structure using Ac-CycleGAN

Published 8 Dec 2023 in cs.SD and eess.AS | (2312.04846v1)

Abstract: We propose a method for sound source localization (SSL) for a source inside a structure using Ac-CycleGAN under unpaired data conditions. The proposed method utilizes a large amount of simulated data and a small amount of actual experimental data to locate a sound source inside a structure in a real environment. An Ac-CycleGAN generator contributes to the transformation of simulated data into real data, or vice versa, using unpaired data from both domains. The discriminator of an Ac-CycleGAN model is designed to differentiate between the transformed data generated by the generator and real data, while also predicting the location of the sound source. Vectors representing the frequency spectrum of the accelerometers (FSAs) measured at three points outside the structure are used as input data and the source areas inside the structure are used as labels. The input data vectors are concatenated vertically to form an image. Labels are defined by dividing the interior of the structure into eight areas with one-hot encoding for each area. Thus, the SSL problem is redefined as an image-classification problem to stochastically estimate the location of the sound source. We show that it is possible to estimate the sound source location using the Ac-CycleGAN discriminator for unpaired data across domains. Furthermore, we analyze the discriminative factors for distinguishing the data. The proposed model exhibited an accuracy exceeding 90% when trained on 80% of actual data (12.5% of simulated data). Despite potential imperfections in the domain transformation process carried out by the Ac-CycleGAN generator, the discriminator can effectively distinguish between transferred and real data by selectively utilizing only those features that generate a relatively small transformation error.
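The input and label encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "concatenated vertically" means stacking the three FSA vectors as rows of a 2-D array, and the function names, the four-bin spectra, and the toy values are all hypothetical.

```python
from typing import List, Sequence

NUM_SENSORS = 3  # from the abstract: accelerometers at three points
NUM_AREAS = 8    # from the abstract: interior divided into eight areas


def build_input_image(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack one frequency spectrum per accelerometer as the rows of an image.

    Assumption: each row is one FSA vector and all rows share the same
    number of frequency bins.
    """
    if len(spectra) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} spectra, got {len(spectra)}")
    n_bins = len(spectra[0])
    if any(len(s) != n_bins for s in spectra):
        raise ValueError("all spectra must have the same number of bins")
    return [list(s) for s in spectra]


def one_hot(area_index: int, num_areas: int = NUM_AREAS) -> List[int]:
    """Encode a source area (0-based index) as a one-hot label vector."""
    if not 0 <= area_index < num_areas:
        raise ValueError(f"area_index must be in [0, {num_areas})")
    return [1 if i == area_index else 0 for i in range(num_areas)]


# Toy data: three spectra with four bins each (values are made up).
spectra = [
    [0.1, 0.4, 0.2, 0.0],
    [0.3, 0.1, 0.5, 0.2],
    [0.0, 0.2, 0.2, 0.6],
]
image = build_input_image(spectra)
label = one_hot(2)

print(len(image), len(image[0]))  # rows (sensors) and columns (bins)
print(label)
```

With this encoding, each training example is a small image paired with an eight-element label, which is what lets the localization task be treated as image classification.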
