BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network (2309.02836v2)

Published 6 Sep 2023 in cs.SD, cs.LG, and eess.AS

Abstract: Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between real and fake data in the feature space. In the literature, it has been demonstrated that slicing adversarial network (SAN), an improved GAN training framework that can find the optimal projection, is effective in the image generation task. In this paper, we investigate the effectiveness of SAN in the vocoding task. For this purpose, we propose a scheme to modify least-squares GAN, which most GAN-based vocoders adopt, so that their loss functions satisfy the requirements of SAN. Through our experiments, we demonstrate that SAN can improve the performance of GAN-based vocoders, including BigVGAN, with small modifications. Our code is available at https://github.com/sony/bigvsan.
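
The abstract refers to the least-squares GAN objective that most GAN-based vocoders adopt. For reference, the standard least-squares losses can be sketched as below. This is a minimal illustration in plain Python, not the BigVSAN code: the function names are hypothetical, the scores are plain lists of floats standing in for discriminator outputs, and the SAN-specific modification proposed in the paper is not shown.

```python
from statistics import fmean


def lsgan_discriminator_loss(real_scores, fake_scores):
    """Standard least-squares discriminator loss.

    Pushes scores on real audio toward 1 and scores on generated audio toward 0.
    (Hypothetical helper for illustration only.)
    """
    real_term = fmean([(1.0 - s) ** 2 for s in real_scores])
    fake_term = fmean([s ** 2 for s in fake_scores])
    return real_term + fake_term


def lsgan_generator_loss(fake_scores):
    """Standard least-squares generator loss.

    Pushes the discriminator's scores on generated audio toward 1.
    (Hypothetical helper for illustration only.)
    """
    return fmean([(1.0 - s) ** 2 for s in fake_scores])


if __name__ == "__main__":
    # A discriminator that separates real from fake perfectly has zero loss,
    # while the generator it beats has the maximum loss for scores in [0, 1].
    print(lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0]))
    print(lsgan_generator_loss([0.0, 0.0]))
```

In a real vocoder these scores would come from one or more discriminator networks evaluated on batches of waveforms; the lists here only make the arithmetic visible.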

Citations (9)


Summary

The paper introduces BigVSAN, a BigVGAN-based neural vocoder trained with the slicing adversarial network (SAN) framework.
It proposes a scheme to modify the least-squares GAN loss, which most GAN-based vocoders adopt, so that it satisfies the requirements of SAN.
Experiments show that SAN improves the performance of GAN-based vocoders, including BigVGAN, with only small modifications.

Analysis of ICASSP 2021 Author Guidelines

At academic conferences, consistent formatting is essential for processing and presenting research papers. The document titled "Author Guidelines for ICASSP 2021 Proceedings Manuscripts" gives authors a structured, detailed framework for preparing submissions to the conference proceedings. The guidelines cover the technical aspects needed for consistent documentation, which improves the readability, accessibility, and archival quality of contributions in signal processing.

The document opens with typography and layout specifications. It stresses the use of uniform fonts and specifically recommends Times-Roman because it renders well in both print and electronic media. Consistent fonts keep the text readable across presentation formats.

Formatting standards, including paper size, margins, and column settings, are meticulously detailed in the document. The prescribed two-column format with specific width and spacing is a common standard in technical documentation, maximizing the use of available space while maintaining readability. This structured layout facilitates the rapid assimilation of visual and textual information.

The guidelines also specify best practices for figures and illustrations. They recommend placing visual elements at the top of a column to preserve flow and context. Figures that are clear in black and white remain informative when printed without color, which matters for accurate dissemination of results.

Additionally, the document offers guidance on referencing and citations, highlighting the necessity of contextualizing current work within the framework of existing literature. This practice not only acknowledges foundational research but also positions new contributions in the ongoing scholarly discourse, allowing peers to evaluate advancements with a comprehensive perspective.

The practical implications of this document lie in the standardization it promotes, which is crucial for integrating individual contributions into larger compilations such as conference proceedings. This ensures uniformity across published works and makes them easier for the academic community to navigate and access.

The theoretical implications extend to reinforcing discipline in scientific writing and presentation. The document acts as a pedagogical tool for authors, novice and seasoned alike, highlighting the importance of meticulous attention to detail in academic publication.

Looking forward, these guidelines reflect a broader trend toward automating and simplifying submission processes. Future versions might incorporate more adaptive templates and automated compliance checks to further ease authors' burdens and make the publication pipeline more efficient.

In conclusion, the "Author Guidelines for ICASSP 2021 Proceedings Manuscripts" offers a detailed guide to producing consistently formatted, visually uniform manuscripts. This supports the dissemination of knowledge within the signal processing community.

GitHub

GitHub - sony/bigvsan_eval: Evaluation tool used in the BigVSAN paper (11 stars)
GitHub - sony/bigvsan: Pytorch implementation of BigVSAN (202 stars)

Tweets

https://twitter.com/takiko_san/status/1747443029525557416

https://twitter.com/zaptrem/status/1811861368196534281

https://twitter.com/AudioAndSpeech/status/1772602704499515497

https://twitter.com/knishimae0531/status/1772830391952658560