Dialogs Re-enacted Across Languages

Published 18 Nov 2022 in cs.CL, cs.SD, and eess.AS | (2211.11584v2)

Abstract: To support machine learning of cross-language prosodic mappings and other ways to improve speech-to-speech translation, we present a protocol for collecting closely matched pairs of utterances across languages, a description of the resulting data collection and its public release, and some observations and musings. This report is intended for people using this corpus, people extending it, and people designing similar collections of bilingual dialog data.