Audio Caption in a Car Setting with a Sentence-Level Loss (1905.13448v2)

Published 31 May 2019 in cs.SD, cs.CL, and eess.AS

Abstract: Captioning has attracted much attention in image and video understanding, while only a small amount of work examines audio captioning. This paper contributes a Mandarin-annotated dataset for audio captioning within a car scene. A sentence-level loss is proposed to be used in tandem with a GRU encoder-decoder model to generate captions with higher semantic similarity to human annotations. We evaluate the model on the newly proposed Car dataset, a previously published Mandarin Hospital dataset, and the Joint dataset; the results indicate that it generalizes across different scenes. An improvement can be observed in all metrics, including classical natural language generation (NLG) metrics, sentence richness, and human evaluation ratings. However, although detailed audio captions can now be generated automatically, human annotations still outperform model captions in many aspects.

Authors (4)
  1. Xuenan Xu (29 papers)
  2. Heinrich Dinkel (29 papers)
  3. Mengyue Wu (57 papers)
  4. Kai Yu (202 papers)
Citations (2)