
CAT: Causal Audio Transformer for Audio Classification (2303.07626v1)

Published 14 Mar 2023 in cs.SD, cs.MM, and eess.AS

Abstract: Attention-based Transformers have been increasingly applied to audio classification because of their global receptive field and ability to handle long-term dependencies. However, existing frameworks, which are mainly extended from Vision Transformers, are not perfectly compatible with audio signals. In this paper, we introduce a Causal Audio Transformer (CAT) consisting of Multi-Resolution Multi-Feature (MRMF) feature extraction with an acoustic attention block for better-optimized audio modeling. In addition, we propose a causal module that alleviates over-fitting, helps with knowledge transfer, and improves interpretability. CAT achieves classification performance higher than or comparable to the state of the art on the ESC50, AudioSet, and UrbanSound8K datasets, and can be easily generalized to other Transformer-based models.

Authors (4)
  1. Xiaoyu Liu
  2. Hanlin Lu
  3. Jianbo Yuan
  4. Xinyu Li
Citations (19)
