
Learning Visual Styles from Audio-Visual Associations (2205.05072v1)

Published 10 May 2022 in cs.CV, cs.MM, cs.SD, and eess.AS

Abstract: From the patter of rain to the crunch of snow, the sounds we hear often convey the visual textures that appear within a scene. In this paper, we present a method for learning visual styles from unlabeled audio-visual data. Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization. Given a dataset of paired audio-visual data, we learn to modify input images such that, after manipulation, they are more likely to co-occur with a given input sound. In quantitative and qualitative evaluations, our sound-based model outperforms label-based approaches. We also show that audio can be an intuitive representation for manipulating images, as adjusting a sound's volume or mixing two sounds together results in predictable changes to visual style. Project webpage: https://tinglok.netlify.app/files/avstyle
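The abstract says the model is trained so that a manipulated image is "more likely to co-occur" with a given input sound. One common way to express such a co-occurrence signal is an InfoNCE-style contrastive loss that pulls matched image/audio pairs together and pushes mismatched pairs apart. The sketch below is only an illustration of that general idea, not the paper's actual objective; the embedding shapes, temperature, and loss form are all assumptions.

```python
import numpy as np

def cooccurrence_loss(image_embs, audio_embs, temperature=0.1):
    """InfoNCE-style co-occurrence loss (illustrative assumption):
    stylized image i should match audio i rather than the other
    audio clips in the batch. The paper's exact objective may differ."""
    # Normalize embeddings to unit length so dot products are cosines.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    aud = audio_embs / np.linalg.norm(audio_embs, axis=1, keepdims=True)
    logits = img @ aud.T / temperature           # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Cross-entropy against the diagonal (the matched pairs).
    idx = np.arange(len(probs))
    return -np.mean(np.log(probs[idx, idx]))

rng = np.random.default_rng(0)
aud = rng.normal(size=(4, 8))
# Matched pairs: image embeddings close to their audio embeddings.
matched = cooccurrence_loss(aud + 0.01 * rng.normal(size=(4, 8)), aud)
# Mismatched pairs: unrelated image embeddings.
shuffled = cooccurrence_loss(rng.normal(size=(4, 8)), aud)
```

Under this kind of objective, `matched` comes out lower than `shuffled`, which is the sense in which the edited image becomes "more likely to co-occur" with the sound.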

Authors (4)
  1. Tingle Li (14 papers)
  2. Yichen Liu (54 papers)
  3. Andrew Owens (52 papers)
  4. Hang Zhao (156 papers)
Citations (20)
