
Exploration into Translation-Equivariant Image Quantization (2112.00384v3)

Published 1 Dec 2021 in cs.CV, cs.CL, and cs.LG

Abstract: This is an exploratory study that discovers that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space due to aliasing. Instead of focusing on anti-aliasing, we propose a simple yet effective way to achieve translation-equivariant image quantization by enforcing orthogonality among the codebook embeddings. To explore the advantages of translation-equivariant image quantization, we conduct three proof-of-concept experiments with a carefully controlled dataset: (1) text-to-image generation, where the quantized image indices are the target to predict; (2) image-to-text generation, where the quantized image indices are given as a condition; (3) using a smaller training set to analyze sample efficiency. From these strictly controlled experiments, we empirically verify that the translation-equivariant image quantizer improves not only sample efficiency but also accuracy over VQGAN, by up to +11.9% in text-to-image generation and +3.9% in image-to-text generation.
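The abstract's key mechanism is a penalty that pushes codebook embeddings toward mutual orthogonality. As an illustrative sketch only (the function name, row normalization, and Frobenius-norm form are assumptions, not the authors' exact implementation), such a regularizer can be written as:

```python
import numpy as np

def orthogonality_loss(codebook):
    """Penalty encouraging mutual orthogonality of codebook embeddings.

    codebook: (K, d) array of K embedding vectors.
    Returns the squared Frobenius norm of (E E^T - I), computed on
    row-normalized embeddings, so it is 0 iff all pairs are orthogonal.
    """
    norms = np.linalg.norm(codebook, axis=1, keepdims=True)
    e = codebook / np.maximum(norms, 1e-12)   # unit-normalize each embedding
    gram = e @ e.T                            # (K, K) pairwise cosine similarities
    k = codebook.shape[0]
    return float(np.sum((gram - np.eye(k)) ** 2))
```

Added to the quantizer's training objective with a weighting coefficient, a term of this kind drives off-diagonal similarities between codebook entries toward zero.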

Authors (6)
  1. Woncheol Shin (5 papers)
  2. Gyubok Lee (12 papers)
  3. Jiyoung Lee (42 papers)
  4. Eunyi Lyou (3 papers)
  5. Joonseok Lee (39 papers)
  6. Edward Choi (90 papers)
Citations (5)