
SC-VAE: Sparse Compression Variational Autoencoder

Updated 22 January 2026
  • SC-VAE is a model that merges sparse coding with variational autoencoder frameworks, using learned ISTA to enforce sparsity in latent representations.
  • It employs a fixed orthogonal dictionary and an L1 sparse prior to ensure robust reconstruction and control over feature manipulation for diverse data modalities.
  • Experimental results demonstrate superior performance in terms of reconstruction metrics and compression efficiency on image and point cloud tasks.

Sparse Compression Variational Autoencoder (SC-VAE) encompasses a family of models that integrate sparse data representations with variational autoencoder (VAE) frameworks. The principal SC-VAE variant leverages learned sparse coding, operationalized via trainable iterative shrinkage-thresholding algorithms, to produce interpretable latent structures and superior quantitative performance for a range of data modalities, including natural images and point clouds. Two representative instantiations are: (1) the Sparse Coding-based VAE with Learned ISTA for image tasks, and (2) the Sparse Tensor-based VAE with sparse convolutions for point cloud attribute compression.

1. Model Formulation and Generative Framework

The SC-VAE paradigm enforces sparsity in latent representations while maintaining compatibility with deep generative modeling. For image data (Xiao et al., 2023), the model is defined as follows:

  • Generative Model: Given a sparse latent code $s$, the decoder $G_\theta$ reconstructs the input $x$ with an isotropic Gaussian likelihood:

$$p_\theta(x \mid s) = \mathcal{N}\!\left(x ;\, G_\theta(D s),\, \sigma^2 I\right)$$

where $D \in \mathbb{R}^{n \times K}$ is a fixed orthogonal dictionary (typically a DCT basis) and $\sigma^2 = 1$.

  • Inference Model: The deterministic encoder $E_\phi$ produces features $z = E_\phi(x)$, then applies learned ISTA (LISTA) to estimate $\hat s$:

$$\hat s = \mathrm{LISTA}_\phi(z)$$

yielding an approximate posterior $q_\phi(s \mid x) \approx \delta(s - \hat s)$.

  • Sparse Prior: An $L_1$ (Laplace) prior is imposed on the codes:

$$p(s) \propto \exp(-\alpha \|s\|_1)$$

For point cloud attribute compression (Wang et al., 2022), SC-VAE utilizes sparse tensors representing point attributes, with sparse convolutions forming the encoder and decoder.
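Concretely, the image-variant pieces above compose into a single forward pass: encode $x$ to features $z$, infer a sparse code $\hat s$ with LISTA, form $D \hat s$ with the fixed dictionary, and decode. A minimal PyTorch-style sketch, where `encoder`, `lista`, and `decoder` are placeholder callables rather than the reference implementation:

```python
import torch

def sc_vae_forward(x, encoder, lista, D, decoder, sigma2=1.0):
    """Sketch of the SC-VAE forward pass for images (hypothetical module names).

    x       : input batch, shape (B, C, H, W)
    encoder : deterministic encoder E_phi producing features z
    lista   : unrolled LISTA solver returning sparse codes s_hat
    D       : fixed dictionary, shape (n, K), e.g. a DCT basis
    decoder : decoder G_theta mapping D s_hat back to image space
    """
    z = encoder(x)                        # features z = E_phi(x)
    s_hat = lista(z, D)                   # sparse codes via learned ISTA
    x_rec = decoder(s_hat @ D.T)          # reconstruct from D s_hat
    # isotropic Gaussian likelihood N(x; G_theta(D s), sigma^2 I), up to constants
    log_lik = -0.5 / sigma2 * (x - x_rec).pow(2).sum()
    return x_rec, s_hat, log_lik
```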

2. Sparse Coding and Learnable ISTA Algorithms

Sparse coding aims to represent high-dimensional vectors as a sparse linear combination of dictionary atoms. In SC-VAE (Xiao et al., 2023):

  • Sparse Coding Objective (per feature vector $z$):

$$\mathcal{E}(z, s) = \tfrac{1}{2} \|z - D s\|_2^2 + \alpha \|s\|_1$$

  • Learnable ISTA (LISTA): The sparse coding problem is solved by unrolling ISTA for $T$ iterations:

$$s^{(t+1)} = \operatorname{soft}_{\lambda_t}\!\left(s^{(t)} + \eta_t D^\top [z - D s^{(t)}]\right)$$

with learnable parameters $\{\eta_t, \lambda_t\}$ and initialization $s^{(0)} = 0$.

Algorithm 1: Learnable ISTA (LISTA)

Input: $z \in \mathbb{R}^n$, $D \in \mathbb{R}^{n \times K}$, $\{\eta_t, \lambda_t\}_{t=0}^{T-1}$

  1. Initialize $s^{(0)} \leftarrow 0$
  2. For $t = 0$ to $T - 1$:
    • $u^{(t)} \leftarrow s^{(t)} + \eta_t D^\top [z - D s^{(t)}]$
    • $s^{(t+1)} \leftarrow \operatorname{soft}_{\lambda_t}(u^{(t)})$
  3. Return $\hat s = s^{(T)}$
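The unrolled solver is short enough to write out directly. A minimal NumPy sketch of Algorithm 1 (illustrative only, with the learned $\{\eta_t, \lambda_t\}$ passed in as arrays):

```python
import numpy as np

def soft_threshold(u, lam):
    """Elementwise soft-thresholding: the proximal operator of lam * ||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lista(z, D, etas, lams):
    """Unrolled (learnable) ISTA as in Algorithm 1.

    z    : feature vector, shape (n,)
    D    : dictionary, shape (n, K)
    etas : per-iteration step sizes eta_t, length T
    lams : per-iteration thresholds lambda_t, length T
    """
    s = np.zeros(D.shape[1])                     # s^(0) = 0
    for eta, lam in zip(etas, lams):
        u = s + eta * D.T @ (z - D @ s)          # gradient step on 1/2 ||z - D s||^2
        s = soft_threshold(u, lam)               # shrinkage step
    return s
```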

3. Network Architectures and Computational Aspects

For image SC-VAE (Xiao et al., 2023):

  • Encoder/Decoder: Follows the VQGAN backbone: stacked ResidualConv, GroupNorm, Swish, and down/up-sampling blocks; latent dimension $n = 256$.
  • Dictionary: Fixed DCT, $K = 512$ atoms, tied across layers.
  • Training: Adam optimizer, learning rate $10^{-4}$, batch size $16$, $10$ epochs.
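The fixed dictionary can be instantiated explicitly. The paper specifies a fixed DCT dictionary with $n = 256$ and $K = 512$; the exact construction below (overcomplete cosine atoms, DC removal, unit-norm columns) is an assumption for illustration:

```python
import numpy as np

def dct_dictionary(n=256, K=512):
    """DCT-style dictionary D in R^{n x K}. The construction is an assumption;
    the paper only states a fixed DCT dictionary with K = 512 atoms."""
    i = np.arange(n)[:, None]                 # sample index
    k = np.arange(K)[None, :]                 # atom (frequency) index
    D = np.cos(np.pi * i * k / K)             # cosine atoms
    D[:, 1:] -= D[:, 1:].mean(axis=0)         # remove DC component from non-constant atoms
    D /= np.linalg.norm(D, axis=0)            # unit-norm columns
    return D

D = dct_dictionary()   # shape (256, 512), kept fixed during training
```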

For point cloud SC-VAE (Wang et al., 2022):

  • Encoder: Six sparse 3D convolutions mapping the $3$ input channels (RGB) to $128$, with a final bottleneck of $128$ features per occupied voxel.
  • Decoder: Mirrors encoder with transposed convolutions.
  • Entropy Model: Hyper-encoder/decoder operating on sparse tensors, context model for Laplace parameter estimation.
  • Computational Complexity: $\sim 3.2$M parameters in total; runtime and memory scale linearly with the number of points.
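A schematic of such a sparse-convolutional encoder, written against the MinkowskiEngine API as one plausible backend; the channel widths follow the description above, while the stride pattern and activation choice are assumptions:

```python
import torch.nn as nn
import MinkowskiEngine as ME  # sparse convolution library; one plausible backend

class SparseAttributeEncoder(nn.Module):
    """Illustrative six-layer sparse 3D convolutional encoder:
    3 input channels (RGB attributes) -> 128 latent features per occupied voxel.
    Channel widths and strides are assumptions, not the published configuration."""
    def __init__(self):
        super().__init__()
        chans   = [3, 32, 64, 64, 128, 128, 128]
        strides = [1, 2, 1, 2, 1, 2]
        layers = []
        for c_in, c_out, s in zip(chans[:-1], chans[1:], strides):
            layers.append(ME.MinkowskiConvolution(c_in, c_out, kernel_size=3,
                                                  stride=s, dimension=3))
            layers.append(ME.MinkowskiReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x: ME.SparseTensor) -> ME.SparseTensor:
        # input: sparse tensor of voxelized point attributes; output: sparse latent
        return self.net(x)
```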

4. Training Objectives and Variational Framework

SC-VAE for images (Xiao et al., 2023) uses a composite loss:

  • Pixel-level Reconstruction:

$$\mathcal{L}_{\text{rec}} = \|x - G_\theta(D \hat s)\|_2^2$$

  • Latent-space Sparse Coding (averaged over tokens):

$$\mathcal{L}_{\text{sparse}} = \frac{1}{h w} \sum_{i,j} \left[\tfrac{1}{2} \|z_{ij} - D s_{ij}\|_2^2 + \alpha \|s_{ij}\|_1\right]$$

  • Total Objective:

$$\mathcal{L}(\theta, \phi) = \mathcal{L}_{\text{rec}} + \mathcal{L}_{\text{sparse}}$$

This objective approximates the (negative) ELBO, with $-\log p_\theta(x \mid s) \approx \|x - G_\theta(D s)\|^2$ and $D_{\mathrm{KL}}[q_\phi(s \mid x) \,\|\, p(s)] \approx \alpha \|s\|_1$.
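A compact sketch of the composite objective in PyTorch, with per-token features and codes flattened to shape $(B, hw, \cdot)$ (shapes and helper names are assumptions):

```python
import torch

def sc_vae_loss(x, x_rec, z, s_hat, D, alpha):
    """Composite SC-VAE loss: pixel reconstruction + latent sparse-coding term.

    x, x_rec : input and reconstruction, shape (B, C, H, W)
    z, s_hat : per-token features and codes, shapes (B, h*w, n) and (B, h*w, K)
    D        : fixed dictionary, shape (n, K)
    """
    # L_rec = || x - G_theta(D s_hat) ||_2^2, averaged over the batch
    l_rec = (x - x_rec).pow(2).sum(dim=(1, 2, 3)).mean()
    # L_sparse = mean over tokens of 1/2 ||z_ij - D s_ij||^2 + alpha ||s_ij||_1
    residual = z - s_hat @ D.T
    l_sparse = (0.5 * residual.pow(2).sum(-1) + alpha * s_hat.abs().sum(-1)).mean()
    return l_rec + l_sparse
```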

For point cloud SC-VAE (Wang et al., 2022):

  • ELBO: $\operatorname{E}_{q_\phi(z \mid x)}[-\log p_\theta(x \mid z)] + \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))$
  • Rate-Distortion: $R + \lambda D$, with distortion $D = \sum_j \|x_j - \hat{x}_j\|_2^2$.
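The rate term comes from the entropy model over the quantized latents. The sketch below uses a Laplace model whose location and scale are assumed to be supplied by the hyper-decoder/context model:

```python
import torch
from torch.distributions import Laplace

def rate_distortion(x, x_rec, y_hat, loc, scale, lam):
    """R + lambda * D with a Laplace entropy model over quantized latents y_hat.

    loc, scale : Laplace parameters predicted by the hyper/context model (assumed given)
    """
    # distortion: sum of squared attribute errors over points
    D = (x - x_rec).pow(2).sum()
    # rate: -log2 P(y_hat), with P estimated as Laplace CDF mass on [y - 0.5, y + 0.5]
    m = Laplace(loc, scale)
    p = m.cdf(y_hat + 0.5) - m.cdf(y_hat - 0.5)
    R = -torch.log2(p.clamp_min(1e-9)).sum()
    return R + lam * D
```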

5. Experimental Evaluation

SC-VAE demonstrates strong empirical performance across tasks.

Reconstruction quality for the image SC-VAE (Xiao et al., 2023):

| Model | PSNR | SSIM | LPIPS | rFID |
|---|---|---|---|---|
| SC-VAE (FFHQ) | 34.92 | 0.9497 | 0.0080 | 4.21 |
| SC-VAE (ImageNet) | 38.40 | 0.9688 | 0.0070 | 0.71 |
  • Qualitative Results: SC-VAE preserves fine details (leaves, textures) better than VQGAN/RQ-VAE and generalizes to out-of-distribution inputs.
  • Attribute Manipulation: Varying sparse code components yields controlled changes in pose, lighting, style; smooth interpolation leads to coherent morphing.
  • Unsupervised Segmentation: K-means clustering on patch-level codes $s_{ij}$ yields accurate segmentation maps (IoU up to 81.2%); a minimal sketch appears after the ablation table below.
  • Ablations (number of LISTA unrolling steps $T$):

| $T$ | PSNR | Sparsity |
|---|---|---|
| 1 | 27.3 | 89.5% |
| 5 | 31.13 | 71.9% |
| 16 | 31.41 | 74.9% |
| $\geq 25$ | drops | denser codes |
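The unsupervised-segmentation result amounts to clustering the per-patch codes. A minimal scikit-learn sketch (the number of clusters is a free choice, not taken from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_from_codes(s, h, w, n_clusters=4):
    """Cluster patch-level sparse codes s_ij into a segmentation map.

    s : sparse codes for one image, shape (h*w, K)
    Returns an (h, w) label map; n_clusters is a hypothetical choice.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(s)
    return labels.reshape(h, w)
```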
Point cloud attribute compression gains (Wang et al., 2022):

| Baseline | BD-BR Reduction | BD-PSNR Gain (Y) |
|---|---|---|
| TMC13v6 | 24% | +0.97 dB |
| RAHT | 34% | +1.38 dB |

Qualitative results confirm fewer blocking artifacts and smooth reconstructions in the point cloud modality.

6. Mitigating Posterior and Codebook Collapse

SC-VAE addresses common VAE failures:

  • Posterior Collapse: The encoder produces features that must admit sparse reconstruction. The sparse penalty s1\|s\|_1 and multi-stage training preclude trivial decoder behavior.
  • Codebook Collapse (VQ-VAEs): Fixed orthogonal dictionary DD (e.g., DCT) ensures no dead atoms, and differentiable thresholding further mitigates collapse.

A plausible implication is that SC-VAE provides a principled middle ground between continuous VAEs (dense Gaussian codes) and discrete VAEs (one-hot quantization), producing interpretable, well-behaved latent spaces.

7. Applicability and Significance

SC-VAE yields disentangled sparse codes amenable to downstream tasks such as:

  • Image Generation and Morphing: Direct, interpretable code manipulation.
  • Unsupervised Clustering and Segmentation: Patch-wise codes facilitate clustering (e.g., spectral clustering, k-means), outperforming prior sparse/quantized VAEs in IoU for medical and natural images.
  • Robustness: Graceful degradation under Gaussian input noise, especially for small noise levels $\sigma$.
  • Compression: For point cloud attributes, SC-VAE outperforms G-PCC v6 and RAHT, offers competitive visual quality with G-PCC v14, and runs in real time on commodity hardware.

SC-VAE constitutes an end-to-end learned codec and generative representation framework: it combines sparse coding principles with deep variational modeling and operationalizes them via differentiable solvers such as LISTA, yielding scalable, interpretable, and robust latent representations (Xiao et al., 2023; Wang et al., 2022).
