
TinyTTM: Compressed Models

Updated 17 April 2026
  • TinyTTM refers to two distinct model families: a compressed text-to-music generator and a lightweight time-series forecasting model, both designed for resource-constrained scenarios.
  • In the text-to-music application, it reduces the parameter count from 557.6M to 89.2M using knowledge distillation and selective architectural minimization while maintaining competitive audio quality.
  • The methodology employs a composite loss combining cross-entropy, KL divergence, and MSE to effectively transfer knowledge from larger models to their compressed counterparts.

TinyTTM denotes two distinct but similarly named model families in recent machine learning research: (1) a highly compressed transformer-based text-to-music generation system (Moschopoulos et al., 2024), and (2) a lightweight universal time-series forecasting model known as Tiny Time Mixers (TTM) (Ekambaram et al., 2024). Both approaches are motivated by the need to maximize performance while minimizing model capacity and computational demands, with particular emphasis on resource-constrained or real-time deployment scenarios.

1. Compressed Text-to-Music Generation: TinyTTM

TinyTTM in the context of generative AI for music synthesis is presented as a comprehensive model compression study targeting transformer-based text-to-music (TTM) systems. The reference implementation focuses on compressing MusicGen-Small—one of the state-of-the-art transformer architectures for this modality—down from 557.6M parameters to 89.2M parameters, leveraging knowledge distillation and structural reduction, while maintaining competitive audio generation quality as measured by Fréchet Audio Distance (FAD) and Kullback–Leibler divergence (KL) on the MusicBench evaluation set. No explicit pruning, quantization, or low-rank factorization is applied beyond distillation and selective architectural minimization (Moschopoulos et al., 2024).

Architecture Comparison

| Component | MusicGen-Small | TinyTTM (V2) | Parameter Count (TinyTTM) |
|---|---|---|---|
| Encoder | T5-base (109.6M params) | T5-tiny (4L, fine-tuned) | 11.3M |
| Generative Model | 24-layer AR Transformer (1024d) | 7L Transformer (720d) | 70.5M |
| Decoder | EnCodec (4×conv + 2×LSTM + conv) | Distilled EnCodec | 7.43M |
| Total | 557.6M | 89.2M | - |
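The totals in the table can be checked with a few lines of arithmetic. The sketch below uses only the parameter counts listed above; the variable names are illustrative and not taken from the paper. The three TinyTTM components sum to about 89.2M, which works out to roughly a 6.25× reduction (about 84% fewer parameters) relative to MusicGen-Small.

```python
# Parameter counts in millions, copied from the architecture comparison table.
# The dictionary keys and variable names are illustrative only.
tiny_ttm_params_m = {
    "encoder": 11.3,           # T5-tiny (4 layers, fine-tuned)
    "generative_model": 70.5,  # 7-layer transformer, 720-d
    "decoder": 7.43,           # distilled EnCodec
}
musicgen_small_total_m = 557.6

# Sum the student components and compare against the teacher total.
tiny_total_m = sum(tiny_ttm_params_m.values())
compression_ratio = musicgen_small_total_m / tiny_total_m
parameter_reduction = 1.0 - tiny_total_m / musicgen_small_total_m

print(f"TinyTTM total: {tiny_total_m:.2f}M parameters")
print(f"Compression ratio: {compression_ratio:.2f}x")
print(f"Parameter reduction: {parameter_reduction:.1%}")
```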

The TinyTTM model pipeline consists of a T5-tiny encoder, a 7-layer, 8-head transformer language model (LM) for latent sequence modeling, and a distilled version of the EnCodec neural codec for audio waveform reconstruction. The encoder is fine-tuned on MusicBench using span-based masked language modeling with a cross-entropy objective. Distillation from the fine-tuned MusicGen-Small is performed using cross-entropy, KL divergence, and intermediate MSE losses with dynamic loss-weight scheduling.

2. Knowledge Distillation and Compression Methodology

The TTM LM student is trained using a composite distillation loss:

$$L_{\mathrm{LM}} = \ell_{s} L_{S} + \ell_{t} L_{T} + \ell_{m} L_{\mathrm{MSE}}$$

where $L_S$ is standard cross-entropy with ground truth targets, $L_T$ is a softened cross-entropy (KL divergence) with the teacher's outputs, and $L_{\mathrm{MSE}}$ is an intermediate MSE computed between selected teacher and student hidden
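As a rough illustration of how the three terms combine, here is a minimal pure-Python sketch for a single token position. It is not the authors' implementation. The function names, the temperature parameter, the example weights, and the assumption that teacher and student hidden vectors already share the same dimension (in practice a projection would likely be needed between 1024-d and 720-d features) are all hypothetical choices made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a temperature above 1 softens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, target_index):
    """L_S: cross-entropy between student prediction and the ground-truth token."""
    return -math.log(softmax(student_logits)[target_index])

def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """L_T: KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mse(teacher_hidden, student_hidden):
    """L_MSE: mean squared error between same-sized hidden vectors (assumed)."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden)]
    return sum(diffs) / len(diffs)

def composite_loss(student_logits, teacher_logits, target_index,
                   teacher_hidden, student_hidden,
                   w_s, w_t, w_m, temperature=1.0):
    """Weighted sum w_s * L_S + w_t * L_T + w_m * L_MSE."""
    l_s = cross_entropy(student_logits, target_index)
    l_t = kl_divergence(teacher_logits, student_logits, temperature)
    l_m = mse(teacher_hidden, student_hidden)
    return w_s * l_s + w_t * l_t + w_m * l_m

# Hypothetical values for one token over a 4-entry vocabulary.
example_loss = composite_loss(
    student_logits=[1.0, 0.5, -0.2, 0.1],
    teacher_logits=[2.0, 0.3, -1.0, 0.0],
    target_index=0,
    teacher_hidden=[0.2, -0.1, 0.4],
    student_hidden=[0.1, 0.0, 0.3],
    w_s=0.5, w_t=0.4, w_m=0.1,  # illustrative weights, not from the paper
    temperature=2.0,
)
print(f"composite loss: {example_loss:.4f}")
```

Since the source describes dynamic loss-weight scheduling, the three weights would change over the course of training rather than stay fixed as they do here; the schedule itself is not specified in this excerpt.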
