
AlphaLayers RGBA Dataset

Updated 25 February 2026
  • AlphaLayers is a multi-layer RGBA image dataset featuring 1,000 triplets with foreground, background, and composite images along with detailed pixel-level masks.
  • The dataset is created using a rigorous synthesis and filtering pipeline that employs Qwen3-VL and ObjectClear to ensure high consistency and quality.
  • It supports diverse tasks such as text-to-image generation, image matting, object removal, and layer decomposition, setting new performance benchmarks.

AlphaLayers is a multi-layer RGBA image dataset specifically constructed to support the development and evaluation of unified, multi-task RGBA generative models. Designed for both image synthesis and editing tasks that require explicit manipulation of layer structure—including matting, inpainting, object removal, decomposition, and compositional generation—AlphaLayers assembles high-quality triplets (foreground, background, and composite) with detailed pixel-level masks and aligned textual descriptions. Its rigorous synthesis and filtering pipeline yields a clean, consistent benchmark for the training of sequence-to-sequence diffusion frameworks such as OmniAlpha, and directly addresses the limitations of conventional RGB datasets for layered, transparency-aware research (Yu et al., 25 Nov 2025).

1. Dataset Composition and Structure

AlphaLayers contains 1,000 triplets, each composed of three tightly aligned RGBA images at 512×512 resolution:

  • Foreground (I_fg): An object (often with semi-transparent boundaries) and its continuous alpha matte α_fg ∈ [0,1]^{H×W}.
  • Background (I_bg): A scene with the foreground object removed; the alpha channel is 1 everywhere.
  • Composite (I_comp): The standard alpha compositing I_comp(x) = α_fg(x)·I_fg(x) + [1 − α_fg(x)]·I_bg(x).
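The compositing relation above can be sketched in a few lines of NumPy; the function and array names are illustrative, not part of the dataset's tooling:

```python
import numpy as np

def composite(fg: np.ndarray, alpha: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Standard alpha compositing: I_comp = alpha*I_fg + (1-alpha)*I_bg.

    fg, bg: float arrays of shape (H, W, 3) in [0, 1]
    alpha:  float array of shape (H, W) in [0, 1]
    """
    a = alpha[..., None]          # broadcast alpha over the RGB channels
    return a * fg + (1.0 - a) * bg

# Toy 2x2 example: a fully opaque pixel keeps the foreground,
# a fully transparent pixel keeps the background.
fg = np.ones((2, 2, 3))
bg = np.zeros((2, 2, 3))
alpha = np.array([[1.0, 0.0], [0.5, 0.25]])
comp = composite(fg, alpha, bg)
```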

Accompanying these images are:

  • Four pixel-level masks: a binary precise mask (α_fg > 0.95), a three-level trimap, a rough mask, and the full continuous alpha map.
  • Three aligned captions: describing object, scene, and structured editing instruction.

All data are encoded as 4-channel PNGs.

2. Synthesis and Filtering Pipeline

AlphaLayers is created via an automated, multi-stage pipeline:

  1. Raw RGBA foreground acquisition: Source ~10,000 single-layer RGBA samples from established matting benchmarks (Adobe, AM-2K, Distinctions-646, etc.).
  2. Foreground captioning: Use Qwen3-VL to generate a concise text description, T_fg.
  3. Scenario generation: Prompt Qwen3-VL with (I_fg, T_fg) to obtain a composite scene caption, T_comp.
  4. Composite synthesis: Use Qwen-Image-Edit for “background replacement,” compositing I_fg over a generated background to obtain I_comp.
  5. Background recovery: Use ObjectClear to inpaint the object away in I_comp, yielding I_bg; also caption I_bg (T_bg).
  6. Mask derivation: From α_fg, produce precise, trimap, and rough masks.
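The mask-derivation step can be sketched as follows; the trimap thresholds and the dilation band width are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def derive_masks(alpha: np.ndarray, band: int = 2):
    """Derive the three discrete masks from a continuous alpha matte.

    alpha: float array (H, W) in [0, 1]
    Returns (precise, trimap, rough).
    """
    precise = (alpha > 0.95).astype(np.uint8)        # binary precise mask

    # Three-level trimap: 0 = background, 128 = unknown, 255 = foreground.
    trimap = np.full(alpha.shape, 128, dtype=np.uint8)
    trimap[alpha > 0.95] = 255
    trimap[alpha < 0.05] = 0

    # Rough mask: dilate the precise mask by `band` pixels via shifted maxima.
    h, w = precise.shape
    padded = np.pad(precise, band, mode="constant")
    rough = np.zeros_like(precise)
    for dy in range(-band, band + 1):
        for dx in range(-band, band + 1):
            rough = np.maximum(
                rough, padded[band + dy:band + dy + h, band + dx:band + dx + w]
            )
    return precise, trimap, rough
```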

Triplets are rigorously filtered by a consistency score

S = λ·MSE_{fg→comp} + (1 − λ)·MSE_{recomp→comp},  λ = 0.6

where MSE_{fg→comp} = E_{x∈Ω_fg} ‖I_fg(x) − I_comp(x)·α_fg(x)‖² and MSE_{recomp→comp} = E_x ‖I_recomp(x) − I_comp(x)‖², with I_recomp = α_fg·I_fg + (1 − α_fg)·I_bg. Only the 1,000 best-scoring (lowest-S) triplets are retained (Yu et al., 25 Nov 2025).
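The consistency score can be computed directly from a candidate triplet; this sketch assumes the foreground support Ω_fg is the set of pixels with nonzero alpha, which the source does not spell out:

```python
import numpy as np

def consistency_score(i_fg, alpha_fg, i_bg, i_comp, lam=0.6):
    """Consistency score S used to filter triplets (lower = more consistent).

    All images are float arrays (H, W, 3) in [0, 1]; alpha_fg is (H, W).
    """
    a = alpha_fg[..., None]
    support = alpha_fg > 0          # assumed definition of Omega_fg

    # MSE_{fg->comp}: foreground vs. alpha-weighted composite, on Omega_fg.
    diff_fg = np.sum((i_fg - i_comp * a) ** 2, axis=-1)
    mse_fg = diff_fg[support].mean() if support.any() else 0.0

    # MSE_{recomp->comp}: re-composited image vs. the synthesized composite.
    i_recomp = a * i_fg + (1.0 - a) * i_bg
    mse_recomp = np.mean(np.sum((i_recomp - i_comp) ** 2, axis=-1))

    return lam * mse_fg + (1.0 - lam) * mse_recomp
```

A perfectly consistent triplet (composite exactly reproducible from foreground and background) scores S = 0; candidates are then ranked by ascending S.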

3. Annotation Protocol and Layer Metadata

Each AlphaLayers instance includes:

  • Alpha and segmentation masks: Continuous alpha, binary "precise" (α > 0.95), three-class trimap, and rough mask.
  • Textual descriptions: Object-only (T_fg), composite scene (T_comp), structured editing (T_replace), and background (T_bg).
  • Storage and Format: All images resized to 512×512, saved in four-channel PNG, with captions and masks bundled per triplet.

All backgrounds are independently synthesized, and object boundaries are sourced directly from high-quality matting masks.
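A per-triplet bundle as described above might be modeled like this; the field and file names are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class AlphaLayersTriplet:
    """One AlphaLayers instance: three RGBA images, four masks, four captions."""
    # 512x512 four-channel PNGs (hypothetical file names).
    foreground_png: str
    background_png: str
    composite_png: str
    # Pixel-level masks derived from the continuous alpha matte.
    alpha_png: str          # continuous alpha map
    precise_png: str        # binary mask, alpha > 0.95
    trimap_png: str         # three-level trimap
    rough_png: str          # rough mask
    # Aligned textual descriptions, keyed T_fg / T_comp / T_replace / T_bg.
    captions: dict = field(default_factory=dict)

t = AlphaLayersTriplet(
    foreground_png="0001_fg.png",
    background_png="0001_bg.png",
    composite_png="0001_comp.png",
    alpha_png="0001_alpha.png",
    precise_png="0001_precise.png",
    trimap_png="0001_trimap.png",
    rough_png="0001_rough.png",
    captions={"T_fg": "a glass vase", "T_comp": "a glass vase on a wooden table"},
)
```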

4. Dataset Statistics and Scope

AlphaLayers covers:

  • 1,000 triplets total, with a split of 900 for training and 100 for held-out testing (“AlphaLayersTest”).
  • Broad domain diversity: portraits, objects, transparent materials, animals, and synthetic composites.
  • Backgrounds and scenes generated using LLM-based visual prompting (Qwen3-VL) and object inpainting (ObjectClear).
  • Captions ranging from concise (object name) to long, descriptive scene prompts ("rich 40–50 word scene prompts" in the original data).

All masks are derived from ground-truth matting, so mask-quality artifacts are minimal. A limitation is that all examples are single-object foregrounds; multi-object, occlusion-rich scenes are not included.

5. Supported Tasks and Benchmark Protocols

AlphaLayers is expressly structured for multi-task, multi-modal RGBA model development:

  • Task Categories (21 total):

    1. Text-to-Image Generation
    2. Layer-Conditioned Completion (FG→BG, FG→Comp, BG→FG, BG→Comp)
    3. Image Matting (mask-free; alpha-, trimap-, precise-, rough-, and text-conditioned)
    4. Object Removal (foreground extraction, background inpainting)
    5. Layer Decomposition (recover both FG and BG from the composite)

  • Benchmark splits: 900/100 (train/test); OOD generalization on AIM-500, RefMatte-RW100, and RORD.
  • Metrics: FID, CLIP-Score, pairwise win rates, SAD, MSE, GRAD, CONN, LPIPS, and PSNR (see Table 2–6 in (Yu et al., 25 Nov 2025)).
  • Evaluation protocol: Results are compared to strong baselines (LayerDiffuse, AlphaVAE, MAM, MatAny, TeachDiffusionMatting, LayerDecomp) on all primary and OOD benchmarks.
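For the matting metrics listed above, SAD and MSE over predicted vs. ground-truth alpha mattes are typically computed as below; the /1000 scaling for SAD is a common reporting convention, not necessarily the paper's exact one:

```python
import numpy as np

def sad(alpha_pred: np.ndarray, alpha_gt: np.ndarray) -> float:
    """Sum of Absolute Differences, conventionally reported in thousands."""
    return float(np.abs(alpha_pred - alpha_gt).sum() / 1000.0)

def mse(alpha_pred: np.ndarray, alpha_gt: np.ndarray) -> float:
    """Mean Squared Error over all pixels of the alpha matte."""
    return float(np.mean((alpha_pred - alpha_gt) ** 2))

# Toy check: a prediction off by 0.1 on every pixel of a 100x100 matte.
gt = np.zeros((100, 100))
pred = np.full((100, 100), 0.1)
```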

6. Applications and Limitations

AlphaLayers supports:

  • Training and benchmarking of unified diffusion models for RGBA text-to-image, matting, inpainting, and compositional decomposition.
  • Pretraining/fine-tuning on end-to-end transparency and layer-aware editing.
  • Unambiguous multi-modal (caption/mask/trimap) conditioning for model input.

Limitations:

  • Dataset scale—1,000 triplets—is markedly smaller than RGB-focused datasets and limits fine-grained compositional learning.
  • All samples are single-object; scenes with multiple overlapping or interacting transparency layers are not present.
  • Synthetic backgrounds reflect Qwen3-VL and ObjectClear priors; biases may result from distributional artifacts.
  • Fixed 512×512 resolution and a single pipeline for caption style and compositing.

7. Impact and Availability

AlphaLayers provides the foundation for OmniAlpha, a unified, multi-task RGBA sequence-to-sequence framework that achieves state-of-the-art performance, e.g., an 84.8% reduction in SAD for mask-free matting (AIM-500) and >90% human preference win rate on layer-conditioned completion tasks compared to the previous best methods. The dataset is publicly released for research purposes, with licensing terms matched to the upstream matting datasets (Yu et al., 25 Nov 2025). Its unified structure and strict curation protocol make it the current standard for training and benchmarking next-generation RGBA-aware generative models.
