PAL-Set: Perceptual Artifact Localization Benchmark

Updated 26 February 2026

PAL-Set is a comprehensive dataset featuring 10,168 generated images with precise, per-pixel binary masks to benchmark artifact localization.
The dataset employs a rigorous annotation protocol with high inter-annotator agreement (κ≈0.82) across diverse generative tasks and models.
It supports various applications including automatic inpainting, image quality assessment, and cross-model artifact detection for generative systems.

PAL-Set refers to multiple large-scale, rigorously annotated datasets in machine learning, each serving as a benchmark in distinct domains: (1) Perceptual artifact localization in image synthesis, (2) Memory-based personalization for dialogue systems, and (3) Robotics pallet detection via 2D rangefinder scans. The most widely referenced instance is the “Perceptual Artifacts Localization Set” (Zhang et al., 2023), designed for fine-grained, per-pixel localization of image synthesis artifacts across generative models and tasks. Separate unrelated datasets using the PAL-Set acronym have also appeared in robotics (Mohamed et al., 2018) and dialogue personalization (Huang et al., 17 Nov 2025). This entry primarily concerns the image synthesis corpus, but contrasts and cross-references all three for unambiguous identification.

1. Conceptual Scope and Purpose

PAL-Set (“Perceptual Artifacts Localization Set”) (Zhang et al., 2023) is constructed to address the need for quantitative, per-region evaluation of image synthesis artifacts. It comprises 10,168 generated images paired with dense, pixel-level binary masks demarcating all “perceptual artifacts”—regions that are visually implausible, unpleasant, or inconsistent with correct real-world content, as identified by expert human raters. The dataset spans ten diverse synthesis tasks, encapsulating both unconditional and complex conditional generation. Its primary aim is to standardize artifact localization training and benchmarking, enabling robust segmentation model development, cross-task analysis, and research into artifact remediation.

2. Composition, Acquisition, and Annotation

Image Sourcing and Task Breakdown

PAL-Set comprises images synthesized by mainstream generative methods, including but not limited to: StyleGAN2, Latent Diffusion Model (LDM), Anyres GAN, Real-ESRGAN, PITI (for inpainting, edge-to-image, and mask-to-image), latent space composition, C-VTON (virtual try-on), and portrait shadow removal pipelines. Each task contributes between 1,005 and 1,025 images (about 1,017 on average), for a total of 10,168 examples.

Task (Model)	Images	Type
StyleGAN2	1,020	Unconditional
LDM (LSUN)	1,005	Unconditional
Anyres GAN	1,010	Unconditional
Real-ESRGAN	1,020	Super-Resolution
PITI (Inpaint)	1,018	Inpainting
PITI (E2I)	1,022	Edge-to-Image
PITI (M2I)	1,023	Mask-to-Image
Latent Comp.	1,017	Composition
C-VTON	1,008	Virtual Try-on
Portrait Shadow	1,025	Shadow Removal

Generated images are kept at their native resolutions (either 512×512 or 1024×1024), with each sample accompanied by a precisely registered binary annotation mask.
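As a quick consistency check, the per-task counts from the table above can be summed and averaged. This is only an arithmetic sketch; the dictionary keys are the table's row labels, not identifiers taken from the dataset files.

```python
# Per-task image counts copied from the task breakdown table.
TASK_COUNTS = {
    "StyleGAN2": 1020,
    "LDM (LSUN)": 1005,
    "Anyres GAN": 1010,
    "Real-ESRGAN": 1020,
    "PITI (Inpaint)": 1018,
    "PITI (E2I)": 1022,
    "PITI (M2I)": 1023,
    "Latent Comp.": 1017,
    "C-VTON": 1008,
    "Portrait Shadow": 1025,
}

total_images = sum(TASK_COUNTS.values())
mean_per_task = total_images / len(TASK_COUNTS)

print(total_images)             # 10168
print(round(mean_per_task, 1))  # 1016.8
```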

Annotation Protocols and Quality Control

A “perceptual artifact” is operationalized as any region judged either (a) implausible or visually unpleasant, or (b) readily correctible by an ideal inpainting oracle. Expert annotators employ free-form mask painting tools to segment all such regions, erring on the side of generous coverage. Each pixel is given a binary label (artifact vs. non-artifact).

Quality assurance includes double annotation of a random subset (~500 images). Inter-annotator agreement on this subset, measured with Cohen’s κ, is κ ≈ 0.82, indicating substantial annotation consistency. A plausible implication is that the masks are reliable enough to serve as ground truth for segmentation evaluation.
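For reference, Cohen's κ on two binary masks compares observed pixel agreement with the agreement expected by chance. The sketch below is a generic implementation over flattened masks. The source does not say whether κ was computed per image or pooled over all pixels, so the function name, the input layout, and the handling of the degenerate case are assumptions.

```python
def cohens_kappa(mask_a, mask_b):
    """Cohen's kappa for two flattened binary masks of equal length."""
    if len(mask_a) != len(mask_b) or len(mask_a) == 0:
        raise ValueError("masks must be non-empty and of equal length")
    n = len(mask_a)
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]

    # Observed agreement: fraction of pixels where both raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n

    # Chance agreement from each rater's marginal artifact rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    if expected == 1.0:
        # Both raters used the same single label everywhere; kappa is
        # undefined here, and returning 1.0 is a convention assumed by this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


rater_1 = [1, 1, 0, 0, 1, 0, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 0, 1]
print(round(cohens_kappa(rater_1, rater_2), 4))  # 0.4667
```

In the toy example the raters agree on 6 of 8 pixels (0.75) while chance agreement is 0.53125, which gives κ = 7/15.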

Parallel datasets, notably the “PAL-Set” for personalized dialogue (Huang et al., 17 Nov 2025) and 2D pallet scanning (Mohamed et al., 2018), involve distinct generation and annotation workflows (LLM-driven synthesis or manual scan labeling) and do not overlap with the scope or protocol of the image synthesis artifact dataset.

3. Dataset Format, Structure, and Access

PAL-Set is organized as a directory hierarchy, with one subdirectory per split and per task:

/PAL-Set/
    /train/
        /stylegan2/
            stylegan2_0001.png
            stylegan2_0001_mask.png
            ...
        /ldm/
        /anyres/
        ...
    /val/
    /test/
    metadata.json

Image–mask pairs: Each sample comprises an image (“<task>_<####>.png”) and a binary mask (“<task>_<####>_mask.png”), as in the listing above.
Splits: Random 80%/10%/10% division per task into train, val, and test (approx. 8,134/1,017/1,017 samples respectively).
Metadata: Task-specific JSON files provide model checkpoints, random seeds, and generation parameters. All images are unprocessed prior to annotation.
Pixel labeling: Masks use the convention 0 (background), 255 (artifact), stored as PNG.
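Given the naming convention in the directory listing (an image such as stylegan2_0001.png next to stylegan2_0001_mask.png), image and mask files can be paired with the standard library alone. This is a minimal sketch, not a loader shipped with the dataset; the function name and the choice to skip images without a mask are assumptions.

```python
from pathlib import Path

MASK_SUFFIX = "_mask"  # taken from the file names in the directory listing


def pair_images_with_masks(task_dir):
    """Return sorted (image_path, mask_path) pairs found in one task directory."""
    task_dir = Path(task_dir)
    pairs = []
    for image_path in sorted(task_dir.glob("*.png")):
        if image_path.stem.endswith(MASK_SUFFIX):
            continue  # this file is itself a mask, not an image
        mask_path = image_path.with_name(image_path.stem + MASK_SUFFIX + ".png")
        if mask_path.exists():
            pairs.append((image_path, mask_path))
        # Images without a mask are skipped silently (an assumption of this sketch).
    return pairs
```

A call such as `pair_images_with_masks("PAL-Set/train/stylegan2")` would then yield the pairs for one task of the training split, assuming the layout shown above.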

Synthetic dialogue (Huang et al., 17 Nov 2025) and LiDAR scan (Mohamed et al., 2018) PAL-Set instances adopt hierarchical directories with JSONL (logs/dialogues) and MAT/TXT (scans, images, ROIs) formats, respectively—detailed schemas ensure ease of programmatic access.

4. Statistical Characterization and Evaluation Benchmarks

PAL-Set supports both per-image and aggregate statistical analyses:

Perceptual Artifacts Ratio (PAR):

$\text{PAR} = \frac{\# \text{artifact pixels}}{\text{width} \times \text{height}}$

Mean PAR for tasks varies, e.g., Inpaint (PITI) 0.20±0.09, StyleGAN2 0.12±0.05, Edge-to-Image (PITI) 0.22±0.10.
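The PAR definition translates directly into a pixel count divided by the image area. The sketch below uses plain nested lists so that it runs without third-party packages; in practice an array library would do the same count in one vectorized step. The function name is hypothetical, and the 0/255 mask convention is the one stated in the format section.

```python
def perceptual_artifacts_ratio(mask):
    """PAR of one mask given as rows of pixel values (0 = background, nonzero = artifact)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("mask must be non-empty")
    artifact_pixels = sum(1 for row in mask for value in row if value != 0)
    return artifact_pixels / (width * height)


toy_mask = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(perceptual_artifacts_ratio(toy_mask))  # 4 of 16 pixels -> 0.25
```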

Segmentation metrics: Given $P$ (predicted artifact pixels) and $G$ (ground truth),

$\text{IoU} = \frac{|P \cap G|}{|P \cup G|},\quad \text{Precision} = \frac{|P \cap G|}{|P|},\quad \text{Recall} = \frac{|P \cap G|}{|G|}$
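The three metrics share the same intersection count, so they can be computed in one pass. The following is a minimal sketch over flattened binary masks; the function name is hypothetical, and returning 0.0 when a denominator is empty is an assumed convention that the source does not specify.

```python
def segmentation_scores(pred, truth):
    """IoU, precision, and recall for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for p, g in zip(pred, truth) if p and g)  # |P ∩ G|
    pred_count = sum(1 for p in pred if p)                         # |P|
    truth_count = sum(1 for g in truth if g)                       # |G|
    union = pred_count + truth_count - intersection                # |P ∪ G|

    def ratio(numerator, denominator):
        # Assumed convention: an empty denominator yields 0.0.
        return numerator / denominator if denominator else 0.0

    return {
        "iou": ratio(intersection, union),
        "precision": ratio(intersection, pred_count),
        "recall": ratio(intersection, truth_count),
    }


predicted = [1, 1, 1, 0, 0, 0]
ground_truth = [0, 1, 1, 1, 0, 0]
print(segmentation_scores(predicted, ground_truth)["iou"])  # 2 shared of 4 in the union -> 0.5
```

Averaging the per-image IoU values over a test split gives a mean IoU of the kind reported in benchmark tables, although the exact averaging scheme used by the authors is not stated here.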

Benchmark mean IoU (mIoU) for representative models on the test split:

| Model | StyleGAN2 | LDM | SR | Comp. |
|-------------------|-----------|--------|--------|--------|
| Patch-Forensics | 9.08% | 1.34% | 9.63% | 2.14% |
| PAL4Inpaint | 0.98% | 0.81% | 14.42% | 15.94% |
| Specialist (ours) | 35.39% | 14.41% | 37.44% | 25.31% |
| Unified (ours) | 30.86% | 11.92% | 38.07% | 29.53% |

Specialist models (trained per task) score higher than the unified model on StyleGAN2 and LDM, whereas the unified model is ahead on super-resolution (SR) and composition (Comp.).

Generalization trials on unseen models (e.g., StyleGAN3, BlobGAN, VersaDiffusion) report zero-shot mIoU in the 6–25% range, with fine-tuning on 10 images rapidly boosting performance to 20–35% mIoU.

The dialogue and LiDAR PAL-Set instances provide correlational and coverage statistics (dialogue turns/session, scan diversity), but not image segmentation metrics.

5. Algorithmic Applications and Benchmarks

Core use cases demonstrated with PAL-Set include:

Artifact Segmentation: Supervised training of deep segmentation architectures (Swin-T + UPerNet + FCN) for per-pixel artifact localization.
Image Restoration: Automated inpainting (LaMa, CoMod-GAN, DALL·E 2) after mask-based artifact localization. A “zoom-in” padding/inpainting/blending pipeline leverages PAL-Set masks. User studies show that retouched outputs are preferred in 6/10 generative tasks (p < 0.05).
No-reference Image Quality Assessment: The PAR serves as an interpretable scalar IQA score; user agreement with PAR ordering reaches 74.5% (StyleGAN2) and 63.9% (Stable Diffusion), exceeding SOTA blind IQA (58–61%).
Abnormal Region Detection: Artifact segmenters, trained on PAL-Set, flag rare distractors (e.g., watermarks) in real images with low false-positive rates.
Cross-Model Transfer: Models trained on PAL-Set demonstrate credible artifact localization on previously unseen architectures and image domains.
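The restoration use case mentions a “zoom-in” pipeline with a padding step. The source does not describe how the zoomed region is chosen, so the sketch below simply assumes a bounding box around all artifact pixels, grown by a fixed margin and clipped to the image. The function name and the `pad` parameter are hypothetical; inpainting and blending are left out.

```python
def padded_artifact_box(mask, pad):
    """Return (top, left, bottom, right) around artifact pixels, or None if there are none.

    `mask` is a list of rows (0 = background, nonzero = artifact).
    `bottom` and `right` are exclusive, so the box can be used for slicing.
    """
    if not mask or not mask[0]:
        return None
    height, width = len(mask), len(mask[0])

    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # nothing to retouch
    cols = [c for c in range(width) if any(row[c] for row in mask)]

    # Grow the tight box by `pad` pixels and clip it to the image bounds.
    top = max(min(rows) - pad, 0)
    left = max(min(cols) - pad, 0)
    bottom = min(max(rows) + 1 + pad, height)
    right = min(max(cols) + 1 + pad, width)
    return top, left, bottom, right


toy = [[0] * 6 for _ in range(6)]
toy[2][3] = 255
toy[3][3] = 255
print(padded_artifact_box(toy, pad=1))  # (1, 2, 5, 5)
```

A crop taken with this box could be passed to an inpainting model together with the matching mask crop and then pasted back into the full image, which is one plausible reading of the padding, inpainting, and blending steps.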

In dialogue (Huang et al., 17 Nov 2025), PAL-Set underpins memory-augmented benchmarking with BLEU/GPT-4 metrics, solution selection scores, and Win–Tie–Lose head-to-heads, supporting research on retrieval-augmented systems. In robotics (Mohamed et al., 2018), PAL-Set enables framewise classification, ROI localization, and online tracking.

6. Limitations and Prospective Extensions

PAL-Set’s image synthesis variant is restricted to binary “artifact” masks, without fine-grained artifact type classification. Annotator subjectivity—though mitigated by double-annotation and high κ—may influence mask boundaries, especially for ambiguous cases. A plausible implication is that training highly detailed artifact taxonomies or multi-class segmentation would require additional labeling efforts.

Future extensions could include multi-class or severity ranking masks, expansion to higher-resolution and video synthesis, inclusion of per-pixel confidence, and richer metadata for provenance and synthesis conditions.

Synthetic dialogue PAL-Set (Huang et al., 17 Nov 2025) is limited by lack of real-world user data and the finite diversity of LLM-synthesized personas, while robotics PAL-Set (Mohamed et al., 2018) includes only single-pallet 2D LiDAR scenes and omits raw reflectivity, 3D, or multi-modal cues.

7. Access, Licensing, and Interoperability

PAL-Set images and masks are made available in standard PNG and JSON formats, suitable for direct integration with PyTorch/TensorFlow data pipelines (Zhang et al., 2023). The data and associated code are publicly released; license terms are cited in the repository/LICENSE file, commonly CC BY 4.0. Complete citation information and download links are provided in the official repositories or associated papers.

Distinct PAL-Set datasets in dialogue (Huang et al., 17 Nov 2025) and robotics (Mohamed et al., 2018) are likewise released with open access and detailed format documentation, enabling further benchmarking across vision, interaction, and embodiment research contexts.


References (3)

Perceptual Artifacts Localization for Image Synthesis Tasks (2023)

A 2D laser rangefinder scans dataset of standard EUR pallets (2018)

Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction (2025)
