
PersonSyn: Synthetic Data for Controllable ReID

Updated 9 December 2025
  • PersonSyn is a large-scale dataset engineered for multi-reference, controllable, identity-preserving person generation across RGB and IR modalities.
  • It aggregates data from multiple established ReID benchmarks, covering over 3,740 unique identities, and provides dense annotations for precise pose, viewpoint, and attribute control.
  • The automated curation pipeline and standardized benchmark protocols enable robust evaluation and effective augmentation of real-world ReID datasets.

PersonSyn is a large-scale dataset specifically constructed for multi-reference, controllable, identity-preserving person generation, targeting research in person re-identification (ReID) for both visible and infrared (RGB/IR) modalities. Developed within the OmniPerson framework, PersonSyn addresses key limitations in traditional ReID dataset augmentation: insufficient identity consistency and limited controllability. By transforming public, ID-labelled ReID benchmarks into a unified, richly annotated resource, PersonSyn introduces automated, dense supervision for each pedestrian, supporting fine-grained control over pedestrian attributes, pose, viewpoint, and modality (Ma et al., 2 Dec 2025).

1. Dataset Composition and Scale

PersonSyn draws its samples from multiple established ReID datasets, aggregating approximately 3,740 unique identities across image and video data sources:

| Source Dataset | Modalities | Unique IDs | Images (RGB / IR) | Video Frames |
|---|---|---|---|---|
| Market-1501 | RGB | 751 | 12,936 | N/A |
| MSMT17 | RGB | 1,041 | 32,621 | N/A |
| SYSU-MM01 | RGB/IR | 395 | 22,258 RGB / 11,140 IR | N/A |
| Occluded-ReID | RGB | 702 | 1,000 (whole-body) / 969 (occluded) | N/A |
| MARS | RGB (video) | 600 | N/A | 125,000 |
| HITSZ-VCM | RGB/IR (video) | ~350 | N/A | 75,000 |

In total, PersonSyn comprises ≈80,000 annotated still frames and ≈200,000 video frames, including ≈10,000 IR stills and ≈55,000 IR video frames. This distribution enables training and evaluation under diverse lighting, sensor, and environmental conditions (Ma et al., 2 Dec 2025).

2. Multi-Reference and Controllable Annotation Scheme

Central to PersonSyn is its multi-reference paradigm: each target image or frame is paired with a flexible set of same-ID references, differentiated by:

  • Camera ID
  • Estimated orientation (front, back, left, right)
  • Semantic attributes (e.g., "carrying backpack", "wearing hat")

Reference–target relationships are quantitatively characterized using:

  • CLIP-ReID similarity: $S_{ij} = \dfrac{\langle f_i, f_j \rangle}{\|f_i\| \, \|f_j\|}$
  • Viewpoint score: $s^{\mathrm{ori}}_{ij} = \mathbf{v}_i \cdot \mathbf{v}_j$

This enables sampling of N≥1 references per target for controlled pose and appearance variation, facilitating experiments with extreme viewpoint and attribute diversity.
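
A minimal sketch of how these two scores can be computed and used to pick references, assuming CLIP-ReID features and unit orientation vectors have already been extracted; the greedy selection rule below is illustrative, not the sampling strategy used in the paper:

```python
import numpy as np

def clip_reid_similarity(feats):
    """Pairwise S_ij = <f_i, f_j> / (||f_i|| ||f_j||) for CLIP-ReID features (N, D)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def viewpoint_scores(orient_vecs):
    """Pairwise s_ij^ori = v_i . v_j for unit orientation vectors (N, 3)."""
    return orient_vecs @ orient_vecs.T

def select_references(target_idx, sim, vp, n_refs=3, id_mask=None):
    """Greedy pick of same-ID references that stay identity-consistent with the
    target while covering different viewpoints (illustrative rule only)."""
    candidates = np.where(id_mask)[0] if id_mask is not None else np.arange(sim.shape[0])
    candidates = candidates[candidates != target_idx]
    # High identity similarity, low viewpoint overlap with the target.
    score = sim[target_idx, candidates] - vp[target_idx, candidates]
    return candidates[np.argsort(-score)[:n_refs]]
```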

3. Dense Supervisory Signals and Attribute Control

Each PersonSyn entry contains comprehensive modal and semantic annotation, supporting granular control over generation conditions:

  • 2D skeleton keypoints (OpenPose)
  • 3D SMPL-X parameters (for body mesh and orientation), with the orientation unit vector computed as $\mathbf{v}_{\mathrm{ori}} = R(\theta, \mathbf{n})\,[0,0,1]^\top$ (see the sketch after this list)
  • Camera-agnostic orientation labels and unit vectors
  • Background "plates" (subject-masked images)
  • Attribute tags (from Qwen-VL): garment color/type, presence of accessories, pose (cycling, walking), etc.
  • Text prompts assembled to describe the target sample, supporting conditional generation
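
The orientation vector above can be reproduced with a plain Rodrigues rotation; the sketch below assumes the global orientation is given as an axis-angle pair, and the front/back/left/right quantization is an assumed convention rather than the dataset's documented one:

```python
import numpy as np

def orientation_vector(axis, theta):
    """v_ori = R(theta, n) [0, 0, 1]^T via Rodrigues' rotation about unit axis n."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    z = np.array([0.0, 0.0, 1.0])
    return (z * np.cos(theta)
            + np.cross(n, z) * np.sin(theta)
            + n * np.dot(n, z) * (1.0 - np.cos(theta)))

def orientation_label(v):
    """Quantize the horizontal component of v_ori into four discrete views.
    The angle-to-label mapping here is an assumption."""
    yaw = np.degrees(np.arctan2(v[0], v[2]))  # heading angle in the ground plane
    if -45 <= yaw < 45:
        return "front"
    if 45 <= yaw < 135:
        return "right"
    if -135 <= yaw < -45:
        return "left"
    return "back"
```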

During generation with OmniPerson, users can control pose, background, modality (e.g., RGB↔IR translation), resolution (up to 512×512), viewpoint sampling, and free-form attribute prompts.
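
As an illustration of these controls, a generation call could be driven by a condition dictionary along the following lines; every key, path, and value here is hypothetical and does not reflect OmniPerson's actual interface:

```python
# Hypothetical generation-condition dictionary (illustrative names only).
condition = {
    "reference_images": ["0001_c1_f0001.jpg", "0001_c3_f0112.jpg"],  # same-ID references
    "pose": "pose/0001_c2_f0050_openpose.json",   # target 2D skeleton
    "background": "plates/0001_c2_f0050.png",     # subject-masked background plate
    "modality": "IR",                             # RGB or IR output
    "resolution": (512, 512),                     # up to 512x512
    "prompt": "a person wearing a hat, carrying a backpack, walking",
}
```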

4. Automated Curation Pipeline

The automated curation pipeline ensures data reliability and annotation density; its main stages include:

  1. Condition Extraction: OpenPose for 2D keypoints, SMPLest-X for 3D mesh/orientation, Qwen-VL for semantic attribute extraction.
  2. Data Cleaning: Removal of images with low pose-keypoint confidence or large 2D/3D misalignment (a simplified filter is sketched after this list).
  3. Classification & Integration: Grouping within-ID samples by (camera, orientation, attributes), precomputing CLIP-ReID similarity and viewpoint scores, and outputting JSON metadata per target. This design enables direct sampling and reference control for experiments requiring specific anatomical or environmental cues.
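
A simplified version of the stage-2 cleaning rule might look as follows; the confidence and alignment thresholds are assumptions, not the values used to build PersonSyn:

```python
import numpy as np

def keep_sample(keypoints_2d, projected_3d, conf_thresh=0.3, align_thresh=20.0):
    """Illustrative cleaning rule: drop samples with low OpenPose keypoint
    confidence or large 2D/3D misalignment (thresholds are assumed).

    keypoints_2d : (K, 3) array of (x, y, confidence) from OpenPose
    projected_3d : (K, 2) array of SMPL-X joints projected into the image
    """
    conf = keypoints_2d[:, 2]
    if conf.mean() < conf_thresh:
        return False
    visible = conf > conf_thresh
    if not visible.any():
        return False
    err = np.linalg.norm(keypoints_2d[visible, :2] - projected_3d[visible], axis=1)
    return err.mean() < align_thresh  # mean 2D/3D reprojection error in pixels
```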

5. Statistical Distributions and Dataset Structure

Image-based statistics from the main training datasets reveal:

  • Orientation: ≈44% front, ≈30% back, ≈15% left, ≈11% right views
  • Modalities: $p(\mathrm{RGB}) \approx 0.88$, $p(\mathrm{IR}) \approx 0.12$
  • Camera distribution in Market-1501: 14%, 12%, 21%, 6%, 14%, 18%
  • In video, pose gap between consecutive frames is approximately uniform in [10°, 50°], enhancing motion diversity

The dataset is organized with a directory structure comprising annotations (JSONs for images and videos, splits), images (organized by modality), pose and mesh visualizations, backgrounds, reference collections, and video frames. Each annotation JSON entry includes all conditional and reference metadata, enabling efficient data loading and experiment configuration.
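
For orientation, one annotation entry might be structured roughly as below; every field name and value is hypothetical, intended only to show how the conditional and reference metadata could be grouped per target:

```python
# Hypothetical shape of a single annotation entry (not PersonSyn's actual schema).
entry = {
    "image": "images/rgb/market1501/0001_c2_f0050.jpg",
    "identity": "market1501_0001",
    "camera_id": 2,
    "modality": "RGB",
    "orientation": {"label": "front", "vector": [0.05, 0.0, 0.99]},
    "keypoints_2d": "pose/market1501/0001_c2_f0050.json",
    "smplx": "mesh/market1501/0001_c2_f0050.npz",
    "background_plate": "backgrounds/market1501/0001_c2_f0050.png",
    "attributes": ["red jacket", "backpack", "walking"],
    "prompt": "a person in a red jacket carrying a backpack, walking",
    "references": [
        {"image": "images/rgb/market1501/0001_c1_f0001.jpg",
         "clip_reid_similarity": 0.91, "viewpoint_score": 0.12},
    ],
}
```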

6. Benchmark Protocols and Empirical Impact

PersonSyn conforms to source dataset split protocols and provides an additional 10% “held-out IDs” validation set. Recommended usage includes:

  • Augmenting real ReID training data either with stills (1–4 poses per ID) or entire video sequences, initially at a 1:1 real:synthetic ratio
  • Evaluation metrics: mAP and Rank-1/CMC@k (ReID), LPIPS/SSIM/PSNR (image quality), FVD (video quality), and ReID identity consistency $S_{\mathrm{ReID}} = \cos(f_{\mathrm{gen}}, f_{\mathrm{gt}})$ (a simplified sketch follows this list)
  • Best practices: maximize cross-viewpoint reference coverage, utilize classifier-free guidance dropout for controllability, vary temporal sampling in sequences
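
The ReID metrics above can be computed from a query-gallery distance matrix; the sketch below omits the same-camera/junk filtering of the official protocols and also includes the identity-consistency score:

```python
import numpy as np

def s_reid(f_gen, f_gt):
    """Identity-consistency score S_ReID = cos(f_gen, f_gt)."""
    return float(np.dot(f_gen, f_gt) /
                 (np.linalg.norm(f_gen) * np.linalg.norm(f_gt)))

def evaluate_reid(dist, q_ids, g_ids, topk=(1, 5, 10)):
    """Simplified Rank-k (CMC) and mAP from a (num_query, num_gallery) distance
    matrix; ignores the camera-based filtering used by standard ReID protocols."""
    cmc_hits = np.zeros(dist.shape[1])
    aps = []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])            # gallery indices, nearest first
        matches = (g_ids[order] == q_ids[i])   # relevance of each ranked item
        if not matches.any():
            continue                           # query ID absent from gallery
        cmc_hits[np.argmax(matches):] += 1     # first hit counts for all k >= its rank
        hit_ranks = np.where(matches)[0]
        precisions = (np.arange(len(hit_ranks)) + 1) / (hit_ranks + 1)
        aps.append(precisions.mean())          # average precision for this query
    cmc = cmc_hits / len(aps)
    results = {f"Rank-{k}": float(cmc[k - 1]) for k in topk}
    results["mAP"] = float(np.mean(aps))
    return results
```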

Notable improvements are reported: augmenting Market-1501 with four OmniPerson poses per ID increases TransReID mAP from 87.3 to 88.7 (+1.4) and Rank-1 from 94.3 to 94.7. On SYSU-MM01 using PMT (all-search), R-1 rises from 67.5 to 68.3 and mAP from 64.9 to 66.7. Under CLIP-ReID, Market mAP increases from 89.8 to 90.4 (Ma et al., 2 Dec 2025).

7. Access, Licensing, and Use Cases

PersonSyn is released under a CC-BY-NC-SA 4.0 license (non-commercial, share-alike); images and annotations are available for download at https://github.com/maxiaoxsi/OmniPerson, alongside code and pretrained models. Recommended use cases include:

  • Data augmentation for ReID and pedestrian generation
  • Multi-modal (RGB/IR) and cross-view person synthesis
  • Training and benchmarking identity-preserving generative models
  • Fine-grained studies of attribute controllability and pose variation

PersonSyn thus establishes a new, scalable standard for controlled, multi-reference, identity-centric synthetic pedestrian data, providing comprehensive annotation and high empirical utility for the ReID and conditional person generation communities (Ma et al., 2 Dec 2025).
