PersonSyn: Synthetic Data for Controllable ReID
- PersonSyn is a large-scale dataset engineered for multi-reference, controllable, identity-preserving person generation across RGB and IR modalities.
- It aggregates data from multiple established ReID benchmarks, spanning approximately 3,740 unique identities, and offers dense annotations for precise pose, viewpoint, and attribute control.
- The automated curation pipeline and standardized benchmark protocols enable robust evaluation and effective augmentation of real-world ReID datasets.
PersonSyn is a large-scale dataset specifically constructed for multi-reference, controllable, identity-preserving person generation, targeting research in person re-identification (ReID) for both visible and infrared (RGB/IR) modalities. Developed within the OmniPerson framework, PersonSyn addresses key limitations in traditional ReID dataset augmentation: insufficient identity consistency and limited controllability. By transforming public, ID-labelled ReID benchmarks into a unified, richly annotated resource, PersonSyn introduces automated, dense supervision for each pedestrian, supporting fine-grained control over pedestrian attributes, pose, viewpoint, and modality (Ma et al., 2 Dec 2025).
1. Dataset Composition and Scale
PersonSyn draws its samples from multiple established ReID datasets, aggregating approximately 3,740 unique identities across image and video data sources:
| Source Dataset | Modalities | Unique IDs | Images (RGB+IR) | Video Frames |
|---|---|---|---|---|
| Market-1501 | RGB | 751 | 12,936 | N/A |
| MSMT17 | RGB | 1041 | 32,621 | N/A |
| SYSU-MM01 | RGB/IR | 395 | 22,258 RGB / 11,140 IR | N/A |
| Occluded-ReID | RGB | 702 | 1,000 (whole-body) / 969 (occluded) | N/A |
| MARS | RGB (video) | 600 | N/A | 125,000 |
| HITSZ-VCM | RGB/IR (video) | ≈350 | N/A | 75,000 |
In total, PersonSyn comprises ≈80,000 annotated still frames and ≈200,000 video frames, including ≈10,000 IR stills and ≈55,000 IR video frames. This distribution enables training and evaluation under diverse lighting, sensor, and environmental conditions (Ma et al., 2 Dec 2025).
2. Multi-Reference and Controllable Annotation Scheme
Central to PersonSyn is its multi-reference paradigm: each target image or frame is paired with a flexible set of same-ID references, differentiated by:
- Camera ID
- Estimated orientation (front, back, left, right)
- Semantic attributes (e.g., "carrying backpack", "wearing hat")
Reference–target relationships are quantitatively characterized using:
- CLIP-ReID similarity: a feature-space similarity between reference and target crops, serving as an identity-consistency score
- Viewpoint score: a measure of orientation agreement between reference and target
This enables sampling of N≥1 references per target for controlled pose and appearance variation, facilitating experiments with extreme viewpoint and attribute diversity.
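As a concrete illustration, the following Python sketch shows how such precomputed scores could drive reference selection; the field names (`refs`, `clip_sim`, `view_score`) and thresholds are assumptions for illustration, not the released schema.

```python
import random

def sample_references(target_entry: dict, n: int = 4,
                      min_sim: float = 0.6, prefer_diverse_views: bool = True):
    """Pick up to N same-ID references for one target, trading identity
    similarity against viewpoint diversity."""
    # Keep candidates that are confidently the same identity.
    candidates = [r for r in target_entry["refs"] if r["clip_sim"] >= min_sim]
    if not candidates:
        return []
    if prefer_diverse_views:
        # Sort by viewpoint score and take evenly spaced picks, so the
        # selected references span the available orientation range.
        candidates.sort(key=lambda r: r["view_score"])
        step = max(1, len(candidates) // n)
        return candidates[::step][:n]
    return random.sample(candidates, min(n, len(candidates)))
```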
3. Dense Supervisory Signals and Attribute Control
Each PersonSyn entry contains comprehensive modal and semantic annotation, supporting granular control over generation conditions:
- 2D skeleton keypoints (OpenPose)
- 3D SMPL-X parameters (for body mesh and orientation), estimated with SMPLest-X
- Camera-agnostic orientation labels and unit vectors
- Background "plates" (subject-masked images)
- Attribute tags (from Qwen-VL): garment color/type, presence of accessories, pose (cycling, walking), etc.
- Text prompts assembled to describe the target sample, supporting conditional generation
During generation with OmniPerson, users can control pose, background, modality (e.g., RGB↔IR translation), resolution (up to 512×512), viewpoint sampling, and free-form attribute prompts.
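To make this control surface concrete, here is a hypothetical conditioning payload assembled from one annotation entry; every key and path is illustrative, not the actual OmniPerson API.

```python
# Hypothetical conditioning payload for one OmniPerson generation call;
# key names and file paths below are illustrative, not the released schema.
condition = {
    "references": ["0751_c1_f001.jpg", "0751_c3_f042.jpg"],  # same-ID references
    "pose": "poses/0751_c2_f010.json",         # OpenPose 2D keypoints
    "smplx": "meshes/0751_c2_f010.npz",        # SMPL-X mesh/orientation params
    "background": "plates/c2_plate_010.png",   # subject-masked background plate
    "modality": "IR",                          # target modality (RGB or IR)
    "resolution": (512, 512),                  # up to 512x512
    "prompt": "a person wearing a hat, carrying a backpack, walking",
}
```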
4. Automated Curation Pipeline
The five-stage curation pipeline ensures data reliability and annotation density; its stages group into the following phases:
- Condition Extraction: OpenPose for 2D keypoints, SMPLest-X for 3D mesh/orientation, Qwen-VL for semantic attribute extraction.
- Data Cleaning: Removal of images with low pose keypoint confidence or large 2D/3D misalignment.
- Classification & Integration: Grouping within-ID samples by (camera, orientation, attributes), precomputing CLIP-ReID similarity and viewpoint scores, and outputting JSON metadata per target. This design enables direct sampling and reference control for experiments requiring specific anatomical or environmental cues.
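A minimal sketch of the cleaning criteria, assuming a confidence threshold on OpenPose keypoints and a reprojection-error check between 2D keypoints and SMPL-X joints; both thresholds are assumptions, not values from the paper.

```python
import numpy as np

def keep_sample(keypoints_2d: np.ndarray,       # (J, 3): x, y, confidence
                projected_smplx_2d: np.ndarray,  # (J, 2): SMPL-X joints reprojected to image
                min_mean_conf: float = 0.4,
                max_mean_err_px: float = 15.0) -> bool:
    """Return True if a sample passes the pose-confidence and 2D/3D
    alignment checks; thresholds here are illustrative."""
    conf = keypoints_2d[:, 2]
    if conf.mean() < min_mean_conf:
        return False  # unreliable 2D pose estimate
    # Mean pixel distance between OpenPose joints and reprojected mesh
    # joints, weighted by detection confidence.
    err = np.linalg.norm(keypoints_2d[:, :2] - projected_smplx_2d, axis=1)
    return float((err * conf).sum() / (conf.sum() + 1e-8)) <= max_mean_err_px
```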
5. Statistical Distributions and Dataset Structure
Image-based statistics from the main training datasets reveal:
- Orientation: ≈44% front, ≈30% back, ≈15% left, ≈11% right views
- Modalities: of the ≈80,000 stills, ≈10,000 are IR and the remainder RGB; IR accounts for ≈55,000 of the ≈200,000 video frames (see Section 1)
- Camera distribution in Market-1501 (six cameras): 14%, 12%, 21%, 6%, 14%, 18%
- In video, pose gap between consecutive frames is approximately uniform in [10°, 50°], enhancing motion diversity
The dataset is organized with a directory structure comprising annotations (JSONs for images and videos, splits), images (organized by modality), pose and mesh visualizations, backgrounds, reference collections, and video frames. Each annotation JSON entry includes all conditional and reference metadata, enabling efficient data loading and experiment configuration.
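A minimal loading sketch under the layout described above; the exact file names and JSON keys are assumptions and would need adapting to the released schema.

```python
import json
from pathlib import Path

root = Path("PersonSyn")
# Hypothetical annotation file name and entry keys.
with open(root / "annotations" / "images_train.json") as f:
    entries = json.load(f)

for entry in entries[:3]:
    image_path = root / "images" / entry["modality"] / entry["file_name"]
    refs = [root / "references" / r for r in entry["refs"]]
    print(entry["pid"], image_path, f"{len(refs)} references")
```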
6. Benchmark Protocols and Empirical Impact
PersonSyn conforms to source dataset split protocols and provides an additional 10% “held-out IDs” validation set. Recommended usage includes:
- Augmenting real ReID training data either with stills (1–4 poses per ID) or entire video sequences, initially at a 1:1 real:synthetic ratio (see the sketch after this list)
- Evaluation metrics: mAP, Rank-1/CMC@k (ReID), LPIPS/SSIM/PSNR (image quality), FVD (video quality), and ReID identity consistency
- Best practices: maximize cross-viewpoint reference coverage, utilize classifier-free guidance dropout for controllability, vary temporal sampling in sequences
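The 1:1 augmentation recipe can be set up with standard PyTorch utilities, as in this sketch; the dataset classes are stubs standing in for real loaders.

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset, Subset

class _Stub(Dataset):
    """Placeholder standing in for a real or synthetic ReID dataset."""
    def __init__(self, n): self.n = n
    def __len__(self): return self.n
    def __getitem__(self, i): return i  # would return (image, pid, camid)

real, synth = _Stub(12936), _Stub(20000)  # e.g. Market-1501 train + synthetic pool
# Trim the synthetic pool so training starts at a 1:1 real:synthetic ratio.
mixed = ConcatDataset([real, Subset(synth, range(len(real)))])
loader = DataLoader(mixed, batch_size=64, shuffle=True, num_workers=4)
```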
Notable improvements are reported: augmenting Market-1501 with four OmniPerson poses per ID increases TransReID mAP from 87.3 to 88.7 (+1.4) and Rank-1 from 94.3 to 94.7. On SYSU-MM01 using PMT (all-search), R-1 rises from 67.5 to 68.3 and mAP from 64.9 to 66.7. Under CLIP-ReID, Market mAP increases from 89.8 to 90.4 (Ma et al., 2 Dec 2025).
7. Access, Licensing, and Use Cases
PersonSyn is released under a CC-BY-NC-SA 4.0 license (non-commercial, share-alike); images and annotations are available for download at https://github.com/maxiaoxsi/OmniPerson, alongside code and pretrained models. Recommended use cases include:
- Data augmentation for ReID and pedestrian generation
- Multi-modal (RGB/IR) and cross-view person synthesis
- Training and benchmarking identity-preserving generative models
- Fine-grained studies of attribute controllability and pose variation
PersonSyn thus establishes a new, scalable standard for controlled, multi-reference, identity-centric synthetic pedestrian data, providing comprehensive annotation and high empirical utility for the ReID and conditional person generation communities (Ma et al., 2 Dec 2025).