POLARIS-53K: Polarimetric & Multimodal Datasets
- POLARIS-53K is a suite of three independent, large-scale polarimetric datasets tailored for automotive, exoplanetary, and maritime perception tasks.
- Each dataset pioneers novel sensor modalities—integrating synchronized RGB-polarimetric, high-contrast astronomical, and multimodal maritime data—to boost machine learning performance.
- The datasets provide comprehensive calibration, benchmark splits, and state-of-the-art baseline results that drive advancements in representation learning and perception algorithms.
POLARIS-53K refers to three distinct, independently developed datasets in large-scale polarimetric imaging and perception: (1) an automotive driving benchmark providing synchronized RGB-polarimetric, lidar, GNSS/INS, and segmentation data (Baltaxe et al., 2023); (2) a high-contrast, polarimetric imaging collection for exoplanetary disk representation learning (Cao et al., 4 Jun 2025); and (3) a large-scale, multimodal maritime object detection and tracking dataset (Choi et al., 2024). Each is notable for pioneering new data modalities or scales within its domain and is closely tied to state-of-the-art machine learning for perception and representation.
1. Automotive Perception: POLARIS-53K RGB-Polarimetric Driving Dataset
The POLARIS-53K automotive dataset is the first large-scale, time-synchronized, and spatially calibrated collection to integrate RGB-polarimetric imagery, dense lidar, high-rate GNSS/IMU navigation, and pixel-level free-space segmentation masks for outdoor driving scenarios (Baltaxe et al., 2023). Designed to enable perception algorithms leveraging light polarization, it contains the following:
| Modality | Quantities/Resolution | Format / Sensor |
|---|---|---|
| RGB-Polarimetric | 12,627 frames / 1.25MP (1624×1224) | Lucid Vision TRI050S-QC |
| Lidar | 12,627 scans / 128 channels | Velodyne Alpha Prime |
| GNSS/IMU | Continuous @ 100Hz (pose, velocity) | OxTS RT3000, .csv/rosbag |
| Free-space labels | 8,141 masks / pixel-level | PNG (1=drivable, 0=non-drivable) |
Polarimetric channels are captured in four orientations (0°, 45°, 90°, 135°), enabling per-pixel computation of the Stokes parameters and derived quantities:
- $S_0$ (total intensity)
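The per-pixel computation from the four orientations can be sketched as follows. This is a minimal illustration using the common textbook definitions of the linear Stokes parameters, the degree of linear polarization (DoLP), and the angle of linear polarization (AoLP). The function name, the `eps` guard, and the choice of $S_0$ as half the sum of all four channels are assumptions, and the dataset authors may use a different but equivalent convention.

```python
import math

def stokes_from_intensities(i0, i45, i90, i135, eps=1e-12):
    """Linear Stokes parameters, DoLP, and AoLP for one pixel.

    Assumed convention (not confirmed by the source):
      S0 = (I0 + I45 + I90 + I135) / 2, S1 = I0 - I90, S2 = I45 - I135.
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)        # total intensity
    s1 = i0 - i90                             # horizontal vs. vertical
    s2 = i45 - i135                           # diagonal vs. anti-diagonal
    dolp = math.hypot(s1, s2) / max(s0, eps)  # degree of linear polarization
    aolp = 0.5 * math.atan2(s2, s1)           # angle of linear polarization, radians
    return s0, s1, s2, dolp, aolp

# Fully horizontally polarized light of unit intensity.
horizontal = stokes_from_intensities(1.0, 0.5, 0.0, 0.5)
# Unpolarized light of unit intensity.
unpolarized = stokes_from_intensities(0.5, 0.5, 0.5, 0.5)
```

For whole images the same arithmetic applies elementwise, so the scalar operations can be swapped for array operations without changing the formulas.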
Supervised free-space annotations (1/0 mask) are derived semi-automatically using Segment Anything (SAM) with manual refinement. Depth ground truth corresponds to projected lidar points, with masks provided for occluded pixels.
Calibration
- Intrinsics: OpenCV chessboard
- Extrinsics: Planar least-squares using chessboard planes
- Camera-to-lidar:
- GNSS/INS: WGS-84 and local-ENU, synchronized by timestamp
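Because depth ground truth is obtained by projecting lidar points into the image, the role of the calibration can be illustrated with a simple pinhole projection. The sketch below assumes an undistorted pinhole camera and a rigid lidar-to-camera transform given as a rotation `R` and translation `t`. The function name, argument layout, and the numeric intrinsics in the example are hypothetical and are not taken from the dataset.

```python
def project_lidar_point(p_lidar, R, t, K):
    """Project one 3D lidar point into pixel coordinates.

    p_lidar: (x, y, z) in the lidar frame.
    R, t:    rotation (3x3 nested lists) and translation (3 values)
             mapping lidar coordinates to camera coordinates.
    K:       (fx, fy, cx, cy) pinhole intrinsics; lens distortion is ignored.
    Returns (u, v, depth), or None when the point lies behind the camera.
    """
    # Rigid transform into the camera frame: p_cam = R * p_lidar + t
    x, y, z = (
        sum(R[r][c] * p_lidar[c] for c in range(3)) + t[r]
        for r in range(3)
    )
    if z <= 0:
        return None
    fx, fy, cx, cy = K
    return (fx * x / z + cx, fy * y / z + cy, z)

# Hypothetical calibration values, for illustration only.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrinsics = (1000.0, 1000.0, 812.0, 612.0)
pixel = project_lidar_point((1.0, 0.0, 10.0), identity, [0.0, 0.0, 0.0], intrinsics)
```

A sparse depth map then follows from writing each returned depth at its rounded pixel location and keeping the nearest point when several land on the same pixel.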
Benchmark Splits and Statistics
| Task | Train | Validation | Test | Note |
|---|---|---|---|---|
| Free-space detection | 6,206 | 856 | 969 | Geographically disjoint |
| Monocular depth | 6,116 | 778 | 778 | Vehicles >15 km/h |
- DoLP histogram: peaks near 0.2 on matte surfaces, reaching up to 0.8 on glass
- Scene coverage: Six suburban, fair-weather, midday; street depths 5–50 m
Baseline Results
Free-space detection (KITTI-road metrics):
- RGBP-RoadSeg (RGB + polarization): IoU = 0.939, AP = 0.994
- Adding polarization yields +1.5% IoU gain versus RGB
Monocular depth (Eigen metrics):
- RGB-Depth: RMSE = 6.389, threshold accuracy $\delta$ = 0.904
- RGBP-Depth: RMSE = 6.172, threshold accuracy $\delta$ = 0.911
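The two depth figures reported for each model can be computed along the following lines. This is a sketch of Eigen-style evaluation restricted to pixels with a valid lidar depth. The 1.25 threshold is the conventional first accuracy level and is assumed here, since the source does not state which threshold the second figure corresponds to. The function name and the sample values are illustrative.

```python
import math

def depth_metrics(pred, gt, threshold=1.25, eps=1e-6):
    """RMSE and threshold accuracy over pixels with valid ground truth.

    pred, gt: flat sequences of depths in meters; gt <= 0 marks pixels
    without a projected lidar return, which are skipped.
    """
    pairs = [(max(p, eps), g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))
    # Fraction of pixels whose prediction is within a ratio `threshold` of truth.
    accuracy = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / len(pairs)
    return rmse, accuracy

# The third pixel has no lidar return and is ignored.
rmse, accuracy = depth_metrics([10.0, 20.0, 5.0], [10.0, 30.0, 0.0])
```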
Only minimal DNN architectural modifications are needed: expanding the network input to accept the additional polarization channels suffices to yield state-of-the-art results (Baltaxe et al., 2023).
2. Exoplanetary Imaging: POLARIS-53K High-Contrast Polarimetric Benchmark
POLARIS-53K in astrophysics is a high-quality, uniformly reduced polarimetric imaging dataset used for direct imaging of exoplanetary disks with public VLT/SPHERE-IRDIS dual-beam polarimetric data collected from 2014–2024 (Cao et al., 4 Jun 2025). Its composition:
| Data Type | # of Samples | Resolution / Format |
|---|---|---|
| Preprocessed exposures | 75,910 | 4n × 1024×1024 FITS cubes (Q+, Q−, U+, U−) |
| PDI postprocessed images | 921 | 1024×1024 FITS |
| Cropped polarimetric images | 53,000 | 256×256 NumPy, FITS, .jpeg |
The “53K” refers to the central crops taken from the 75,910 exposures, excluding the coronagraph-masked star core. Uniform IRDAP preprocessing (bad-pixel removal, flat-fielding, background subtraction, astrometric alignment, dual-beam combination) standardizes all inputs.
Polarimetric Representations
- Linear-polarization images are formed from the paired exposures (Q+, Q−) and (U+, U−)
- All frames are log-transformed and rescaled, with separate target ranges for exposures and polarization images
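The log-transform and rescaling step can be sketched as below. The source does not preserve the exact transform or the target ranges, so everything specific here is an assumption: `log1p` for non-negative exposures, a sign-preserving variant for polarization images that can take negative values, and min-max rescaling to a caller-chosen interval. The function name and the default range are hypothetical.

```python
import math

def log_rescale(values, lo=0.0, hi=1.0, signed=False):
    """Log-compress a flat list of pixel values, then min-max rescale to [lo, hi].

    signed=False: log1p of values clipped at zero (assumed for exposures).
    signed=True:  sign(v) * log1p(|v|), keeping the sign (assumed for
                  polarization images, which can be negative).
    """
    if signed:
        t = [math.copysign(math.log1p(abs(v)), v) for v in values]
    else:
        t = [math.log1p(max(v, 0.0)) for v in values]
    t_min, t_max = min(t), max(t)
    if t_max == t_min:
        return [lo for _ in t]  # constant frame: map everything to lo
    scale = (hi - lo) / (t_max - t_min)
    return [lo + (x - t_min) * scale for x in t]

exposure = log_rescale([0.0, 10.0, 1000.0])
polarization = log_rescale([-5.0, 0.0, 5.0], lo=-1.0, hi=1.0, signed=True)
```

Log compression is a common choice for high-contrast data because the stellar halo and a faint disk can differ in brightness by several orders of magnitude.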
Labeling and Splits
Of the 921 postprocessed frames, 801 are pseudo-labeled via spectral clustering, which substantially reduces the manual annotation requirement. Classes: reference star (no disk) vs. target disk system (strong scattered-light disk). Class imbalance (96 manually labeled disks, 825 references) is mitigated by stratified sampling.
Recommended ML splits:
- Train: 67/67 per class, Val: 15/15, Test: 14/14
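One way to realize a balanced, per-class split of this shape is sketched below: each class is subsampled to the same size and then cut into train, validation, and test portions. The function name, the seed handling, and the integer placeholder IDs are illustrative assumptions, and the authors' actual sampling procedure may differ.

```python
import random

def balanced_split(ids_by_class, sizes=(67, 15, 14), seed=0):
    """Draw sum(sizes) samples per class and cut them into train/val/test.

    ids_by_class: mapping from class label to an iterable of sample IDs.
    Returns a dict of lists of (sample_id, label) pairs.
    """
    rng = random.Random(seed)
    n_train, n_val, n_test = sizes
    per_class = n_train + n_val + n_test
    splits = {"train": [], "val": [], "test": []}
    for label, ids in ids_by_class.items():
        ids = list(ids)
        if len(ids) < per_class:
            raise ValueError(f"class {label!r} has fewer than {per_class} samples")
        chosen = rng.sample(ids, per_class)  # subsampling balances the classes
        splits["train"] += [(i, label) for i in chosen[:n_train]]
        splits["val"] += [(i, label) for i in chosen[n_train:n_train + n_val]]
        splits["test"] += [(i, label) for i in chosen[n_train + n_val:]]
    return splits

# Placeholder IDs matching the class counts stated above (96 disks, 825 references).
splits = balanced_split({"disk": range(96), "reference": range(1000, 1825)})
```

With 96 labeled disks, the 67/15/14 split uses every disk frame, while the reference class is subsampled down to the same size.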
Benchmarking Tasks
- Representation learning: Masked Autoencoder, DeepCluster, SimCLR, Diff-SimCLR (proposed)
- Diff-SimCLR achieves 93.0% mean supervised accuracy (SVC, 32D embedding), 77.3% unsupervised clustering
- Vision-language zero-shot (GPT-4.1, Gemini-2.0-Flash): max 75% accuracy
Applications
- Automated RDI reference selection for starlight subtraction
- Disk morphology/anomaly classification
- Generative background imputation for PSF subtraction
- Transfer to other telescopes (e.g., GPI/CHARIS, Roman)
3. Maritime Perception: PoLaRIS-53K Object Detection and Tracking Dataset
PoLaRIS-53K addresses multimodal maritime perception: 360,000 frames from 5 diverse video sequences on the Pohang Canal, annotated for detection and multi-object tracking of ships and buoys (Choi et al., 2024).
| Modality | Resolution | Frequency |
|---|---|---|
| Stereo RGB | 2048×1080 | 10 Hz |
| Thermal IR | 2048×1080 | 10 Hz |
| 3D Lidar | - | 10 Hz |
| 2D Radar | BEV (x,y) | ∼1 Hz |
Annotations and Calibration
- 190,000 2D boxes (120,000 ship, 70,000 buoy)
- LiDAR point-level per-object labels: approximately 80,000 per sequence
- Radar cluster labels via DBSCAN
- Track-IDs for all dynamic objects
- Minimum box: 10×10 px (0.0045% of the frame), with manual refinement for <5% of objects
- Known extrinsic/intrinsic matrices for cross-modal projections.
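Since the radar labels are built on DBSCAN clusters, a compact reference version of the algorithm may help clarify what a cluster label means for 2D bird's-eye-view points. This is a plain, quadratic-time sketch rather than the dataset's labeling tool. The `eps` and `min_pts` values and the sample points are hypothetical, as the source does not give the parameters used.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means not yet visited

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels

# Two compact groups of radar returns and one isolated return (x, y in meters).
returns = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(returns, eps=1.5, min_pts=3)
```

Returns labeled -1 are treated as noise, which matches the observation that sparse radar returns from distant targets may not form a cluster at all.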
Detection and Tracking Protocols
Metrics:
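- IoU: $\mathrm{IoU}(A, B) = |A \cap B| \,/\, |A \cup B|$ for a predicted box $A$ and a ground-truth box $B$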
- mAP, [email protected]:.95, MOTA, IDF1, IDP, IDR
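Two of the metrics listed above reduce to short formulas, sketched here under standard definitions: box IoU for axis-aligned boxes given as `(x1, y1, x2, y2)`, and MOTA as one minus the ratio of total errors (misses, false positives, identity switches) to ground-truth objects. The function names and box layout are illustrative and are not taken from the dataset's evaluation code.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

overlap = box_iou((0, 0, 10, 10), (5, 5, 15, 15))  # 25 / 175
accuracy = mota(false_negatives=5, false_positives=3, id_switches=2, num_gt=100)
```

For boxes near the 10×10 px minimum, a shift of only a few pixels removes a large share of the overlap, which is one reason small objects are penalized heavily at the stricter IoU thresholds in [email protected]:.95.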
Baseline closed-set models (COCO-pretrained):
| Model | Day-RGB mAP | Night-RGB mAP | Day-TIR mAP | Night-TIR mAP |
|---|---|---|---|---|
| YOLOv8-L | 79.6% | 84.2% | 58.4% | 64.0% |
| YOLOv10-L | 80.1% | 82.9% | 54.9% | 59.1% |
| RT-DETR-R50 | 78.4% | 69.1% | 53.4% | 55.2% |
- RGB tracking: OC-SORT achieves MOTA 98.5% (day), ByteTrack 99.6% (night)
- TIR tracking lags RGB; best: MOTA 75.5% (day), 88.0% (night)
Known Limitations
- Only two object classes; no heavy-weather or high-severity occlusion
- TIR images are low-res/blurry; radar clusters miss distant targets
- Geographic coverage restricted to Pohang Canal
Recommendations
- For TIR, train from scratch; low-light enhancements are beneficial
- Fuse depth and radar for robust detection/tracking
- Minimum 10×10 px objects stress small-object detection capacity
4. Comparison and Nomenclature Across Domains
The “POLARIS-53K” designation is non-unique, referencing three unrelated datasets:
| Domain | Dataset Purpose/Type | Primary Ref |
|---|---|---|
| Automotive | RGB-polarimetric driving, spatiotemporal alignment | (Baltaxe et al., 2023) |
| Astronomy | Exoplanet disk polarimetric imaging, representation | (Cao et al., 4 Jun 2025) |
| Maritime | Stereo-RGB/TIR/Lidar/Radar for object detection/tracking | (Choi et al., 2024) |
Each dataset is tailored for advanced machine learning tasks in its field, but inter-dataset interoperability is neither claimed nor supported.
5. Impact on Perception Algorithms and Research
The introduction of these large, well-annotated polarimetric and multimodal datasets has enabled:
- State-of-the-art perception benchmarks leveraging polarization, surpassing prior RGB-only performance (automotive).
- New paradigms for unsupervised and supervised representation learning in high-dynamic-range, high-contrast scientific imaging, advancing both astrophysics and machine learning (astronomy).
- Extensive evaluation and training of object detection and tracking models in challenging small-object, low-light, multi-sensor, real-world maritime environments (maritime).
For all domains, the release of POLARIS-53K datasets has been pivotal in quantifying the benefits of polarimetric and multimodal sensing and providing robust baselines for the next generation of perception algorithms.
6. Access, Usage, and Best Practices
- Automotive (Baltaxe et al., 2023): Dataset is described for research on perception tasks with pixel-synchronized polarimetric, lidar, and inertial data.
- Astronomy (Cao et al., 4 Jun 2025): Downloadable via Zenodo; the preprocessing pipeline and the code for data loading and ML baselines are published (Python 3.11+, PyTorch).
- Maritime (Choi et al., 2024): Dataset available at https://sites.google.com/view/polaris-dataset; recommended to use joint 2D/3D supervision and to exploit calibration tools for multi-modal projections.
Methodological best practices are detailed in each reference, with a focus on task-appropriate normalization, calibration, and augmentation pipelines. Stratified sampling is vital where labeling is highly imbalanced. For detection/tracking tasks, cross-sensor calibration and careful manual refinement remain critical.
Principal References:
- "Polarimetric Imaging for Perception" (Baltaxe et al., 2023)
- "POLARIS: A High-contrast Polarimetric Imaging Benchmark Dataset for Exoplanetary Disk Representation Learning" (Cao et al., 4 Jun 2025)
- "PoLaRIS Dataset: A Maritime Object Detection and Tracking Dataset in Pohang Canal" (Choi et al., 2024)