AIMIP Phase 1 Dataset Overview

Updated 4 July 2026

In climate modeling, the AIMIP Phase 1 Dataset combines ERA5-based SST and SIC forcing with AI and conventional model outputs to assess historical and perturbed scenarios.
More broadly, the label covers a family of first-phase releases across domains, delivering standardized annotations and protocols in medical imaging, flight simulation, solar imaging, and autonomous driving.
Together, these releases offer diverse data modalities and evaluation metrics, enabling research on simulation fidelity, segmentation accuracy, and multimodal perception under challenging conditions.

“AIMIP Phase 1 Dataset” is not a universally unique designation. In the arXiv literature, the phrase refers most explicitly to Phase 1 of the AI weather and climate model intercomparison project, a standardized AMIP-style benchmark for AI weather and climate models (Henn et al., 7 May 2026). In parallel, closely related or context-dependent uses of the same label appear in medical imaging, flight-simulator training, solar image-parameter curation, and autonomous driving, where “AIMIP Phase 1” functions as an organizational or retrospective name for an initial public release rather than as the formal title of the underlying paper (Murugesan et al., 2024, Samuel et al., 2022, Ahmadzadeh et al., 2019, Matuszka et al., 2022). The term therefore requires domain-specific disambiguation before any technical discussion of data format, labels, metrics, or intended use.

1. Terminological scope and disambiguation

The cited literature uses “AIMIP Phase 1 Dataset” in multiple, non-equivalent ways. In one case it is the formal subject of a climate-model intercomparison paper; in the others it denotes a first-stage release associated with a project, platform, or internal program.

Domain	Dataset designation	Distinguishing features
Weather and climate	AIMIP Phase 1	AMIP-style atmosphere-only simulations, ERA5-trained AIWCMs
Medical imaging	AIMIP Phase 1 dataset	nnU-Net DICOM-SEG annotations for 11 IDC collections
Flight training	Maneuver ID / AIMIP Phase 1	VR T-6 simulator sorties with good/bad labels
Solar imaging	Curated AIA image parameter dataset	10 image parameters, 9 channels, 6-minute cadence
Autonomous driving	aiMotive / AIMIP Phase 1	Multimodal long-range perception dataset

In the medical-imaging paper, “AIMIP” refers to the AI in Medical Imaging Project funded by NCI and the paper functionally represents Phase 1 of large-scale standardized AI annotations for IDC collections (Murugesan et al., 2024). In the flight-training paper, the phrase “AIMIP Phase 1” does not appear in the paper itself, but the public Maneuver ID Challenge dataset is described as the same resource that many Air Force or AIMIP documents call “AIMIP Phase 1” (Samuel et al., 2022). In the solar-imaging paper, “AIMIP” and “Phase 1” are likewise not explicit, but the curated parameter archive is presented as the first large-scale tuned release later referred to under that umbrella (Ahmadzadeh et al., 2019). In the aiMotive paper, the public release is associated with aiMotive’s “AIMIP” program and is described as the initial public release with a fixed sensor suite and annotation pipeline (Matuszka et al., 2022).

This suggests that the expression is best treated as a family of first-phase dataset releases rather than a single cross-domain dataset name.

2. AIMIP Phase 1 in weather and climate modeling

In climate science, AIMIP Phase 1 is a well-defined dataset and experiment rather than merely a collection of files. It combines a monthly SST and SIC forcing dataset based on ERA5 for 1979–2024, ERA5 reanalysis as the reference dataset, a common AMIP-style atmosphere-only experiment design, standardized CMIP-style outputs, and a fixed evaluation suite (Henn et al., 7 May 2026). The participating models must simulate the atmosphere given specified historical sea surface temperatures over 1979–2024, with AI components trained only on ERA5 over 1979–2014. The simulation period runs from Oct. 1, 1978 to Dec. 31, 2024, with the first three months treated as spin-up and the main analysis beginning Jan. 1, 1979. The prescribed evaluation split is 1979–2014 for in-sample analysis and 2015–2024 for out-of-sample analysis (Henn et al., 7 May 2026).
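The period boundaries described above can be sketched as a small date classifier. This is a minimal illustration, not part of the AIMIP protocol files; the function name `aimip_period` and the label strings are assumptions made for this example.

```python
from datetime import date

# Boundaries taken from the experiment description above.
SPINUP_START = date(1978, 10, 1)         # first simulated day
ANALYSIS_START = date(1979, 1, 1)        # end of the three-month spin-up
OUT_OF_SAMPLE_START = date(2015, 1, 1)   # first day outside the training period
SIM_END = date(2024, 12, 31)             # last simulated day


def aimip_period(day: date) -> str:
    """Label a date as spin-up, in-sample, or out-of-sample (hypothetical helper)."""
    if day < SPINUP_START or day > SIM_END:
        raise ValueError(f"{day} is outside the simulation period")
    if day < ANALYSIS_START:
        return "spin-up"
    if day < OUT_OF_SAMPLE_START:
        return "in-sample"
    return "out-of-sample"


print(aimip_period(date(1978, 11, 15)))  # spin-up
print(aimip_period(date(2014, 12, 31)))  # in-sample
print(aimip_period(date(2015, 1, 1)))    # out-of-sample
```

A helper of this kind makes it harder to accidentally include spin-up months in a bias or trend calculation.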

The dataset comprises outputs from eight AI weather and climate models and one conventional CMIP6 model, GFDL-CM4/AM4, with five-member ensembles per AI model. Required outputs include monthly means from Oct. 1, 1978 to Dec. 31, 2024 and daily means for Oct. 1, 1978 to Dec. 31, 1979 and Jan. 1, 2024 to Dec. 31, 2024. Models must provide at least the pressure levels 1000, 850, 700, 500, 250, 100, and 50 hPa, and the standard variables include temperature, specific humidity, eastward and northward wind, 500 hPa geopotential height, surface pressure or sea-level pressure, skin temperature, 2-meter air temperature, near-surface humidity, 10-meter winds, and surface precipitation rate where available (Henn et al., 7 May 2026).
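A submission check for the output requirements listed above might look like the following sketch. The helper names `missing_levels` and `daily_means_required` are hypothetical, and the sketch assumes pressure levels are given in hPa; it does not reproduce any official AIMIP validation tool.

```python
from datetime import date

# Minimum pressure levels (hPa) named in the output requirements above.
REQUIRED_LEVELS_HPA = (1000, 850, 700, 500, 250, 100, 50)

# Windows in which daily means are required, in addition to monthly means.
DAILY_WINDOWS = (
    (date(1978, 10, 1), date(1979, 12, 31)),
    (date(2024, 1, 1), date(2024, 12, 31)),
)


def missing_levels(provided_hpa):
    """Return the required levels that a model's output does not include."""
    provided = {round(float(level)) for level in provided_hpa}
    return [level for level in REQUIRED_LEVELS_HPA if level not in provided]


def daily_means_required(day):
    """True if the given day falls inside one of the daily-output windows."""
    return any(start <= day <= end for start, end in DAILY_WINDOWS)


print(missing_levels([1000, 850, 700, 500, 250, 100]))  # [50]
print(daily_means_required(date(2000, 1, 1)))           # False
```

Extra levels beyond the required seven are accepted, since the requirement is a minimum set.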

Its scientific purpose is systematic intercomparison. AIMIP Phase 1 defines five major evaluation dimensions: biases, trends, response to El Niño-related SST anomalies, temporal variability, and out-of-sample generalization tests. The out-of-sample regime includes aimip-p2k and aimip-p4k perturbation runs in which SST is uniformly increased by +2 K or +4 K everywhere, while SIC may remain unchanged depending on model details (Henn et al., 7 May 2026). The published findings indicate that AI models can simulate historical climate and SST-forced variability comparably to a conventional physically based model in several respects, but some models underestimate historical warming trends and diverge markedly in the perturbed warming experiments. For dataset users, this makes AIMIP Phase 1 both a benchmark for climatological fidelity and a stress test for extrapolative generalization.
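The uniform warming perturbation can be illustrated with a few lines of plain Python. This is a sketch under stated assumptions: the SST field is represented as a nested list in kelvin, `None` stands in for masked (for example land) cells, and the function name `perturb_sst` is invented for this example. Real forcing files would normally be handled with an array library.

```python
# Offsets (K) for the two perturbation experiments named above.
EXPERIMENT_OFFSETS_K = {"aimip-p2k": 2.0, "aimip-p4k": 4.0}


def perturb_sst(sst_kelvin, experiment):
    """Return a copy of a 2D SST grid with the experiment's offset added everywhere.

    Masked cells (None) are passed through unchanged. SIC is not touched here,
    mirroring the case in which sea ice stays fixed.
    """
    delta = EXPERIMENT_OFFSETS_K[experiment]
    return [
        [None if value is None else value + delta for value in row]
        for row in sst_kelvin
    ]


baseline = [[271.5, 300.0], [None, 285.25]]
print(perturb_sst(baseline, "aimip-p4k"))  # [[275.5, 304.0], [None, 289.25]]
```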

3. AIMIP Phase 1 in cancer imaging and the NCI Imaging Data Commons

In medical imaging, AIMIP Phase 1 denotes the first large multi-organ, multi-cancer release from the AI in Medical Imaging Project into the NCI Imaging Data Commons. The release enriches 11 IDC collections with AI-generated nnU-Net segmentations and distributes both AI and radiologist-corrected labels through IDC and Zenodo (Murugesan et al., 2024). The six segmentation tasks are Brain-MR, Breast-MR, Kidney-CT, Lung-CT, Liver-CT, and Prostate-MR. The associated IDC collections include UPENN-GBM, Duke-Breast-Cancer-MRI, TCGA-KICH, TCGA-KIRP, CPTAC-CCRCC, QIN Lung CT, NSCLC-Radiomics, SPIE-AAPM Lung CT Challenge, an NLST subset, HCC_TACE_Seg, Colorectal-Liver-Metastases, and Prostate MRI–US Biopsy, with the LiTS’17 cohort also referenced in the liver context (Murugesan et al., 2024).

The technical core is a set of task-specific nnU-Net models trained on open-source datasets. Brain-MR uses BraTS 2021 with T1, T1c, T2, and FLAIR inputs and segments whole tumor and subregions. Breast-MR uses one model for breast and fibroglandular tissue and another for tumor or structural tumor volume, combining them into a single three-label output. Lung-CT is trained on LIDC-IDRI and NSCLC-Radiomics, with lung masks derived by TotalSegmentator and post-processing that removes nodules outside the 3–30 mm size range. Liver-CT uses LiTS 2017, the Medical Segmentation Decathlon liver dataset, and selected TotalSegmentator outputs to produce liver, liver-tumor, and additional abdominal-organ labels. Kidney-CT and Prostate-MR reuse pretrained BAMF AIMI nnU-Net models from prior work (Murugesan et al., 2024).

A defining feature of this Phase 1 release is standards-based integration. All annotations, both AI and radiologist-corrected, are delivered as DICOM Segmentation objects. AI segmentations are marked with SegmentAlgorithmType = AUTOMATIC, whereas radiologist-corrected segmentations use SEMIAUTOMATIC. Each DICOM-SEG references the source imaging series via ReferencedSeriesSequence, and metadata such as SegmentNumber, SegmentDescription, and SegmentAlgorithmType make the label sets self-describing (Murugesan et al., 2024).
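The self-describing metadata makes it straightforward to separate AI and radiologist-corrected labels per source series. The sketch below uses plain dictionaries that mirror the DICOM attribute names mentioned above, so that it runs without a DICOM library; in practice the same attributes would be read from the DICOM-SEG files. The function name `index_segments`, the UID `1.2.3.4`, and the segment descriptions are placeholders, not values from the release.

```python
# Mapping from SegmentAlgorithmType to the annotation source described above.
SOURCE_BY_ALGORITHM_TYPE = {
    "AUTOMATIC": "ai",
    "SEMIAUTOMATIC": "radiologist-corrected",
}


def index_segments(seg_objects):
    """Map (referenced series UID, annotation source) to segment descriptions."""
    index = {}
    for seg in seg_objects:
        series_uid = seg["ReferencedSeriesSequence"][0]["SeriesInstanceUID"]
        for segment in seg["SegmentSequence"]:
            source = SOURCE_BY_ALGORITHM_TYPE[segment["SegmentAlgorithmType"]]
            index.setdefault((series_uid, source), []).append(
                segment["SegmentDescription"]
            )
    return index


# Two hypothetical DICOM-SEG objects referencing the same CT series.
ai_seg = {
    "ReferencedSeriesSequence": [{"SeriesInstanceUID": "1.2.3.4"}],
    "SegmentSequence": [
        {"SegmentNumber": 1, "SegmentDescription": "Kidney",
         "SegmentAlgorithmType": "AUTOMATIC"},
        {"SegmentNumber": 2, "SegmentDescription": "Tumor",
         "SegmentAlgorithmType": "AUTOMATIC"},
    ],
}
corrected_seg = {
    "ReferencedSeriesSequence": [{"SeriesInstanceUID": "1.2.3.4"}],
    "SegmentSequence": [
        {"SegmentNumber": 1, "SegmentDescription": "Kidney",
         "SegmentAlgorithmType": "SEMIAUTOMATIC"},
    ],
}

print(index_segments([ai_seg, corrected_seg]))
```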

Quality assurance covers about 10% of enriched images per task. Radiologists reviewed and edited AI outputs in 3D Slicer, producing corrected NRRD segmentations that were then converted to DICOM-SEG. The paper reports DSC, NSD, and 95% Hausdorff Distance between AI predictions and corrected segmentations. The authors explicitly note correction bias, because radiologists edited AI masks rather than contouring de novo; this can inflate overlap metrics and deflate boundary-error metrics, especially for large, high-contrast structures such as kidneys (Murugesan et al., 2024). The release is therefore best understood as a derived annotation layer over IDC collections, optimized for research on segmentation, radiomics, human-in-the-loop correction, and cross-domain generalization.
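Of the three reported metrics, the Dice similarity coefficient is the simplest to state: $\mathrm{DSC} = 2|A \cap B| / (|A| + |B|)$ for an AI mask $A$ and a corrected mask $B$. The following is a minimal sketch on flattened binary masks; the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions of this example, and NSD and the 95% Hausdorff Distance (which need surface-distance computations) are not shown.

```python
def dice(pred, ref):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    pred_bits = [bool(v) for v in pred]
    ref_bits = [bool(v) for v in ref]
    intersection = sum(p and r for p, r in zip(pred_bits, ref_bits))
    total = sum(pred_bits) + sum(ref_bits)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


ai_mask = [0, 1, 1, 1, 0, 0]
corrected_mask = [0, 1, 1, 0, 0, 0]
print(dice(ai_mask, corrected_mask))  # 0.8
```

Because the corrected mask starts from the AI mask, most voxels are shared by construction, which is the mechanism behind the correction bias noted above.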

4. AIMIP Phase 1 as the Maneuver ID flight-simulator dataset

In the Air Force pilot-training context, “AIMIP Phase 1 dataset” refers in practice to the public Maneuver Identification Challenge dataset, even though the paper itself uses the names “Maneuver ID dataset” and “Maneuver Identification (ID) Challenge dataset” and does not use the phrase “AIMIP Phase 1” (Samuel et al., 2022). The source data were collected at Pilot Training Next in VR T-6 simulators using Lockheed Martin Prepar3D with a custom logging system. The release contains thousands of distinct pilot training sessions from hundreds of hours on flight simulators, with each session represented as a sortie that may contain multiple maneuvers as well as artifacts such as taxiing, idling, or teleportation (Samuel et al., 2022).

The public data consist of two primary modalities per sortie: multivariate trajectory time series in TSV format and top-down 2D ground-track images in PNG format. The time-series fields include time (sec), xEast (m), yNorth (m), zUp (m), vx (m/s), vy (m/s), vz (m/s), head (deg), pitch (deg), and roll (deg). Positions are given in meters, velocities in meters per second, and angles in degrees, with absolute location removed by moving each recording to a standard starting location and altitude (Samuel et al., 2022). The dataset also provides exemplar maneuver files, one per maneuver type, and a public list of 18 maneuver types with textual descriptions and videos.
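Reading one sortie's time series could be sketched as follows. The column names are the fields listed above; that they appear verbatim as a header row in each TSV file is an assumption of this example, as is the function name `read_sortie`. The two data rows are made up for illustration.

```python
import csv
import io

# Field names as listed in the dataset description above.
FIELDS = [
    "time (sec)", "xEast (m)", "yNorth (m)", "zUp (m)",
    "vx (m/s)", "vy (m/s)", "vz (m/s)",
    "head (deg)", "pitch (deg)", "roll (deg)",
]


def read_sortie(tsv_text):
    """Parse tab-separated sortie text into a list of dicts of floats."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [name for name in FIELDS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{name: float(row[name]) for name in FIELDS} for row in reader]


# Illustrative two-sample sortie (values are invented).
sample = "\n".join([
    "\t".join(FIELDS),
    "\t".join(["0.0", "0.0", "0.0", "100.0", "50.0", "0.0", "0.0", "90.0", "0.0", "0.0"]),
    "\t".join(["0.2", "10.0", "0.0", "100.0", "50.0", "0.0", "0.0", "90.0", "1.5", "-2.0"]),
])

rows = read_sortie(sample)
print(len(rows), rows[1]["xEast (m)"])  # 2 10.0
```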

Its main public labels are sortie-level “good” versus “bad” classes. Good sorties are described as unbroken trajectories with realistic maneuvers, whereas bad sorties include straight lines, jumps, physically impossible maneuvers, long taxi segments, and unrealistic speeds. These labels were created by two non-subject-matter experts and spot-checked by pilots, so some label noise is explicitly acknowledged (Samuel et al., 2022). Additional irregularity tags can mark teleportation, taxiing or stopped-on-ground periods, and irregular stopping, but the main challenge labels remain sortie-level rather than maneuver-level.
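One simple way to surface jump-like artifacts automatically is to flag steps whose implied speed is implausible. The sketch below is a hypothetical heuristic, not the labeling procedure used for the dataset (the published labels were assigned manually); the function name `find_jumps` and the 400 m/s threshold are placeholder assumptions.

```python
import math


def find_jumps(samples, max_speed_mps=400.0):
    """Return indices of samples reached by an implausibly fast step.

    samples: list of (time_sec, x_east_m, y_north_m, z_up_m) tuples.
    A step is flagged if time does not advance or if distance / dt
    exceeds max_speed_mps (placeholder threshold).
    """
    jumps = []
    for i in range(1, len(samples)):
        t0, *p0 = samples[i - 1]
        t1, *p1 = samples[i]
        dt = t1 - t0
        if dt <= 0 or math.dist(p0, p1) / dt > max_speed_mps:
            jumps.append(i)
    return jumps


track = [
    (0.0, 0.0, 0.0, 100.0),
    (1.0, 80.0, 0.0, 100.0),
    (2.0, 160.0, 0.0, 100.0),
    (3.0, 5160.0, 0.0, 100.0),   # 5 km in one second: a teleport-like jump
    (4.0, 5240.0, 0.0, 100.0),
]
print(find_jumps(track))  # [3]
```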

The challenge defines three tasks: sorting physically feasible versus infeasible sorties, identifying maneuvers within sorties, and scoring maneuver quality. Only the first of these is substantially labeled in the public release. There are no official maneuver-type labels for the main sortie corpus and no maneuver-quality scores, so Tasks 2 and 3 remain under-labeled and are approached through exemplar matching, unsupervised methods, or prospective crowd-sourced annotation (Samuel et al., 2022). A plausible implication is that this “Phase 1” dataset is primarily an infrastructure dataset for data cleaning, weak supervision, and methodological bootstrapping rather than a fully supervised maneuver-recognition benchmark.

5. AIMIP Phase 1 as a curated solar image-parameter archive

In solar physics and heliophysics, the phrase is used retrospectively for the curated image parameter dataset extracted from the Solar Dynamics Observatory mission’s Atmospheric Imaging Assembly. The paper describes a public archive covering January 2011 through the current date at the time of publication, with a cadence of six minutes for nine AIA wavelength channels and a volume per year just short of 1 TiB (Ahmadzadeh et al., 2019). The dataset is built not from raw images alone but from a set of ten tuned image parameters computed independently for each channel.

The ten parameters comprise first intensity moment or mean intensity, second central moment or intensity variance, entropy of the intensity histogram, uniformity of the intensity histogram, skewness, kurtosis, fractal dimension, an edge density or edge strength measure, Tamura coarseness, and a Tamura contrast or directionality-type texture parameter (Ahmadzadeh et al., 2019). The channels are the standard nine AIA science channels: 94 Å, 131 Å, 171 Å, 193 Å, 211 Å, 304 Å, 335 Å, 1600 Å, and 1700 Å. For a single six-minute timestamp, this yields up to $9 \times 10 = 90$ scalar features.
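To make the histogram-based parameters concrete, the sketch below computes four of the ten (mean, variance, entropy, and uniformity) for a flat list of pixel intensities. It uses textbook definitions; the bin count, intensity range, and function name `histogram_stats` are placeholder assumptions and do not reproduce the tuned settings of the paper. The remaining six parameters are not sketched.

```python
import math

AIA_CHANNELS_ANGSTROM = (94, 131, 171, 193, 211, 304, 335, 1600, 1700)
N_PARAMETERS = 10


def histogram_stats(pixels, n_bins=4, lo=0.0, hi=256.0):
    """Mean, variance, histogram entropy (bits), and histogram uniformity."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n

    # Fixed-width histogram over [lo, hi]; out-of-range values are clamped.
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for p in pixels:
        idx = min(max(int((p - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    probs = [c / n for c in counts if c]

    entropy = -sum(q * math.log2(q) for q in probs)
    uniformity = sum(q * q for q in probs)
    return {"mean": mean, "variance": variance,
            "entropy": entropy, "uniformity": uniformity}


stats = histogram_stats([0, 0, 64, 64, 128, 128, 192, 192])
print(stats)
print(len(AIA_CHANNELS_ANGSTROM) * N_PARAMETERS)  # 90
```

With the pixels spread evenly over four bins, entropy reaches its maximum of 2 bits and uniformity its minimum of 0.25, which shows how the two parameters move in opposite directions.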

The central contribution is methodological tuning. The authors evaluate assumptions affecting histogram binning, intensity scaling, edge-detection thresholds, fractal-dimension scales, Tamura neighborhood sizes, and solar-disk masking, and then choose settings through supervised validation on region-classification tasks. They explicitly compare JP2 and FITS source images and report that JP2-derived parameters perform essentially as well as FITS-derived parameters for region classification when tuned appropriately (Ahmadzadeh et al., 2019). The release is publicly accessible through the API at http://dmlab.cs.gsu.edu/dmlabapi.

As used under the “AIMIP Phase 1” label, this dataset is therefore a compact, production-scale representation of the AIA image stream suitable for region classification, content-based image retrieval, event tracking, and forecasting-oriented time-series analysis (Ahmadzadeh et al., 2019). Its scientific value lies in the conversion of full-disk imaging into a low-dimensional but systematically tuned feature space.

6. AIMIP Phase 1 as the aiMotive multimodal driving release

In autonomous driving, the term refers to the initial public aiMotive multimodal dataset release associated with aiMotive’s AIMIP program. The dataset contains 176 scenes, each 15 seconds long, with 26,583 annotated frames and synchronized, calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view (Matuszka et al., 2022). It was recorded in highway, urban, and suburban areas across California, Austria, and Hungary, using four cars, under daytime, night, rain, cloudy, and glare conditions.

The sensor suite consists of one roof-mounted 64-beam rotating LiDAR, four cameras, two long-range radars, and high-precision GNSS-INS localization. LiDAR point clouds are stored as compressed LAS files, cameras as JPG, radar targets as JSON, and labels as JSON arrays of 3D cuboids and associated 2D boxes (Matuszka et al., 2022). The dataset defines five coordinate systems: global ECEF, body, radar, camera, and image coordinates. All 3D bounding boxes are defined in the body frame, and each annotation includes center position, extent, quaternion orientation, relative velocity, class label, and a unique track ID that is consistent across frames within a scene.
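A label file of this kind could be consumed roughly as follows. This is a sketch only: the JSON key names (`track_id`, `category`, `center`, `quaternion`), the `(w, x, y, z)` quaternion ordering, and the helper names are assumptions for illustration and may differ from the actual aiMotive schema. The yaw formula is the standard extraction of heading about the vertical axis from a unit quaternion.

```python
import json
import math


def yaw_from_quaternion(w, x, y, z):
    """Heading about the vertical (z) axis of a unit quaternion, in radians."""
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


def load_cuboids(label_json):
    """Parse a JSON array of cuboids (hypothetical keys) into simple dicts."""
    cuboids = []
    for obj in json.loads(label_json):
        w, x, y, z = obj["quaternion"]  # assumed (w, x, y, z) order
        cuboids.append({
            "track_id": obj["track_id"],
            "category": obj["category"],
            "center": tuple(obj["center"]),  # body-frame position in meters
            "yaw_rad": yaw_from_quaternion(w, x, y, z),
        })
    return cuboids


# One invented cuboid, rotated 90 degrees about the vertical axis.
half = math.sqrt(0.5)
SAMPLE_LABELS = json.dumps([{
    "track_id": 7,
    "category": "CAR",
    "center": [82.0, -3.5, 0.9],
    "quaternion": [half, 0.0, 0.0, half],
}])

cuboids = load_cuboids(SAMPLE_LABELS)
print(cuboids[0]["track_id"], round(math.degrees(cuboids[0]["yaw_rad"]), 1))  # 7 90.0
```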

A principal design goal is long-range perception. The dataset contains more than 425,000 3D objects, with about 24% of cuboids lying beyond 75 m from the ego vehicle and annotations extending to about 200 m. The paper contrasts this with lower far-range fractions in Argoverse2, Waymo, nuScenes, and ONCE, and notes a comparatively low percentage of empty boxes beyond 50 m and 75 m (Matuszka et al., 2022). Baseline models are provided for LiDAR-only, camera-only, and multimodal 3D detection, including simple LiDAR-plus-radar fusion by converting radar targets to pseudo point clouds.
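The far-range statistic quoted above is a simple fraction over cuboid centers. The sketch below assumes distance is measured in the ground plane from the body-frame origin; whether the paper uses planar or full 3D distance is not stated here, and the function name `far_fraction` is hypothetical.

```python
import math


def far_fraction(centers, threshold_m=75.0):
    """Fraction of cuboid centers farther than threshold_m from the ego vehicle.

    centers: iterable of (x, y, z) body-frame positions in meters.
    Distance is taken in the x-y plane (an assumption of this sketch).
    """
    centers = list(centers)
    if not centers:
        return 0.0
    far = sum(1 for x, y, _z in centers if math.hypot(x, y) > threshold_m)
    return far / len(centers)


# Four invented cuboid centers: two within 75 m, two beyond.
example_centers = [
    (10.0, 0.0, 0.0),
    (30.0, 40.0, 0.5),
    (100.0, 0.0, 0.2),
    (0.0, -80.0, 1.0),
]
print(far_fraction(example_centers))  # 0.5
```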

The dataset is positioned as a benchmark for robust multimodal perception under adverse weather, especially because radar improves long-range performance in several regimes while camera inputs help orientation estimation. At the same time, the authors report that naive multimodal fusion can underperform LiDAR-only models in heavy rain, indicating that the dataset is deliberately challenging rather than saturating current fusion methods (Matuszka et al., 2022). In the “AIMIP Phase 1” sense, it is thus an initial large-scale release aimed at long-range, adverse-weather, multi-sensor research rather than a generic driving corpus.