
Aerial Datasets: New Zealand & Utah

Updated 31 July 2025
  • Aerial datasets are high-resolution, patch-sampled collections from LINZ and UGRC, designed for robust small vehicle detection research.
  • They feature precise annotation protocols with over 1.45 million and 2.14 million samples from New Zealand and Utah, addressing challenges like occlusion and overlapping vehicles.
  • Integrating fine-tuned latent diffusion models for generative augmentation significantly boosts cross-domain transferability, with improvements up to 50% relative gain.

Aerial datasets from New Zealand and Utah, specifically the LINZ (Land Information New Zealand) and UGRC (Utah Geospatial Resource Center) datasets, represent substantial contributions to the field of remote sensing object detection. These datasets provide detailed, high-resolution annotated imagery focused on small vehicle detection, primarily “car” class objects, and are designed explicitly to facilitate domain adaptation research by presenting significant environmental and context variability. Their introduction is directly tied to methodologies employing generative models, particularly fine-tuned latent diffusion models (LDMs), to address generalization issues that arise when training detectors on data from one geographic region and testing on another. This approach supports advancements in domains including traffic monitoring, urban analysis, and surveillance, and establishes new benchmarks for evaluating cross-domain transferability in aerial image understanding (Fang et al., 28 Jul 2025).

1. Data Collection and Curation

The LINZ and UGRC datasets are constructed from geospatial sources within their respective regions. The LINZ dataset comprises aerial imagery collected in Selwyn, New Zealand, through the Land Information New Zealand platform; the UGRC dataset is assembled from the Utah Geospatial Resource Center’s imagery archive. Both collections are captured at a high spatial resolution of 12.5 cm per pixel.

To enhance vehicle detection and alleviate generative model limitations, the datasets are subsampled into 112×112 pixel patches. This approach increases the relative object size (vehicle pixels per patch), making small vehicles—often underrepresented or difficult to resolve in wide-area overhead imagery—detectable both by neural detectors and by the diffusion model during data synthesis. The sampling strategy produces overlapping image regions, which can introduce ambiguity when vehicles appear in close proximity or partially occlude one another.
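The overlapping patch sampling described above can be sketched as a sliding window over each aerial tile. This is a minimal illustration, not the paper's implementation: the 112 px patch size follows the text, but the stride (and hence the degree of overlap) is an assumption.

```python
import numpy as np

def extract_patches(image: np.ndarray, patch: int = 112, stride: int = 56):
    """Slide a patch x patch window over an H x W x C image.

    A stride smaller than the patch size yields overlapping patches,
    raising the chance that every small vehicle appears fully inside
    at least one patch.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(((y, x), image[y:y + patch, x:x + patch]))
    return patches

# A 224x224 tile with 50% overlap yields a 3x3 grid of patches.
tile = np.zeros((224, 224, 3), dtype=np.uint8)
print(len(extract_patches(tile)))  # 9
```

At 12.5 cm/pixel, a 112×112 patch covers roughly 14 m × 14 m of ground, so a typical car spans a substantial fraction of the patch rather than a handful of pixels.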

Dataset | Region      | Training Samples | Resolution    | Patch Size
LINZ    | New Zealand | >1.45 million    | 12.5 cm/pixel | 112×112 px
UGRC    | Utah, USA   | >2.14 million    | 12.5 cm/pixel | 112×112 px

2. Annotation Protocols and Dataset Characteristics

Precise annotations are provided for small vehicles, chiefly under the “car” category. Training, validation, and test splits are made available, with approximately 190,000 LINZ validation images and millions of total samples in UGRC. The UGRC imagery is marked by unique terrain contexts, including sandy or rocky desert environments, increasing the representation of off-road vehicles and complicating the detection task.

The annotation process is designed to support both supervised learning and validation of synthetic data-driven approaches. Patch-based annotation addresses the challenge posed by low-resolution cross-attention maps commonly found in diffusion-based generative models, which typically operate at a downsampled spatial grid (e.g., 8×8). This strategy is intended to maximize the utility of generative augmentation while providing a reliable foundation for evaluating small object detection methodologies.
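The mismatch between pixel-space boxes and the diffusion model's coarse cross-attention maps can be made concrete by projecting a box onto the attention grid. The 8×8 grid size follows the text; the rounding convention and function name are assumptions for illustration.

```python
def bbox_to_attention_cells(bbox, patch_size=112, grid=8):
    """Map a pixel-space box (x1, y1, x2, y2) to the cells it covers
    on a grid x grid cross-attention map (e.g. 8x8 for a 112 px patch)."""
    scale = patch_size / grid  # 14 px of image per attention cell
    x1, y1, x2, y2 = bbox
    cx1, cy1 = int(x1 // scale), int(y1 // scale)
    cx2 = min(grid - 1, int((x2 - 1) // scale))
    cy2 = min(grid - 1, int((y2 - 1) // scale))
    return [(r, c) for r in range(cy1, cy2 + 1) for c in range(cx1, cx2 + 1)]

# A ~14 px car near the patch centre collapses to a single attention cell.
print(bbox_to_attention_cells((56, 56, 70, 70)))  # [(4, 4)]
```

This is why full-frame annotation would be uninformative at the attention-map scale: in a wide tile, a car would vanish inside one cell, whereas in a 112 px patch it occupies a meaningful region of the 8×8 grid.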

A distinctive aspect of these datasets is their pronounced domain gap: cross-dataset evaluation reveals a greater than 25 percentage point reduction in AP₅₀ when training on one domain (e.g., LINZ) and testing on the other (e.g., UGRC). This highlights the datasets’ role as testbeds for research in domain adaptation and transfer learning across geographically and environmentally divergent contexts.
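The AP₅₀ metric behind this comparison counts a detection as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal IoU check of the kind such an evaluation relies on (a generic sketch, not the paper's evaluation code):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# Two boxes overlapping by half their width: IoU = 50 / 150 = 1/3,
# which would NOT count as a match at the 0.5 threshold.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```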

3. Generative Data Augmentation and Label Synthesis

Integrating these aerial datasets into generative AI approaches involves several tightly coupled stages. A pre-trained Stable Diffusion model is fine-tuned separately for the source (e.g., LINZ) and target (e.g., UGRC) domains using domain-specific prompt conditioning (denoted cₛ and cₜ). Images are encoded into latent space, and noise is injected according to the forward diffusion process:

z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s

and the reverse process is learned via a U-Net network conditioned on the text embedding. The objective minimized is

L_{LDM} = \mathbb{E}_{E(x),\, c,\, \epsilon,\, t} \left[ \left\| \epsilon - \epsilon_\theta(z_t, t, c) \right\|_2^2 \right]

where c \in \{c_s, c_t\}.
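The forward noising and training objective above can be sketched numerically on toy latent vectors. This is a NumPy illustration under stated assumptions: the linear β schedule and its endpoints are common DDPM defaults, not values from the paper, and the network ε_θ is replaced by a placeholder prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # linear noise schedule (illustrative)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)      # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(z0: np.ndarray, t: int, eps: np.ndarray) -> np.ndarray:
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def ldm_loss(eps_pred: np.ndarray, eps: np.ndarray) -> float:
    """L_LDM = || eps - eps_theta(z_t, t, c) ||_2^2 for one sample."""
    return float(np.sum((eps - eps_pred) ** 2))

z0 = rng.standard_normal(4)   # stand-in for an encoded image latent
eps = rng.standard_normal(4)
zt = q_sample(z0, t=500, eps=eps)
print(ldm_loss(eps, eps))     # a perfect noise predictor gives 0.0
```

In training, ε_θ(z_t, t, c) is the conditioned U-Net's prediction, and the loss is minimized over random timesteps t and both prompt conditionings c ∈ {cₛ, cₜ}.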

Multi-modal knowledge transfer is achieved via learnable cross-attention tokens—[V₁] for the object (car) and [V₂], [V₃] for domain backgrounds—embedded in the prompt to ensure separation of object and background regions. The final loss incorporates regularizers:

L = L_{LDM} + L_{obj} + L_{bg}

with L_{obj} and L_{bg} computed from normalized cross-attention maps for the object and background tokens, respectively.
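Since the paper's exact regularizer formulation is not reproduced here, the following is only a schematic sketch of what such attention-separation losses can look like: it penalizes object-token attention that falls outside an object mask and background-token attention that falls inside it. The function name, mask convention, and penalty form are all assumptions.

```python
import numpy as np

def attention_losses(attn_obj, attn_bg, obj_mask):
    """Toy attention-separation regularizers (illustrative only).

    attn_obj / attn_bg: non-negative cross-attention maps for the
    object token [V1] and a background token [V2]/[V3];
    obj_mask: binary map, 1 where the object lies on the grid.
    """
    a_obj = attn_obj / attn_obj.sum()              # normalize to sum to 1
    a_bg = attn_bg / attn_bg.sum()
    l_obj = float((a_obj * (1 - obj_mask)).sum())  # object attention leaking to bg
    l_bg = float((a_bg * obj_mask).sum())          # bg attention leaking onto object
    return l_obj, l_bg

mask = np.zeros((8, 8)); mask[3:5, 3:5] = 1        # 8x8 grid, car in the centre
attn_obj = mask.copy() + 1e-6                      # well-separated attention maps
attn_bg = (1 - mask) + 1e-6
l_obj, l_bg = attention_losses(attn_obj, attn_bg, mask)
print(round(l_obj, 4), round(l_bg, 4))  # both near 0 for well-separated maps
```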

Automated object label generation occurs via extraction and stacking of cross-attention maps, followed by a two-step detector pipeline:

  1. Pseudo-labeling synthetic source images with a detector trained on annotated data (Faster R-CNN, YOLOv5/YOLOv8, or ViTDet).
  2. Retraining a detector on the enhanced cross-attention synthetic data, then applying it to synthetic target domain data for final pseudo-label extraction, which is subsequently filtered by confidence thresholds to construct an annotated synthetic target set.
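The confidence filtering in step 2 can be sketched as below. The threshold value and the tuple layout of a detection are illustrative assumptions, not the paper's settings.

```python
def filter_pseudo_labels(detections, conf_thresh=0.5):
    """Keep only detections confident enough to serve as pseudo-labels.

    Each detection is (bbox, score); the 0.5 threshold is illustrative.
    Low-confidence boxes are dropped so that noisy predictions do not
    contaminate the annotated synthetic target set.
    """
    return [(box, s) for box, s in detections if s >= conf_thresh]

dets = [((10, 10, 24, 24), 0.92),
        ((40, 40, 54, 54), 0.31),   # dropped: below threshold
        ((70, 5, 84, 19), 0.66)]
print(len(filter_pseudo_labels(dets)))  # 2
```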

4. Benchmarking, Evaluation, and Domain Transfer

The LINZ and UGRC datasets support quantitative benchmarking of domain adaptation in aerial object detection. Significant performance degradation is observed when detectors are directly transferred between domains (LINZ to UGRC or vice versa), illustrated by the observed 25+ percentage point loss in AP₅₀. Utilizing the generative augmentation strategy increases AP₅₀ by 4–23% over supervised learning on source data, 6–10% over weakly supervised adaptation, 7–40% over unsupervised adaptation, and over 50% relative improvement compared to open-set detection approaches.

Consistent gains are demonstrated across distinct detection architectures, with synthetic data produced by the generative pipeline being instrumental in bridging the gap between source and target domain distributions. This addresses the domain shift caused by differences in environmental conditions, vehicle appearance, urban forms, and image acquisition parameters.

5. Distinctive Features and Research Implications

Unique to these datasets is their scale, high-precision vehicle annotations, high spatial resolution, and explicit sampling design to support generative modeling. The patch-based approach is specifically intended to address the limitations of diffusion models in small object representation, though it introduces challenges in scenes with overlapping vehicles.

The datasets’ geographic diversity—New Zealand’s urban/rural landscapes versus Utah’s distinctive desert and rocky terrain—enables investigation into the generalization abilities of detectors under substantial domain variation. This complements and extends existing datasets used in aerial detection (such as DOTA) by providing severe domain shifts for rigorous adaptation testing.

Potential applications include urban planning, vehicle tracking for traffic flows, remote surveillance, and autonomous aerial systems, where accurate detection of small, densely clustered vehicles is operationally critical. Their utility is amplified by integration into frameworks that leverage modern generative and discriminative methodologies.

6. Future Directions and Extensions

The two datasets present opportunities for future research in several areas:

  • Continued advancement and evaluation of generative augmentation for domain adaptation.
  • Investigation of patch-based versus holistic image annotation and its impact on detection under occlusion and dense object distribution.
  • Exploration of cross-modal adaptation (e.g., applying models trained on LINZ/UGRC imagery to multispectral or SAR data).
  • Comparative evaluation with new annotation schemas or enhanced object class granularity.

Given the release of both datasets and code resources via the referenced project website (https://humansensinglab.github.io/AGenDA), ongoing community-driven benchmarking and method development are expected to further clarify open questions in small object detection and cross-domain adaptation in aerial sensing.

7. Conclusion

The LINZ and UGRC aerial datasets from New Zealand and Utah establish a new standard for annotated, high-resolution, small object detection data in remote sensing. Their challenging domain variability, precise per-patch annotation, and integration with state-of-the-art generative augmentation pipelines position them as essential resources for research in domain adaptation, synthetic data generation, and real-world vehicle detection in aerial imagery (Fang et al., 28 Jul 2025). Their deployment provides an empirical foundation for future algorithmic developments addressing robust, transferable computer vision solutions in variable aerial environments.
