Ancient Plant Seed Image Dataset
- The APS dataset is a large, publicly available benchmark featuring 8,340 high-resolution seed images from diverse archaeological sites in China.
- It includes 17 botanical taxa with a long-tailed distribution and employs standardized image acquisition and preprocessing protocols for research consistency.
- APSNet, the baseline model, integrates high-frequency size cues via SPE modules and achieves superior accuracy and F1 scores over conventional deep learning approaches.
The Ancient Plant Seed Image Classification (APS) Dataset is the first large-scale, publicly available image benchmark dedicated to the computational identification of archaeological plant seeds. It provides 8,340 high-resolution, consistently acquired seed images covering 17 genus- or species-level categories excavated from 18 sites across China, with temporal coverage from 5,400 BCE to 220 CE. The APS dataset underpins the development and evaluation of APSNet, a deep learning framework purpose-built for fine-grained, long-tailed archaeological seed classification that explicitly incorporates size cues to enhance discriminative modeling (Xing et al., 20 Dec 2025).
1. Dataset Composition and Taxonomic Scope
The APS dataset comprises 8,340 seed images, each centered on an individual seed and downsampled to 512×512 pixels. It covers 17 botanical taxa, with per-class instance counts that reflect archaeobotanical recovery realities and produce a strongly long-tailed class distribution. The dataset supports genus- and species-level differentiation, spanning both major domesticated cereals (e.g., Oryza sativa, Triticum aestivum) and minor or wild taxa (e.g., Acalypha australis, Lespedeza bicolor). The table below lists the taxonomic labels and per-class image counts.
| Taxon | Image Count | Notes |
|---|---|---|
| Setaria italica | 1,724 | foxtail millet; largest class |
| Triticum aestivum | 1,109 | common wheat |
| Panicum miliaceum | 925 | broomcorn millet |
| Cannabis sativa (soaked) | 708 | hemp; soaked specimens |
| Glycine max | 719 | soybean |
| Digitaria | 762 | crabgrass genus |
| Oryza sativa | 736 | rice |
| Hordeum vulgare | 473 | barley |
| Setaria | 296 | wild forms (genus level) |
| Cannabis sativa (charred) | 282 | hemp; charred specimens |
| Sorghum bicolor | 274 | sorghum |
| Lespedeza bicolor | 85 | minor class |
| Melilotus suaveolens | 54 | minor class |
| Portulaca oleracea | 57 | purslane |
| Bassia scoparia | 46 | minor class |
| Prunus persica | 45 | peach, minor class |
| Acalypha australis | 45 | lowest representation |
Specimens are derived from 18 archaeological sites across both northern and southern China. The temporal span encompasses Neolithic and Bronze Age cultural horizons, with provenance details and site geocoordinates provided in the project’s supplementary materials (Xing et al., 20 Dec 2025).
2. Image Acquisition Protocol and Preprocessing
All seed images were acquired using an OLYMPUS SC180 digital microscope camera (1/2.3″ CMOS sensor, 4912×3684 px, pixel size 1.25 μm), under fixed 1.6× magnification to preserve real-world size scale. Illumination was achieved using a ring-light integrated into the microscope, and color consistency was maintained via Olympus’s built-in color management and a constant white-balance setting.
Preprocessing steps included:
- Center cropping of individual seeds from raw captures.
- Downsampling to 512×512 pixels using bilinear interpolation.
- Manual filtering to exclude severely damaged seeds or non-seed artifacts.
No geometric or photometric augmentation was applied during APSNet baseline experiments, although practitioners may optionally employ random flips, rotations, or color jitter to improve downstream robustness (Xing et al., 20 Dec 2025).
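A minimal sketch of this preprocessing, assuming a Pillow-based pipeline; the bounding-box argument and function name are illustrative rather than the authors' actual tooling:

```python
from PIL import Image

def preprocess_seed(path: str, box: tuple[int, int, int, int]) -> Image.Image:
    """Center-crop one seed from a raw capture and downsample to 512x512.

    `box` is a hypothetical (left, upper, right, lower) bounding box around
    the seed; the paper does not specify the cropping tool used.
    """
    raw = Image.open(path).convert("RGB")
    seed = raw.crop(box)                            # isolate the individual seed
    return seed.resize((512, 512), Image.BILINEAR)  # bilinear downsampling per the protocol
```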
3. Train/Test Splits, Directory Structure, and Access
The dataset is partitioned into training and test subsets with an approximate 70:30 per-class ratio (∼5,840 training and ∼2,500 test images of the 8,340 total), so each split preserves the natural long-tail characteristics observed in archaeological recovery. Images are organized in an intuitive folder hierarchy segregated by class and split and provided in 8-bit PNG format; filenames encode site and sample identifiers.
| Split | Image Count | Stratification |
|---|---|---|
| Train | ∼5,840 | 70% of each class |
| Test | ∼2,500 | 30% of each class |
The dataset is available upon reasonable request from the corresponding author; academic use is permitted, and separate agreements are required for commercial applications. Best practices for extension include matching the acquisition magnification and lighting, collecting ≥200 samples per new class, and optionally applying class-rebalancing or augmentation strategies to address the imbalance (Xing et al., 20 Dec 2025).
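Given this layout, a standard torchvision loader suffices for reproduction; the root path `APS/` and the folder names below are assumptions rather than the published structure:

```python
import torch
from torchvision import datasets, transforms

# Assumed layout: APS/{train,test}/<taxon>/<site_sample_id>.png
to_tensor = transforms.ToTensor()  # 8-bit PNG -> float tensor in [0, 1]

train_set = datasets.ImageFolder("APS/train", transform=to_tensor)
test_set = datasets.ImageFolder("APS/test", transform=to_tensor)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=16, shuffle=False)

print(len(train_set.classes))  # expected: 17 taxa
```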
4. APSNet Baseline Model: Architecture and Training
APSNet is the reference architecture for APS, following an encode–decode design tailored to fine-grained recognition of small archaeological objects. The encoder is a ResNet50 backbone with Size Perception & Embedding (SPE) modules interleaved at stages 2–5. Each SPE module extracts high-frequency (HF) size priors via a 2D Fast Fourier Transform (FFT) filter that suppresses low-frequency components (a high-pass mask zeroes the central spectral bins), and the HF features are fused into the main CNN activation stream through channel attention and convolutions.
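The published SPE internals are not reproduced here; the sketch below illustrates the mechanism as described, namely a 2D-FFT high-pass filter that zeroes the central spectral bins, with the resulting high-frequency map fused back into the activation stream via channel attention and convolution. The mask radius `r` and the squeeze-excitation-style gate are assumptions:

```python
import torch
import torch.nn as nn

class HighPassPrior(nn.Module):
    """Sketch of an SPE-style size prior: 2D FFT, suppress the central
    (low-frequency) spectral bins, inverse FFT, then fuse the resulting
    high-frequency map into the CNN stream with channel attention.
    The mask radius `r` and the SE-style gate are illustrative choices,
    not the published SPE design."""

    def __init__(self, channels: int, r: int = 8):
        super().__init__()
        self.r = r
        self.gate = nn.Sequential(  # squeeze-and-excitation channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        h, w = x.shape[-2:]
        cy, cx = h // 2, w // 2
        mask = torch.ones(h, w, device=x.device)
        mask[cy - self.r:cy + self.r, cx - self.r:cx + self.r] = 0  # zero low frequencies
        hf = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1)), norm="ortho").real
        return x + self.fuse(hf * self.gate(hf))  # residual fusion of HF size cues
```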
The decoder utilizes an Asynchronous Decoupled Decoding (ADD) mechanism, comprising four serially trained heads:
- Head_1: channel branch; fully connected (FC) layer with softmax output, trained with standard cross-entropy.
- Head_2: a deeper channel-focused variant, also FC + softmax with cross-entropy.
- Head_Dc: a dual-branch structure combining a channel path (max pooling, FC + softmax) and a spatial path (1,000-D projection, per-sample channel selection, L2 normalization), supervised by a composite of cross-entropy and a supervised contrastive loss that targets discriminative feature separation.
- Head_Con: concatenates the preceding branches and applies a final FC + softmax, trained with cross-entropy.
The overall training objective is a weighted sum of these stage-wise loss terms.
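A hedged sketch of such a composite objective, combining per-head cross-entropy with a supervised contrastive term on Head_Dc's normalized embeddings; the head weights, `lam`, and temperature `tau` are placeholders, not published values:

```python
import torch
import torch.nn.functional as F

def add_objective(logits_per_head, dc_embed, labels,
                  head_weights=(1.0, 1.0, 1.0, 1.0), lam=0.5, tau=0.1):
    """Weighted sum of per-head cross-entropy terms plus a supervised
    contrastive (SupCon-style) term on Head_Dc's L2-normalized embeddings.
    `head_weights`, `lam`, and `tau` are illustrative placeholders."""
    ce = sum(w * F.cross_entropy(lg, labels)
             for w, lg in zip(head_weights, logits_per_head))

    z = F.normalize(dc_embed, dim=1)                   # L2-normalize embeddings
    sim = z @ z.t() / tau                              # temperature-scaled similarities
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos = ((labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye).float()
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)
    supcon = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)  # mean over positives
    return ce + lam * supcon.mean()
```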
5. Empirical Evaluation and Comparative Performance
APSNet benchmarking follows a 200-epoch schedule (early stopping after 30 epochs without improvement), using SGD (momentum 0.9, with weight decay), batch size 16, and an initial learning rate stepped down at epochs 80 and 140. Evaluation metrics include overall accuracy, precision, recall, and F1; a training-loop sketch appears directly below, and an evaluation sketch follows the list of key outcomes.
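The schedule translates directly into a standard PyTorch loop. A plain ResNet50 stands in for APSNet here, and the learning-rate and weight-decay values (`0.01`, `5e-4`) are placeholder assumptions; momentum, batch size, milestones, and early-stopping patience follow the reported setup:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Stand-in backbone (APSNet adds SPE modules and the ADD heads on top of ResNet50).
model = models.resnet50(num_classes=17)

# lr and weight_decay are placeholders; the rest follows the reported schedule.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 140], gamma=0.1)

best_acc, stale = 0.0, 0
for epoch in range(200):
    model.train()
    for images, labels in train_loader:          # loaders from the Section 3 sketch
        optimizer.zero_grad()
        F.cross_entropy(model(images), labels).backward()
        optimizer.step()
    scheduler.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            correct += (model(images).argmax(1) == labels).sum().item()
            total += labels.numel()
    acc = correct / total
    if acc > best_acc:
        best_acc, stale = acc, 0
    else:
        stale += 1
        if stale >= 30:                          # stop after 30 epochs without improvement
            break
```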
Key outcomes:
- APSNet achieves 90.2% accuracy and 77.5% F1, surpassing all 28 considered baselines across three comparator groups: long-tail learning methods (top accuracy 46.0%), fine-grained visual classification (FET-FGVC, 76.9%), and standard CNNs (GhostNetv2, 84.9%).
- Improvement margins: APSNet exceeds GhostNetv2 by +5.3 percentage points (accuracy) and +12.1 pp (F1).
- APSNet attains ≥90% accuracy in well-sampled categories and exhibits significant performance gains (e.g., 69% precision for Lespedeza bicolor vs 50% with GhostNetv2) in underrepresented taxa.
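Reproducing these metrics on the test split requires only a routine evaluation pass; the paper does not state how F1 is averaged, so macro averaging is an assumption here:

```python
import torch
from sklearn.metrics import accuracy_score, f1_score, classification_report

model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in test_loader:           # loader from the Section 3 sketch
        y_pred += model(images).argmax(1).tolist()
        y_true += labels.tolist()

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # macro is an assumption
print(classification_report(y_true, y_pred, target_names=test_set.classes))
```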
Ablation studies establish the impact of the high-frequency priors in SPE (a prior with 1.4-bit entropy guides learning most effectively) and show that SPE and Head_Dc generalize beyond APSNet, transferring to modern CNNs and Vision Transformers with accuracy gains of 2–7 percentage points (Xing et al., 20 Dec 2025).
6. Qualitative Analysis and Failure Cases
UMAP projections of feature embeddings reveal that APSNet forms denser intra-class and more distinct inter-class clusters than baseline networks. Grad-CAM analysis indicates that APSNet more accurately localizes diagnostic features, such as seed contour and surface texture on medium/large seeds, while baselines may over-focus on background or fail to capture object boundaries. Misclassification remains a challenge for small seeds subjected to extensive charring, as with Portulaca oleracea, due to pronounced intra-class heterogeneity (Xing et al., 20 Dec 2025).
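A minimal way to reproduce this kind of embedding visualization with umap-learn, using pooled penultimate-layer features of the stand-in backbone from the earlier sketches:

```python
import numpy as np
import torch
import umap  # pip install umap-learn

# Pooled penultimate-layer features of the stand-in ResNet50 (drop the FC head).
extractor = torch.nn.Sequential(*list(model.children())[:-1])
extractor.eval()

feats, labs = [], []
with torch.no_grad():
    for images, labels in test_loader:
        feats.append(extractor(images).flatten(1))
        labs.append(labels)

X = torch.cat(feats).numpy()
y = torch.cat(labs).numpy()
coords = umap.UMAP(n_components=2, random_state=0).fit_transform(X)  # 2-D embedding
np.save("aps_umap.npy", np.column_stack([coords, y]))  # plot coords colored by taxon
```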
7. Guidelines for Usage, Extension, and Experimental Reproducibility
Best practices for APS dataset extension and APSNet adaptation include:
- When fine-tuning, initialize with weights pretrained on APS and initially freeze the specialist modules (SPE, ADD).
- Employ class rebalancing (e.g., per-class oversampling, focal loss) for severe imbalance.
- Augment data via controlled rotations (±15°), horizontal/vertical flips, and light color jitter to enhance robustness (rebalancing and augmentation are both sketched after this list).
- For compatibility, replicate the acquisition magnification and lighting, and collect a sufficient sample size (≥200 per class) when introducing new taxa.
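A sketch combining the rebalancing and augmentation recommendations above, reusing the assumed `APS/train` layout from Section 3; the jitter magnitudes and the inverse-frequency sampling scheme are illustrative choices:

```python
import torch
from collections import Counter
from torchvision import datasets, transforms

# Augmentation within the recommended bounds: ±15° rotation, flips, light jitter.
train_tf = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),  # magnitudes are illustrative
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("APS/train", transform=train_tf)  # assumed layout

# Inverse-frequency oversampling to counter the long tail.
counts = Counter(train_set.targets)
weights = [1.0 / counts[t] for t in train_set.targets]
sampler = torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights))
balanced_loader = torch.utils.data.DataLoader(train_set, batch_size=16, sampler=sampler)
```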
This protocol supports reproducibility and comparability across future archaeobotanical and computational taxonomy studies leveraging the APS resource (Xing et al., 20 Dec 2025).