MPDD-Young Dataset Overview
- MPDD-Young is a term that encompasses three distinct datasets addressing tropical forest phenology, multimodal depression detection, and photometric stellar cataloging.
- In tropical forest phenology, it documents a 23-year surveillance of young leaf production with standardized protocols and climate covariates to enhance community-level modeling.
- For depression detection and stellar analysis, it provides multimodal behavioral data and rigorously calibrated stellar parameters, enabling improved predictive models and exoplanet survey refinement.
The term “MPDD-Young Dataset” has been used for three scientifically distinct resources in different domains: (1) a 23-year phenological surveillance of young leaf production in a Ugandan rainforest; (2) a dataset of multimodal behavioral and personality data for depression detection in young adults; and (3) a uniform catalog of photometrically inferred parameters for young stars in nearby stellar associations. Each dataset serves a specific research community and deploys unique methodologies and ontologies.
1. MPDD-Young in Tropical Forest Phenology
The MPDD-Young Dataset as documented by Lüthy et al. (2024) (Lüthy et al., 13 Jan 2025) provides a detailed longitudinal record of young leaf production among 12 core tree species in the Kibale National Park, Uganda, spanning February 1999–December 2021. This dataset is tailored to address the drivers and ecological implications of leaf phenology in a mid-altitude, moist-evergreen African forest—particularly for interpretation of food availability for folivorous primates, such as the ashy red colobus.
Key organizational features:
- Study site and sampling: Observations were limited to Forestry Compartment K30, defined spatially by a specified polygon. All phenological census data (~30,103 tree-month records) were generated with a consistent monthly census protocol, save for a pause due to COVID-19.
- Focal taxa: Twelve tree species reflecting ~50% of the ashy red colobus diet, each contributing ≥1%.
- Phenological scoring: The variable of interest is a whole-crown “young leaf score” (YL, integer 0–6), assessed by a single, long-tenured observer per tree and per month. No explicit branch-level counts were collected.
- Environmental covariates: Monthly rainfall (manual gauge), cloud fraction (CLARA-A3), temperature (CRU TS v4.07), shortwave solar radiation (TerraClimate), atmospheric CO₂ (NOAA GML), ENSO and Indian Ocean indices, and the MODIS-derived Enhanced Vegetation Index (EVI) are all included. Lagged metrics (1, 3, and 6 months) are derived systematically.
- Data structure: All variables are aggregated at the calendar-month level, standardized for use in hierarchical generalized additive mixed models (GAMMs).
- Analytical models: Canonical community- and species-level models estimate young leaf probability using splines for seasonality and trend, plus species- and tree-level enrichment terms. Long-term dynamics are attributed to atmospheric CO₂ and solar radiation, rather than rainfall or EVI.
- Data and metadata standards: Recommendations include date in ISO 8601, unique TreeID for random effects, and standardized climate/EVI tables. Gaps (e.g., April–June 2020) are explicitly flagged.
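The lagged covariates and explicit gap flags described above could be derived along the following lines. This is a minimal sketch, assuming the lags are trailing means over the most recent 1, 3, and 6 months (the source does not give the exact definition). The function name `trailing_mean` and the rainfall values are hypothetical and are not taken from the dataset.

```python
from typing import List, Optional


def trailing_mean(values: List[Optional[float]], window: int) -> List[Optional[float]]:
    """Mean of the current month and the (window - 1) months before it.

    Returns None where the window is incomplete or contains a missing
    month, so gaps stay explicitly flagged instead of being averaged over.
    """
    out: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


# Hypothetical monthly rainfall (mm); None marks a month without data.
rain = [120.0, 90.0, 150.0, None, 60.0, 80.0, 100.0]

# Assumed lag windows of 1, 3, and 6 months.
lags = {w: trailing_mean(rain, w) for w in (1, 3, 6)}
```

Propagating `None` rather than interpolating keeps a census pause visible in every derived column, which is one way to honor the recommendation that gaps be flagged.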
Scientific findings and significance:
Seasonal peaks in young leaf flush are consistently bimodal (April, October), primarily predicted by maximum temperature, rainfall, and cloud cover. Long-term trends, including a prominent S-shaped trajectory in community-level productivity, align most with atmospheric CO₂ (direct, positive) and six-month mean solar radiation for subsets of species. EVI captures intra-annual but not inter-annual variability, with Pearson for raw monthly data and for seasonal splines. The dataset supports advanced modeling of climate-phenology linkages in African tropical forests and is a reference for folivore ecology and climate impact research.
2. MPDD-Young in Multimodal Depression Detection
The MPDD-Young Dataset as defined in the First MPDD Challenge (Fu et al., 15 May 2025) is a multimodal, personality-aware resource structured to advance depression detection in young adults. It consists of time-windowed recordings and metadata from a non-clinical cohort (age 18–30) balanced in gender and spanning multiple Chinese provinces.
Dataset composition and protocols:
- Participant pool: Approximately 40–60 college-aged adults, excluding individuals with diagnosed psychiatric or neurological disorders.
- Recording protocol: For each subject, acquisition involved three phases: mental health/personality questionnaires (PHQ-9, BigFive-10), spontaneous self-introduction, and two controlled reading tasks.
- Modalities and features:
- Audio: 16 kHz, 16-bit PCM; features comprise MFCCs (13 static + Δ + ΔΔ), OpenSMILE eGeMAPS descriptors, and 512-dim Wav2Vec embeddings.
- Video: 1080p, 30 fps; OpenFace (17 action units + pose), 2048-dim ResNet-50, and 1024-dim DenseNet-121 frame-level features.
- Temporal granularity: Features are computed in 1 s and 5 s non-overlapping sliding windows, yielding 528 samples (264 each for train/test).
- Individual differences: BigFive-10 personality scores are one-hot encoded per trait. Demographics (age, gender, region) are integrated into a personalized 1024-dim embedding via ChatGLM3 text prompts and RoBERTa-large embedding.
- Ground-truth labeling: The PHQ-9 score is mapped to binary (cutoff 4.5) and ternary (0–4 normal, 5–9 mild, 10–27 severe) depression labels, with balanced splits in both tasks.
- Data formats: File structure is explicit per subject and window (audio: .wav; video: .mp4; feature arrays: .npy; metadata: .csv). Identities are pseudonymized; no further de-identification is performed.
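The ground-truth labeling rule above can be written out directly. The sketch below uses the thresholds stated in the text (binary cutoff 4.5; ternary bands 0–4, 5–9, 10–27). The function names and the integer class codes are assumptions made for illustration, not the challenge's official encoding.

```python
def _check_phq9(score: int) -> None:
    """PHQ-9 totals are integers in the range 0-27."""
    if not isinstance(score, int) or not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 score must be an integer in 0..27, got {score!r}")


def phq9_binary(score: int) -> int:
    """Return 0 (normal) or 1 (depressed) using the 4.5 cutoff."""
    _check_phq9(score)
    return int(score > 4.5)


def phq9_ternary(score: int) -> int:
    """Return 0 (normal, 0-4), 1 (mild, 5-9), or 2 (severe, 10-27)."""
    _check_phq9(score)
    if score <= 4:
        return 0
    if score <= 9:
        return 1
    return 2


# Hypothetical scores for three participants.
scores = [3, 7, 15]
binary_labels = [phq9_binary(s) for s in scores]
ternary_labels = [phq9_ternary(s) for s in scores]
```

Because PHQ-9 totals are integers, the 4.5 cutoff is equivalent to splitting 0–4 from 5–27, so the binary "normal" class coincides with the ternary "normal" band.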
Baseline system and evaluation:
- Model architecture: Audio and visual streams are encoded with distinct 2-layer bidirectional LSTMs ( output each); fused via a Transformer encoder and concatenated with the personalized embedding; classified by a two-layer MLP with softmax.
- Training objective: Cross-entropy over classes.
- Performance metrics: Weighted/Unweighted accuracy and F1-score, reported for both temporal windows and class splits.
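The difference between weighted and unweighted accuracy can be illustrated as follows. This sketch assumes the convention common in speech-based affect recognition, where weighted accuracy is the overall fraction of correct predictions and unweighted accuracy is the mean of per-class recalls. The source does not spell out its definitions, so the challenge documentation should be checked before comparing numbers. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Overall accuracy: each sample counts equally, so large classes dominate."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall: each class counts equally regardless of size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical imbalanced example: a classifier that always predicts class 0.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
wa = weighted_accuracy(y_true, y_pred)    # 3 of 4 samples correct
ua = unweighted_accuracy(y_true, y_pred)  # recall 1.0 for class 0, 0.0 for class 1
```

Under this convention the two metrics coincide only when classes are equally represented, which is one reason both are usually reported together.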
Representative results:
| Metric (weighted / unweighted) | Binary, 1 s window | Binary, 5 s window | Ternary, 1 s window | Ternary, 5 s window |
|---|---|---|---|---|
| Accuracy | 63.64% | 62.12% | 49.66% / 51.52% | 41.71% / 50.00% |
| F1-score | 59.96% | 62.11% | 51.86% / 51.62% | 48.18% / 41.31% |
A plausible implication is that the addition of personalized embeddings provides moderate but meaningful discriminative power for window-wise depression detection under both binary and ternary annotation regimes.
3. MPDD-Young Photometric Stellar Catalog
The MPDD-Young catalog as described by Fernandes et al. (2023) is a uniform, Gaia/2MASS-based compendium of effective temperatures, luminosities, radii, and masses for 4,865 young (<1 Gyr) stars across 31 nearby clusters and moving groups (≤200 pc).
Catalog assembly and methodology:
- Input population: Clusters include the Pleiades, Hyades, AB Doradus, Upper Scorpius, and α Persei, among others. Membership is established via BANYAN Σ and Gaia DR2/DR3 open cluster catalogs.
- Photometric processing:
- Initial photometry: Gaia DR3 G, G_BP, and G_RP, plus 2MASS photometry, via best-neighbor cross-match.
- Exclusions: non-finite values, known binaries or non-single systems, and Gaia RUWE > 1.4.
- Typical photometric uncertainties: mag in , 0.02 mag in .
- Monte Carlo error propagation: 1,000 draws per star, propagated through color–T_eff polynomials.
- Calibrations: Pecaut & Mamajek (2013) for effective temperature; 7th-order polynomial for PMS stars, 4th-order for dwarfs.
- Bolometric corrections: ; . Extinction computed using Bayestar19 or recalibrated SFD maps, depending on declination.
- Parameter inference:
- Stellar luminosities: .
- Radii: Stefan–Boltzmann law or Mann et al. (2015) – relation for faint dwarfs.
- Masses: Torres et al. (2010) for ; Mann et al. (2019) for lower-mass dwarfs; PMS masses via Bayesian interpolation on Feiden (2016) evolutionary tracks.
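The Monte Carlo error propagation and the Stefan–Boltzmann radius step can be sketched together. The source applies its draws to the photometry and passes them through color–temperature polynomials. The sketch below instead illustrates the same idea one step later, drawing luminosity and effective temperature from independent Gaussians and converting each draw to a radius. The function names, the independence assumption, and the example star are hypothetical. The solar temperature of 5772 K is the IAU 2015 nominal value.

```python
import math
import random
import statistics
from typing import Tuple

T_SUN_K = 5772.0  # IAU 2015 nominal solar effective temperature


def radius_rsun(lum_lsun: float, teff_k: float) -> float:
    """Stefan-Boltzmann law in solar units: R = sqrt(L) * (T_sun / T)^2."""
    return math.sqrt(lum_lsun) * (T_SUN_K / teff_k) ** 2


def mc_radius(lum: float, lum_err: float, teff: float, teff_err: float,
              n_draws: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (median, standard deviation) of the radius over Gaussian draws.

    Assumes uncorrelated Gaussian errors on L and Teff, which is a
    simplification made for this illustration.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    draws = []
    for _ in range(n_draws):
        l_draw = rng.gauss(lum, lum_err)
        t_draw = rng.gauss(teff, teff_err)
        if l_draw > 0 and t_draw > 0:  # discard unphysical draws
            draws.append(radius_rsun(l_draw, t_draw))
    return statistics.median(draws), statistics.stdev(draws)


# Hypothetical K dwarf: L = 0.5 +/- 0.02 L_sun, Teff = 4500 +/- 80 K.
median_r, sigma_r = mc_radius(0.5, 0.02, 4500.0, 80.0)
```

Since the radius scales as the inverse square of temperature, a fractional temperature error contributes twice its size to the fractional radius error, whereas a fractional luminosity error contributes only half.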
Validation and impact:
Photometric values agree with spectroscopic catalogs above 4,000 K (~K, ~K). For M dwarfs (below 4,000 K), a systematic offset is observed (photometrically cooler by 118 K). Using revised stellar radii, TESS short-period transiting planet survey completeness increases for sub-Neptunes/Neptunes by 1.5 for “MPDD-Young” stars (mean detection efficiency rising from 9% to 10%). The occurrence rate for all types and for FGK stars is % and % respectively, compared to prior estimates of 50%.
Catalog content and data access:
Key columns are: TESS Input Catalog (TIC) ID, cluster/group, distance, effective temperature, luminosity, radius (with uncertainty), and mass (with two estimates for PMS stars). Machine-readable tables are available in ApJ online supplements and via VizieR.
4. Contextual Distinctions and Domain Relevance
While the designation “MPDD-Young” recurs, the underlying domains differ fundamentally: rainforest phenology/ecology (Lüthy et al., 13 Jan 2025), behavioral and affective computing (Fu et al., 15 May 2025), and exo/stellar astrophysics (Fernandes et al., 2023). Each dataset is foundational within its community:
- In phenology, MPDD-Young informs the dynamics of food resources and the response of tropical forest systems to climate forcing.
- In affective computing, it benchmarks personalized, multimodal inferences in affective state detection with explicit attention to demographic heterogeneity.
- In astrophysics, it provides a rigorously vetted population of young stars for exoplanet occurrence and evolutionary studies, explicitly calibrating the impact of host star parameters on survey sensitivity.
A plausible implication is that citation of “MPDD-Young” without domain disambiguation is inadequate for scholarly communication.
5. Data Access and Reuse Considerations
- Phenology: Data are available on request from the original authors and via the referenced climate repositories. Recommended use includes hierarchical modeling of tree-level and community-level phenology under climate covariates (Lüthy et al., 13 Jan 2025).
- Depression detection: Access is by request through the official challenge website, with controlled distribution of video and all feature/metadata files. Reuse is advised for multimodal classification or transfer learning studies, provided that face and voice data are handled confidentially (Fu et al., 15 May 2025).
- Photometric catalog: Tables are released online (ApJ supplement, VizieR), and directly support recalibration of exoplanet detection pipelines, as well as modeling of young stellar populations (Fernandes et al., 2023).
Misconceptions may arise if the term “MPDD-Young” is interpreted as referring to a single, cross-domain dataset; precise identification is required for accurate application.