Oven Cooking Progression Dataset
- The dataset is a chef-annotated benchmark capturing non-linear food transformations in ovens, detailing doneness states and temporal annotations.
- It supports tasks like generating high-fidelity cooked food images at varying doneness levels and real-time monitoring using a specialized Culinary Image Similarity metric.
- Structured from controlled oven trials across diverse recipes, it offers a robust framework for supervised training and evaluation in automated culinary systems.
The oven-based cooking-progression dataset is a large-scale, chef-annotated benchmark explicitly designed to capture the continuous evolution of food appearance under oven-induced heat. Unlike previous datasets that focus predominantly on stovetop activity sequences or static food recognition, this resource uniquely models the non-linear textural, color, and structural changes—such as browning, caramelization, and dough expansion—that characterize progressive doneness in oven cooking. The dataset enables two primary tasks: (1) cooked food image synthesis, where models must generate plausible visual representations of food at varying doneness levels conditioned on raw appearance and recipe context, and (2) visual cooking progress monitoring, in which autonomous systems determine optimal stopping points by matching real-time camera observations to synthesized target states (Gupta et al., 21 Nov 2025).
1. Scope and Objectives
The primary aim of the oven-based cooking-progression dataset is to provide a benchmark for temporal modeling of food transformation during oven cooking. It addresses a notable deficit in extant computer vision resources, such as YouCook2 and COM Kitchens, which emphasize multi-stage stovetop manipulation, and ISIA Food-500, which catalogues food identity but lacks dynamic, within-item doneness annotation. This dataset explicitly targets fine-grained progression under fixed heat in an oven environment, facilitating modeling of physical-chemical reactions—browning, crisping, caramelization, rising, and shrinkage—that define culinary doneness. Research tasks derivable from this resource include (a) generation of cooked food images at discrete doneness levels ("basic," "standard," "extended"), and (b) real-time visual progress monitoring to automate optimal cook-stop recommendations (Gupta et al., 21 Nov 2025).
2. Data Acquisition Process
Data were acquired in controlled oven sessions spanning 1,708 independent trials across 30 diverse recipes, encompassing bakery products, meats, vegetables, and processed items (e.g., SalmonSteak, ChocoChipCookie, ButtermilkBiscuits, FrozenPizza, Tortillas, ChickenDrumsticks, FruitPie, Muffins). In each session, a single food item was cooked in "auto" oven mode to ensure uniform temperature and temporal progression, with no door openings allowed throughout the cook. RGB imagery was captured at 30-second intervals via a stationary, top-mounted oven camera using solely internal illumination to ensure consistent lighting conditions. Frames were downsampled and center-cropped to 224×224 pixels to standardize model input. Each session contains a minimum of three labeled images: "raw" at time t = 0, with subsequent "basic" (cs₁), "standard" (cs₂), and optionally "extended" (cs₃) states manually annotated by chefs according to recipe-dependent doneness transitions. Most experiments standardize to three states per recipe for uniformity. Image pairs of the form [raw, basic], [raw, standard], [raw, extended] represent unambiguous progressions suited for supervised training and evaluation (Gupta et al., 21 Nov 2025).
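The capture schedule and crop geometry above can be sketched in plain Python. This is an illustrative sketch, not the authors' released tooling; the function names and the assumption that frames are downsampled before the center crop are mine.

```python
# Illustrative sketch of the acquisition parameters described above:
# one RGB frame every 30 s, center-cropped to 224x224 for model input.

CAPTURE_INTERVAL_S = 30   # one frame every 30 seconds
TARGET_SIZE = 224         # final model input is 224x224 pixels

def frame_timestamps(session_duration_s: int) -> list[int]:
    """Timestamps (seconds) at which the oven camera captures a frame."""
    return list(range(0, session_duration_s + 1, CAPTURE_INTERVAL_S))

def center_crop_box(width: int, height: int, size: int = TARGET_SIZE):
    """(left, top, right, bottom) box for a center crop of `size` pixels,
    applied after the frame has been downsampled so min(w, h) >= size."""
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)

# Example: a 20-minute bake yields 41 frames; a 640x480 downsampled
# frame is cropped to its central 224x224 region.
print(len(frame_timestamps(1200)))    # 41
print(center_crop_box(640, 480))      # (208, 128, 432, 352)
```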
3. Annotation Methodology and Temporal Progress Quantification
Chef annotation protocol ensures each sequence is discretized into well-defined doneness states cs₁–cs₃, supporting both categorical and temporal reasoning. Every frame is stamped with a timestamp t (in seconds), facilitating the computation of a continuous, session-relative cooking progress score p(t) = t / T_session, where T_session is the total session duration. This normalized scalar quantifies the closeness in cooking time between any two frames within a session via the difference |p(tᵢ) − p(tⱼ)|. Session metadata further includes oven auto-mode settings, target recipe label, and cumulative session duration, all encoded in per-session JSON files (Gupta et al., 21 Nov 2025).
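A minimal sketch of this progress score, assuming the straightforward reading of the text: progress is the timestamp normalized by session duration, and the closeness of two frames is one minus their normalized timestamp gap.

```python
# Sketch of the session-relative progress score described above.
# The exact closeness formula is an assumption consistent with the text.

def progress(t: float, session_duration: float) -> float:
    """Normalized cooking progress p(t) of a frame at time t (seconds)."""
    return t / session_duration

def relative_progress(t_i: float, t_j: float, session_duration: float) -> float:
    """Normalized temporal closeness of two frames in the same session."""
    return 1.0 - abs(t_i - t_j) / session_duration

# A frame at 600 s of a 1200 s bake is at progress 0.5; the raw frame
# (0 s) and a frame at 900 s have relative progress 0.25.
print(progress(600, 1200))              # 0.5
print(relative_progress(0, 900, 1200))  # 0.25
```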
4. Dataset Organization and Structure
The dataset root, "OvenProgression/", is subdivided into train, validation, and test splits by session count (1,196/171/341 sessions; corresponding to approximately 3,588/513/715 image pairs, respectively). Each recipe's directory contains session-wise subfolders, inside which raw and cooked state images—formatted as "session_<4-digit ID>_<state>.jpg" with state ∈ {raw, basic, standard, extended}—are stored alongside a metadata.json file. Metadata includes timestamps for each doneness state, oven mode, recipe label, and session duration:
```json
{
  "timestamp": { "raw": 0, "basic": t1, "standard": t2, "extended": t3 },
  "oven_mode": "auto",
  "recipe_label": "<RecipeName>",
  "session_duration": T_session
}
```
| Split | Sessions | Raw–Cooked Pairs (approx.) |
|---|---|---|
| train | 1,196 | 3,588 |
| val | 171 | 513 |
| test | 341 | 715 |
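A hypothetical loader for this layout can be sketched as follows. File and key names follow the text ("session_<4-digit ID>_<state>.jpg", metadata.json), but the function itself is my illustration, not the authors' released tooling.

```python
# Hypothetical sketch: turn one session's metadata.json into the
# [raw, cooked] supervision pairs described above.
import json

STATES = ("basic", "standard", "extended")

def session_pairs(session_id: int, metadata_json: str):
    """Yield (raw_path, cooked_path, state, t_state) pairs for one session."""
    meta = json.loads(metadata_json)
    raw = f"session_{session_id:04d}_raw.jpg"
    for state in STATES:
        t = meta["timestamp"].get(state)
        if t is not None:  # the 'extended' state is optional in some sessions
            yield raw, f"session_{session_id:04d}_{state}.jpg", state, t

meta = json.dumps({
    "timestamp": {"raw": 0, "basic": 600, "standard": 900, "extended": 1100},
    "oven_mode": "auto",
    "recipe_label": "Muffins",
    "session_duration": 1200,
})
for pair in session_pairs(7, meta):
    print(pair)
# ('session_0007_raw.jpg', 'session_0007_basic.jpg', 'basic', 600) ...
```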
5. Culinary Image Similarity (CIS) Metric
The dataset introduces the domain-specific Culinary Image Similarity (CIS) metric for both model training and run-time progress monitoring. For images xᵢ and xⱼ from the same session, CIS(xᵢ, xⱼ) = cos(f(xᵢ), f(xⱼ)). Here, f is a feature extractor based on an EfficientNet-B1 backbone (up to the final pooling layer) with a 2048→2048→128 projection MLP, producing L₂-normalized embeddings; cos(·,·) is cosine similarity. f is trained using an MSE loss between CIS scores and the normalized relative cooking progress 1 − |tᵢ − tⱼ|/T_session, incorporating ±60° rotation and random-flip augmentations. CIS is leveraged as a generator loss (L_CIS) and as an online stopping signal: progress at time t is defined as CIS(x_t, x_target), where x_target is the synthesized image of the target doneness state, and cooking is halted when this signal peaks within a sliding window. This metric operationalizes fine-grained, visually and temporally informed model evaluation and deployment (Gupta et al., 21 Nov 2025).
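The CIS machinery can be sketched under stated assumptions: the embedding network f is treated as a black box producing L₂-normalized vectors, and the sliding-window size and exact peak test below are illustrative choices, not specified by the source.

```python
# Sketch of (a) CIS as cosine similarity of L2-normalized embeddings,
# (b) the MSE training target against normalized relative progress, and
# (c) an illustrative sliding-window peak rule for the stopping signal.
import math

def cis(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two L2-normalized embeddings (a dot product)."""
    return sum(a * b for a, b in zip(u, v))

def mse_target(cis_score: float, t_i: float, t_j: float, t_session: float) -> float:
    """Squared error between a CIS score and normalized relative progress."""
    rel = 1.0 - abs(t_i - t_j) / t_session
    return (cis_score - rel) ** 2

def should_stop(signal: list[float], window: int = 3) -> bool:
    """Stop when the latest window's maximum no longer improves, i.e.
    similarity to the synthesized target state has peaked."""
    if len(signal) < 2 * window:
        return False
    return max(signal[-window:]) < max(signal[-2 * window:-window])

u = [1.0, 0.0]
v = [math.cos(0.5), math.sin(0.5)]        # unit vector at 0.5 rad
print(round(cis(u, v), 4))                # 0.8776, i.e. cos(0.5)
print(should_stop([0.5, 0.8, 0.9, 0.85, 0.8, 0.75]))  # True (signal peaked)
```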
6. Baseline Performance and Accessibility
On this benchmark, the proposed model demonstrates a Fréchet Inception Distance (FID) of 52.18 versus 153.00 for Pix2Pix and 75.42 for Pix2Pix-Turbo; perceptual similarity via LPIPS is 0.2145 (vs. 0.4711 for Pix2Pix and 0.2523 for Turbo). These figures indicate substantially improved visual fidelity and semantic relevance in generated outputs. The dataset is slated for public release upon corresponding publication, hosted via GitHub and the CVPR Datasets & Benchmarks portal; early access requests are accepted by direct correspondence with the authors (Gupta et al., 21 Nov 2025).
7. Relation to Previous Datasets and Future Implications
Existing datasets—YouCook2, COM Kitchens, and ISIA Food-500—do not explicitly capture intra-item temporal progression in oven cooking, instead focusing on episode-level action segmentation or single-image food recognition. The oven-based cooking-progression dataset fills a critical gap by modeling the complex, multistage appearance changes unique to oven environments. A plausible implication is that this dataset could catalyze advances in edge-based visual monitoring, generative modeling of food, and autonomous appliance control. Furthermore, the CIS metric may prove extensible for temporal progression modeling in other thermal or process-driven visual domains (Gupta et al., 21 Nov 2025).