
Elements Dataset Overview

Updated 21 December 2025
  • Elements Dataset is a collection of annotated data resources that decompose complex structures into basic units for research applications.
  • These datasets employ detailed annotation schemes and benchmarks across domains like web interfaces, document layouts, and video perception to support robust evaluations.
  • They drive methodological advances by standardizing definitions, protocols, and task-specific metrics across varied disciplines.

The term "Elements Dataset" encompasses a wide variety of datasets across domains, each centered on the explicit representation, annotation, or exploitation of atomic components ("elements") within a structured context—visual, textual, graphical, spatial, or material. This article reviews the most prominent datasets and benchmarks termed "Elements Dataset" or employing "elements" as primary data units in perception, language, vision, document layout, UI analysis, scientific representations, and privacy-preserving statistics. These resources provide standardized corpora, formal task definitions, explicit annotation schemas, and public implementations, serving as the foundation for methodological advances in a spectrum of academic fields.

1. Definitions and Scope: What Constitutes an "Elements Dataset"?

An Elements Dataset is characterized by the explicit labeling, structuring, or modeling of "elements," which represent basic units—objects, entity mentions, UI widgets, graphical regions, atomic visual components, narrative events, chemical elements, or textual/graphical triples—within a complex environment. The term is domain-agnostic but consistently implies that each data example is decomposed or annotated in terms of its constituent elements, often with associated properties, bounding boxes, groupings, or semantic relations. Key exemplars include:

  • Web page or GUI elements: HTML DOM nodes, visual groups, or UI widgets (Wichers et al., 2018, Pasupat et al., 2018, Li et al., 2020).
  • Visual elements in presentation layouts: Geometric primitives, text boxes, images, charts, lines, and hierarchical groupings (Shi et al., 2022).
  • Document elements (historical, graphical): Non-textual axes-aligned bounding boxes, categorized into fine-grained classes (e.g., chart, frieze, photograph) (Kišš et al., 28 Mar 2025).
  • Atomic visual elements in video: Objects (road users, actors) densely annotated per frame with class, actions, and bounding boxes (Wang et al., 2024).
  • Graph and network elements: Space-function or space-access nodes and edges in building layouts (Ziaee et al., 2023).
  • Textual/narrative elements: Labeled sentences as Complication, Resolution, or Success in news stories (Levi et al., 2020); relational graph triples in summarization (Hoeve et al., 2022).
  • Scientific/materials elements: Vector embeddings of chemical elements in high-dimensional space for materials informatics (Onwuli et al., 2023).
  • Statistical elements under privacy: Unique element counting functions in databases, under person-level differential privacy (Knop et al., 2023).

2. Major Datasets: Composition, Representation, and Domains

Table: Summary of Representative Elements Datasets

| Dataset Name / ref | Domain | Element Type | Scale/Annotation |
|---|---|---|---|
| "Mapping Natural Language Commands..." (Pasupat et al., 2018) | Web interaction | HTML leaf nodes (widgets) | 1,835 pages; 51,663 (command, element) pairs |
| "Resolving Referring Expressions..." (Wichers et al., 2018) | Vision, multimodal | OCR/GUI elements, COCO objects | ~104k webpage pairs; ~120k COCO expressions |
| Widget Captioning (Li et al., 2020) | UI/Accessibility | Android UI widgets | 21,750 screens; 61,285 elements; 162,859 captions |
| REIP/Elements (Shi et al., 2022) | Layout, Vision | Visual slide elements | 23,072 slides; 620,878 elements; 1,000+ with grouping |
| AnnoPage (Kišš et al., 28 Mar 2025) | Document Analysis | 25-class non-textual regions | 7,550 pages; 27,904 elements |
| DAVE (Wang et al., 2024) | Video Perception | Traffic actors in video | 1,231 clips; 13M actor boxes; 1.6M with actions |
| SAGC-A68 (Ziaee et al., 2023) | Built Environments | Space & element nodes (graphs) | 68 graphs; 4,871 nodes (28 classes) |
| CompRes (Levi et al., 2020) | NLP, Narratives | Narrative event types | 29 articles; 1,099 sentences; 3-multilabel/instance |
| GraphelSums (Hoeve et al., 2022) | NLP, Summarization | Relation-labeled graph triples | 286 news articles; ≈2k relational triples |
| Elements Embeddings (Onwuli et al., 2023) | Materials Science | Chemical element vectors | 118 elements × up to 200-D embeddings |
| Distinct Counting (Knop et al., 2023) | Privacy, Statistics | Distinct element sets | Theoretical/method dataset |

The table reflects the cross-disciplinary reach of the Elements Dataset paradigm, in which the formalization of "element" is tailored to each research domain's ontologies and tasks.

3. Annotation Schemes, Quality Control, and Data Formats

Annotation protocols are designed to provide both unambiguous structural data and task-specific ground truth:

  • Web and UI elements: Annotated with attributes (text, HTML tag), geometric coordinates, and often a reference utterance (command, caption). Data formats are JSON, with unique IDs, attributes, bounding boxes (normalized to [0,1]), and masks if pixel segmentation is needed (Wichers et al., 2018, Pasupat et al., 2018, Li et al., 2020).
  • Visual & Layout elements: Each entity (geometry, image, text, table) on a rendering canvas or slide is stored with spatial, typological, and visual features; grouping hierarchies are annotated using rooted trees, adjacency matrices, or parent-child lists (Shi et al., 2022).
  • Historical document elements: Professional librarians use standardized methodologies to assign each non-textual visual element to one of 25 categories, encoding position and extents in YOLO format (class_id center_x center_y w h, normalized) (Kišš et al., 28 Mar 2025).
  • Video atomic elements: Each object annotated per frame with actor class, bounding box, identity track, and possibly an action label; privacy is addressed with automatic face/license blurring (Wang et al., 2024).
  • Graph/network datasets: Nodes and edges structured with geometric, semantic, and graph-theoretic features; labels cover function and type classes; data stored as JSON or NumPy matrices (Ziaee et al., 2023).
  • Narrative/event datasets: Sentences or relational pairs are labeled with multiple possible "element" tags; annotation is performed by multi-pass expert consensus, with agreement (Cohen's κ) tracked (Levi et al., 2020, Hoeve et al., 2022).
  • Materials element embeddings: Each element mapped to a fixed-dimensional real vector; all embeddings and similarities are accessible as NumPy arrays via the ElementEmbeddings Python package (Onwuli et al., 2023).
  • Differential privacy counting: Set-valued user contributions; ground truth is the maximum bipartite matching or flow; no public raw dataset, but the theoretical pipeline is formalized in LaTeX (Knop et al., 2023).
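For concreteness, a normalized YOLO-style annotation line like the one used by AnnoPage can be converted to pixel coordinates in a few lines (a minimal sketch; the function name and example values are illustrative):

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert a YOLO annotation line 'class_id cx cy w h' (all normalized
    to [0, 1]) into (class_id, x0, y0, w, h) in absolute pixels."""
    class_id, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h   # box center in pixels
    w, h = float(w) * img_w, float(h) * img_h       # box extent in pixels
    x0, y0 = round(cx - w / 2), round(cy - h / 2)   # top-left corner
    return int(class_id), x0, y0, round(w), round(h)

print(yolo_to_pixels("3 0.5 0.5 0.25 0.125", 1000, 800))  # → (3, 375, 350, 250, 100)
```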

Robustness is ensured by expert annotation (AnnoPage, REIP), redundancy and adjudication (CompRes, Widget Captioning), or cross-validation and statistical consistency checks (AnnoPage, REIP).
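Where agreement is tracked with Cohen's κ, the statistic compares observed agreement against agreement expected by chance. A minimal two-annotator sketch, here using the CompRes label set for illustration:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length annotation sequences."""
    n = len(a)
    labels = set(a) | set(b)
    po = sum(x == y for x, y in zip(a, b)) / n                     # observed agreement
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)  # chance agreement
    return (po - pe) / (1 - pe)                                    # undefined if pe == 1

ann1 = ["Complication", "Resolution", "Success", "Complication"]
ann2 = ["Complication", "Resolution", "Complication", "Complication"]
print(cohens_kappa(ann1, ann2))  # κ = 5/9 ≈ 0.56
```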

4. Benchmark Tasks, Protocols, and Evaluation Metrics

The various Elements Datasets define or extend the state of the art by formalizing domain-appropriate supervised tasks and associated metrics:

  • UI/Web grounding and segmentation: Identify or segment the region corresponding to a natural-language expression, command, or caption. Standard metrics: Intersection over Union (IoU), pixel precision, exact-match accuracy (Wichers et al., 2018, Pasupat et al., 2018, Li et al., 2020).
  • Hierarchical grouping and structure recovery: Recover layout or information-presentation groupings; pairwise relatedness and tree edit distance are principal metrics (Shi et al., 2022).
  • Document and image object detection: Mean Average Precision (mAP) at multiple IoU thresholds, as in COCO; experiments benchmark YOLO and DETR variants (Kišš et al., 28 Mar 2025).
  • Video tracking, detection, and action localization: AO, SR, and mAP over actor/action classes, with recall at IoU thresholds for video moment retrieval; all supporting tasks are explicitly defined with formulas (Wang et al., 2024).
  • Graph node/element classification: Per-class precision, recall, F1; t-SNE or confusion matrices for qualitative analysis (Ziaee et al., 2023).
  • Event element identification: Multi-label sentence classification (SVM, RoBERTa); metrics: per-label precision, recall, and F1 (Levi et al., 2020).
  • Graphical summarization: Overlap, precision, and F1 across annotator triples (hard, soft); Jaccard similarity for relation label sets (Hoeve et al., 2022).
  • Elemental material similarity: Structure-type prediction accuracy as a function of cosine similarity between embeddings; performance baselined against domain heuristics (radius ratio rules) (Onwuli et al., 2023).
  • Count query privacy: Theoretical approximation guarantees (Laplace noise, bias-corrected lower bounds) under ε-differential privacy; no empirical task, but well-defined error bounds and mechanisms (Knop et al., 2023).
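Several of the grounding and detection metrics above reduce to Intersection over Union between a predicted and a ground-truth box. A minimal sketch for axis-aligned boxes (the corner-format convention is our choice):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # → 0.142857… (= 1/7)
```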

This diversity reflects the requirement to adapt the evaluation regime to the semantics of "elements," underlying tasks (e.g., segmentation, retrieval, summarization, classification), and domain-specific constraints.
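The Laplace mechanism behind the private count release can be sketched with the standard library alone (a sketch under the assumption of sensitivity 1, i.e., one person changes the count by at most one; the function name is illustrative):

```python
import random

def private_distinct_count(true_count: int, epsilon: float) -> float:
    """Release a count under ε-differential privacy via the Laplace mechanism.

    Laplace(0, 1/ε) noise is sampled as the difference of two Exp(ε) draws;
    with sensitivity 1 this yields the standard ε-DP guarantee.
    """
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# The noise has mean 0, so averaging many releases recovers the true count.
random.seed(0)
samples = [private_distinct_count(42, epsilon=1.0) for _ in range(20000)]
print(sum(samples) / len(samples))  # close to 42
```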

5. Public Code, Data, and Reproducibility

Most prominent Elements Datasets provide full pipeline resources to facilitate rigorous evaluation and further development.

The release of data in structured, scriptable formats and the open implementation of baselines are critical drivers for the adoption and impact of these resources.

6. Limitations, Open Challenges, and Prospective Extensions

Elements Datasets are fundamentally shaped by their annotation scope, domain coverage, and granularity:

  • Representation granularity: Certain datasets restrict annotations to bounding boxes and lack sub-element (pixel- or polygon-level) detail (Wichers et al., 2018, Kišš et al., 28 Mar 2025).
  • Domain and language bias: Many are centered on English (web pages, news, summarization) or specific regional geographies (DAVE’s India traffic) with implications for generalization (Pasupat et al., 2018, Wang et al., 2024).
  • Label ambiguity and scarcity: Fine-grained categories (rare room types, document images) suffer from class imbalance and low instance counts, affecting F1 and decision boundaries (Kišš et al., 28 Mar 2025, Ziaee et al., 2023).
  • 2D vs. 3D: Most visual and spatial datasets lack 3D annotations or multimodal sensor support; future extensions propose LiDAR, radar, and instance 3D pose (Wang et al., 2024).
  • Annotation agreement: Multi-annotator agreement (e.g., F1=0.21 for graphical summarization) remains modest where subjective decisions dominate (relation assignment, summary selection) (Hoeve et al., 2022).
  • Extensibility to new modalities: For privacy, the challenge lies in extending differential privacy guarantees to more complex query types; for vision, incorporating richer metadata and multi-sensor fusion (Knop et al., 2023).

Suggested research trajectories include expanded annotation (polygonal masks, clause-level narrative labeling), broader domain sampling, integration of additional semantic layers (hierarchical/nested grouping), and comprehensive inter-annotator consistency studies.

7. Significance and Applications Across Disciplines

Elements Datasets operationalize "elements" as data primitives in a way that systematically advances:

  • Interactive systems: Voice-controlled navigation, accessible UIs, intelligent agents that ground to atomic visual or web elements (Pasupat et al., 2018, Li et al., 2020).
  • Perception in complex environments: Robust tracking, detection, and action understanding for autonomous vehicles in unconstrained real-world settings (Wang et al., 2024).
  • Information design and retrieval: Automated recovery of layout structure, grouping, and design assessment in documents, presentations, and historical archives (Shi et al., 2022, Kišš et al., 28 Mar 2025).
  • Materials discovery and property prediction: Quantitative comparison and clustering of chemical elements to facilitate predictive materials informatics (Onwuli et al., 2023).
  • Narrative and content analysis: Decomposition of stories and summaries into semantically labeled elements, supporting computational journalism and document summarization (Levi et al., 2020, Hoeve et al., 2022).
  • Differential privacy: Rigorous release of distinct-count statistics with formal privacy guarantees in database analysis (Knop et al., 2023).
  • Architectural computing: Automated interpretation and classification of built environments for energy, safety, and usability analysis (Ziaee et al., 2023).

Collectively, Elements Datasets exemplify the cross-cutting methodological infrastructure required to standardize, automate, and validate models operating at the "atomic" level in structured real and virtual worlds.
