
TACO Benchmark: Litter Detection

Updated 12 October 2025
  • TACO Benchmark is an open, high-resolution image dataset emphasizing contextual realism and fine-grained annotations for litter detection in varied, real-world environments.
  • It leverages crowdsourced annotations and a specialized data augmentation tool (Transplanter) to enhance model performance on occluded and ambiguous litter items.
  • Evaluation with Mask R-CNN on both classless (TACO-1) and multi-category (TACO-10) tasks demonstrates practical challenges such as small object size, scene complexity, and annotation ambiguity.

TACO (Trash Annotations in Context) is an open, high-resolution image dataset developed as a benchmark for litter detection and instance segmentation in diverse, unconstrained environments. Its guiding principles are contextual realism and fine-grained annotation, aimed at enabling robust deep-learning-based detection in heterogeneous real-world scenes such as urban environments, beaches, and rivers. Unlike object datasets captured under isolated laboratory conditions or against clean backgrounds, TACO's images show litter under occlusion, with ambiguous boundaries, and across a broad spectrum of lighting and background conditions. The dataset is intended as a continually expanding resource, enriched via crowdsourced manual annotations and supported by purpose-built augmentation tools.

1. Dataset Composition and Annotation Structure

The TACO dataset currently includes approximately 1,500 images and 4,784 litter annotations. Each image is annotated using three main mechanisms:

  • Scene tags: Provide environmental context (e.g., "urban", "beach", "river") for each image, supporting context-aware modeling.
  • Instance segmentation masks: Every litter item within an image is annotated with a pixel-level segmentation mask, enabling precise delineation for training instance segmentation models.
  • Hierarchical taxonomy: Annotations are categorized into 60 fine-grained litter types grouped into 28 super-categories. This taxonomy accounts for visual ambiguity between classes (e.g., plastic vs. glass bottle) and includes an "Unlabeled litter" category to address uncertain cases.
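
Because TACO distributes these annotations in COCO format, the standard pycocotools API can be used to explore the taxonomy and decode instance masks. A minimal sketch, assuming an illustrative local path for the annotations file:

```python
from pycocotools.coco import COCO

# Illustrative path; TACO publishes COCO-style annotations alongside its
# images (see http://tacodataset.org).
coco = COCO("data/annotations.json")

# Inspect the hierarchical taxonomy: fine-grained categories and the
# super-categories that group them.
cats = coco.loadCats(coco.getCatIds())
supercats = sorted({c["supercategory"] for c in cats})
print(f"{len(cats)} categories across {len(supercats)} super-categories")

# Decode the pixel-level instance masks for one image.
img_id = coco.getImgIds()[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
masks = [coco.annToMask(a) for a in anns]  # one binary (H, W) array per instance
```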

To facilitate comparative studies and mitigate class imbalance, two streamlined variants are provided:

| Subset | Task Type | Classes | Special Handling |
|---|---|---|---|
| TACO-1 | Classless detection | 1 | All litter merged; presence only |
| TACO-10 | Detection & classification | 10 super-categories | Rare classes merged into "Other Litter" |

Such stratification enables metric-driven experiments and allows systematic investigation of detection versus classification errors under class imbalance and ambiguity.
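
A minimal sketch of how the remapping could be implemented follows; the retained super-category names are assumptions for illustration, not necessarily the ten classes defined by the benchmark:

```python
# Hypothetical remapping from TACO super-categories to the two benchmark
# variants. The retained set below is illustrative only; the actual
# TACO-10 classes are fixed by the benchmark's released annotations.
TOP10 = {
    "Bottle", "Bottle cap", "Can", "Carton", "Cup",
    "Lid", "Plastic bag & wrapper", "Pop tab", "Straw", "Cigarette",
}

def to_taco1(supercategory: str) -> str:
    """TACO-1: classless detection, every annotation becomes generic litter."""
    return "Litter"

def to_taco10(supercategory: str) -> str:
    """TACO-10: keep the ten retained super-categories, merge the rest."""
    return supercategory if supercategory in TOP10 else "Other Litter"
```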

2. Crowdsourcing Tools and Data Augmentation Framework

To scale both the dataset volume and diversity, TACO leverages two principal tools:

  • Online annotation platform: Accessible at http://tacodataset.org/annotate, this web-based interface permits contributors to label and segment litter objects. The crowdsourcing model enables continual growth of the dataset, diversification of scenes, and faster refinement of annotation quality.
  • Transplanter tool: A GUI system for data augmentation that lets users "transplant" segmented litter objects from one image to another. It employs a truncated distance transform to compute a soft mask for object blending, thereby mitigating the edge artifacts typical of naïve copy-paste operations. The resulting composites preserve both geometric and photometric realism, which is especially valuable for augmenting rare backgrounds (e.g., rivers and beaches). The tool has acknowledged limitations with certain object types (notably transparent litter), prompting the need for further refinement; a minimal sketch of the soft-mask blending idea follows this list.
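
A minimal sketch of the soft-mask blending idea, assuming a binary object mask and floating-point RGB arrays; the ramp width here is an arbitrary choice, not Transplanter's actual parameter:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def soft_blend(source, target, mask, ramp=10.0):
    """Blend a segmented object from `source` into `target` using a soft
    alpha mask derived from a truncated distance transform.

    source, target: (H, W, 3) float arrays; mask: (H, W) binary array.
    Sketch only; the actual Transplanter implementation may differ.
    """
    # Distance (in pixels) from each object pixel to the mask boundary;
    # background pixels get distance 0.
    dist = distance_transform_edt(mask)
    # Truncate at `ramp` pixels and normalize to [0, 1]: the object's
    # interior keeps full opacity while its edges fade out smoothly,
    # suppressing the hard seams left by naive copy-paste.
    alpha = np.clip(dist / ramp, 0.0, 1.0)[..., None]
    return alpha * source + (1.0 - alpha) * target
```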

3. Instance Segmentation Benchmarking and Scoring Formulations

Segmentation performance on TACO is quantified using Mask R-CNN (ResNet-50 backbone with a Feature Pyramid Network), trained on images resized to 1024×1024 for standardization. Two primary tasks are evaluated: classless detection (TACO-1) and 10-class detection (TACO-10).

Evaluation is based on Average Precision (AP), computed across multiple Intersection-over-Union (IoU) thresholds. Prediction confidence calibration is critical; three alternative scores, derived from the softmax probabilities of the Mask R-CNN classification head, are compared:

$$
\text{Score} =
\begin{cases}
\max_{i} p_{i} & \text{(class\_score)} \\
1 - p_{N+1} & \text{(litter\_score)} \\
\dfrac{\max_{i} p_{i}}{p_{N+1} + \epsilon} & \text{(ratio\_score)}
\end{cases}
$$

where $p_i$ denotes the class probabilities, $p_{N+1}$ the background probability, and $\epsilon$ a small constant.
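
The three formulations are straightforward to compute from per-detection softmax outputs. A minimal sketch, assuming `probs` holds probabilities over the N litter classes with the background class in the last column:

```python
import numpy as np

def confidence_scores(probs, eps=1e-6):
    """Compute the three confidence formulations from softmax outputs.

    probs: (num_detections, N + 1) array of class probabilities, with the
    background probability p_{N+1} in the last column.
    """
    p_bg = probs[:, -1]                # background probability p_{N+1}
    p_cls = probs[:, :-1].max(axis=1)  # max_i p_i over litter classes
    return {
        "class_score": p_cls,                 # best litter-class probability
        "litter_score": 1.0 - p_bg,           # probability of any litter
        "ratio_score": p_cls / (p_bg + eps),  # litter-vs-background ratio
    }
```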

Empirical results demonstrate that the ratio_score consistently yields the best AP of the three formulations in both the TACO-1 (26.1 AP) and TACO-10 (19.4 AP) settings. Scatter plots of IoU versus confidence and confusion matrices elucidate systematic sources of prediction error, particularly for small objects and overlapping instances.

Training and evaluation follow a four-split cross-validation protocol, with each split allocating 80% of images to training, 10% to validation, and 10% to testing, to ensure statistical robustness in performance quantification.
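
A sketch of how such splits can be generated; this helper and its seed are hypothetical stand-ins for the benchmark's actual split files:

```python
import numpy as np

def make_splits(n_images, n_splits=4, seed=0):
    """Generate random 80%/10%/10% train/validation/test index splits."""
    rng = np.random.default_rng(seed)
    n_train, n_val = int(0.8 * n_images), int(0.1 * n_images)
    splits = []
    for _ in range(n_splits):
        idx = rng.permutation(n_images)
        splits.append((idx[:n_train],                 # train (80%)
                       idx[n_train:n_train + n_val],  # validation (10%)
                       idx[n_train + n_val:]))        # test (remainder)
    return splits
```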

4. Challenges in Contextual Litter Detection

Multiple factors complicate litter detection in TACO:

  • Scene complexity: Frequent occlusions, fragmentation, and background camouflage yield ambiguous boundaries, reflected in suppressed IoU for small and thin objects.
  • Object scale disparity: Many litter items (e.g., cigarettes) occupy fewer than 64×64 pixels in the resized images, aggravating instance segmentation failure modes.
  • Annotation ambiguity: Visual overlap between classes and imperfect segmentation (particularly for transparent items) contribute to both false positives and false negatives. Data distribution plots and IoU-versus-area scatter plots demonstrate the effect of object size on segmentation performance.
  • Environmental heterogeneity: Uncommon scenes (e.g., ocean waves, exotic rivers) present underrepresented backgrounds, challenging generalization in real-world deployments.

These difficulties motivate the refinement of annotation protocols and augmentation pipelines, as well as new evaluation metrics that can account for contextual complexity.

5. Role and Impact of Crowdsourced Manual Annotations

The coverage, representational diversity, and overall robustness of TACO remain limited by the current size and annotation density. Manual annotation via both the online platform and the Transplanter tool is fundamental for:

  • Expanding dataset diversity: Enabling inclusion of rare backgrounds and challenging object configurations.
  • Correcting and balancing classes: Increasing the representation of ambiguous or underrepresented classes, refining boundaries, and adjusting for imbalanced frequencies.
  • Improving detection and segmentation performance: More granular and numerous annotations translate to better training samples for models, especially for tiny, overlapping, or hard-to-classify objects.

The future robustness and deployment readiness of litter detection systems built on TACO depend critically on sustained, detailed, and context-aware manual contributions.

6. Research Significance and Future Directions

TACO represents an innovative and evolving benchmark for contextual litter detection and segmentation. Its strengths are:

  • Rich, hierarchical taxonomic annotation supporting fine-grained, context-aware recognition tasks.
  • Purpose-built crowdsourcing and augmentation frameworks for scaling annotation volume and diversity.
  • Realistic scene complexity exposing limitations of current deep segmentation architectures.

Outstanding research questions include:

  • How to best design data augmentation and soft-masking techniques for transparent or severely occluded litter objects.
  • Methods for mitigating loss of small-object detail under compulsory global resizing.
  • Statistical approaches for balancing annotation effort against incremental gains in AP, with class and scene frequency taken into account.

The dataset’s continuous expansion—via community annotation and technical refinements—underpins its value as a real-world benchmark, supporting development and deployment of robust, context-sensitive litter detection models in unconstrained environments.
