BeesBook: Honey Bee Tracking System
- BeesBook is an open-source optical tracking system that continuously monitors honey bee trajectories using custom markers, computer vision, and machine learning.
- It employs a multi-step tracking pipeline—including CNN-based detection, SVM matching, and random-forest merging—to reduce error rates and improve tracking consistency.
- The system releases extensive, lifetime-resolved motion data that supports studies on collective behavior, social-network inference, and foraging ecology.
BeesBook is an open-source optical tracking system designed to capture the full-lifetime trajectories of all honey bee workers in a densely populated colony. Combining computer vision, machine learning, and custom physical markers, BeesBook enables continuous, high-resolution tracking of thousands of individually marked honey bees over weeks, yielding the first comprehensive trajectory dataset of an entire colony. The system is architected to address the critical challenge of reconstructing motion paths in a visually crowded, heavily occluded social-insect environment, facilitating downstream analysis of collective behavior and individual life histories (Boenisch et al., 2018).
1. Hardware, Marker Design, and Recording Protocol
BeesBook employs four PointGrey Flea3 monochrome cameras (12 MP each), positioned to observe both sides of a single-frame observation hive. The two cameras covering each comb side have slightly overlapping fields of view, ensuring full hive coverage and enabling cross-camera calibration. Illumination comes from arrays of infrared (IR) LEDs, which are invisible to bees and therefore do not perturb their behavior. The imaging subsystem records at 3 frames per second, applying on-the-fly 10-bit compression for efficient storage. This setup enables continuous, noninvasive monitoring over extended durations.
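As a rough illustration of the storage challenge these parameters imply, the raw (pre-compression) data rate can be estimated from the figures above; the numbers here are back-of-envelope, not reported values:

```python
# Back-of-envelope estimate of the raw (pre-compression) data rate of the
# BeesBook imaging subsystem, using only the figures given in the text.
CAMERAS = 4
PIXELS_PER_FRAME = 12e6   # 12 MP per camera
BITS_PER_PIXEL = 10       # 10-bit monochrome
FPS = 3                   # frames per second

bytes_per_frame = PIXELS_PER_FRAME * BITS_PER_PIXEL / 8     # 15 MB per frame
raw_rate_mb_s = CAMERAS * FPS * bytes_per_frame / 1e6       # megabytes per second
raw_rate_tb_day = raw_rate_mb_s * 86400 / 1e6               # terabytes per day

print(f"{raw_rate_mb_s:.0f} MB/s ≈ {raw_rate_tb_day:.1f} TB/day")  # 180 MB/s ≈ 15.6 TB/day
```

At roughly 180 MB/s of raw pixel data, on-the-fly compression is clearly essential for multi-week recordings.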
Physical markers are 20 mm in diameter, curved to match the bee thorax, and designed to withstand grooming and outdoor foraging. Each marker encodes a 12-bit binary ID in an arc structure flanking two semicircular reference shapes used to extract orientation. Notably, the markers omit error-correcting bits; the system compensates for this constraint via a multi-step computational pipeline.
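The consequence of omitting error-correcting bits can be illustrated directly: in a plain 12-bit code, every single-bit flip of a valid ID yields another valid ID, so a per-frame decoding error is undetectable from the code alone. A minimal sketch:

```python
# With a plain 12-bit code and no parity/error-correcting bits, every
# single-bit corruption of a valid ID is itself another valid ID, so
# per-frame decoding errors cannot be detected from the code alone.
def bit_flips(code: int, n_bits: int = 12):
    """All IDs reachable from `code` by flipping exactly one bit."""
    return [code ^ (1 << i) for i in range(n_bits)]

original = 0b101101001110
neighbors = bit_flips(original)

assert len(neighbors) == 12
assert all(0 <= n < 2**12 for n in neighbors)   # every neighbor is a valid 12-bit ID
assert all(n != original for n in neighbors)
```

This is precisely why the pipeline below recovers reliability through temporal aggregation rather than through the code itself.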
2. Multi-Step Tracking Pipeline
The BeesBook tracking algorithm addresses the high per-frame marker decoding error rate (≈13%) by leveraging temporal and spatial integration across frames through a two-stage tracking procedure:
(A) Detection and Decoding
- Preprocessing: Histogram equalization and image down-sampling optimize subsequent detection steps.
- Localization: A lightweight convolutional neural network (CNN) proposes candidate marker bounding boxes, yielding 98% recall and 99% precision.
- Decoding: For each candidate, a secondary CNN predicts the marker orientation and a 12-dimensional bit-probability vector p ∈ [0, 1]^12, treating each bit probabilistically (i.e., p_i = P(bit_i = 1)) so that evidence can be aggregated robustly in later stages.
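A minimal sketch of this probabilistic treatment, under the (hypothetical) assumption that the decoder network emits one raw logit per bit:

```python
import math

# Hypothetical sketch: converting a decoder network's 12 raw outputs (logits)
# into per-bit probabilities p_i = P(bit_i = 1), as described above.
# The logit-per-bit output layout is an assumption for illustration.
def logits_to_bit_probabilities(logits):
    """Apply a sigmoid to each of the decoder's 12 raw outputs."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

probs = logits_to_bit_probabilities([4.2, -3.1, 0.1, 5.0, -4.0, 2.2,
                                     -0.2, 3.3, -5.1, 1.8, -2.4, 0.9])
assert len(probs) == 12 and all(0.0 < p < 1.0 for p in probs)
```

Keeping soft probabilities rather than hard bits is what allows the later median-aggregation step to overrule individual misreads.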
(B) Tracking Step 1 — Tracklet Construction
Tracklet generation links sequential detections into short, contiguous sequences presumed to belong to the same bee.
- Candidate matching: For each tracklet still open at frame t, candidate detections in frame t + 1 that fall within a fixed spatial search radius are evaluated.
- Features: Each tracklet–detection pair is described by a feature vector that includes, among other cues, the Manhattan (L1) distance between the bit-probability vector of the tracklet's most recent detection and that of the candidate detection.
- Correspondence prediction: A linear-kernel SVM, calibrated with Platt scaling, maps each feature vector to a correspondence probability.
- Assignment: The Hungarian algorithm selects the assignment maximizing the sum of correspondence probabilities. If a tracklet's best match falls below a probability threshold, the tracklet is terminated; unmatched detections start new tracklets.
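The assignment step can be illustrated with a toy example. The sketch below brute-forces the optimal pairing for a tiny probability matrix (the Hungarian algorithm finds the same optimum in polynomial time); the 0.5 threshold is an illustrative value, not the system's actual parameter:

```python
from itertools import permutations

# Toy version of the assignment step: choose the pairing of open tracklets
# to new detections that maximizes the summed correspondence probabilities.
# BeesBook uses the Hungarian algorithm for this; brute force is shown here
# only because it is transparent for a 3x3 example.
def best_assignment(prob, threshold=0.5):
    """prob[i][j] = probability that tracklet i continues as detection j
    (square matrix assumed for simplicity)."""
    n = len(prob)
    best, best_score = None, -1.0
    for perm in permutations(range(n)):
        score = sum(prob[i][perm[i]] for i in range(n))
        if score > best_score:
            best, best_score = perm, score
    # Reject low-confidence pairs: those tracklets terminate, and the
    # corresponding detections would open new tracklets.
    return [(i, j) for i, j in enumerate(best) if prob[i][j] >= threshold]

prob = [[0.9, 0.2, 0.1],
        [0.3, 0.8, 0.2],
        [0.1, 0.1, 0.4]]
assert best_assignment(prob) == [(0, 0), (1, 1)]   # pair (2, 2) falls below 0.5
```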
(C) Tracking Step 2 — Tracklet Merging
To bridge gaps spanning multiple frames, BeesBook employs a random-forest classifier on candidate tracklet pairs, using features including:
- Median bit-probability difference
- End-to-start spatial distance
- Forward and backward extrapolation errors
- Initial/final orientation difference
- Confidence differential between tracklets
Assignments within gap windows are again performed via the Hungarian algorithm.
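A hedged sketch of a few of the listed tracklet-pair features, assuming a simple hypothetical tracklet representation (per-frame positions, orientations, and per-bit probability medians):

```python
# Hypothetical sketch of some tracklet-pair features listed above. The
# tracklet representation (a dict of positions, orientations, and 12-bit
# probability medians) is an illustrative assumption, not the actual schema.
def merge_features(a, b):
    """Features for deciding whether tracklet `a` should merge into tracklet `b`."""
    # Median bit-probability difference (Manhattan distance over the 12 bits)
    bit_diff = sum(abs(pa - pb) for pa, pb in zip(a["median_bits"], b["median_bits"]))
    # End-to-start spatial distance across the gap
    (x1, y1), (x2, y2) = a["positions"][-1], b["positions"][0]
    gap_dist = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    # Final/initial orientation difference
    orient_diff = abs(a["orientations"][-1] - b["orientations"][0])
    return [bit_diff, gap_dist, orient_diff]

a = {"median_bits": [0.9] * 12, "positions": [(0, 0), (1, 1)], "orientations": [0.1, 0.2]}
b = {"median_bits": [0.8] * 12, "positions": [(2, 2), (3, 3)], "orientations": [0.25]}
feats = merge_features(a, b)
assert len(feats) == 3
```

Such a feature vector would then be fed to the random-forest classifier, with the Hungarian algorithm again resolving competing merge candidates.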
(D) ID Assignment
For each reconstructed track, the bitwise median of the bit-probability vectors across all constituent detections yields a consensus vector, which is binarized at 0.5 to produce the final 12-bit ID: bit_i = 1 if the median of p_i is ≥ 0.5, and 0 otherwise.
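The median-and-binarize rule can be sketched directly; note how the per-bit median suppresses a single badly decoded frame:

```python
import statistics

# The final-ID rule described above: take the per-bit median over all
# detections in a track, binarize at 0.5, and pack the bits into an integer.
# (Bit order within the packed integer is arbitrary here.)
def track_id(bit_probability_vectors):
    """bit_probability_vectors: one 12-element probability vector per detection."""
    medians = [statistics.median(col) for col in zip(*bit_probability_vectors)]
    bits = [1 if m >= 0.5 else 0 for m in medians]
    return sum(b << i for i, b in enumerate(bits)), bits

# Three noisy detections of the same marker; the median overrules the outlier.
detections = [
    [0.9, 0.1, 0.8, 0.2] + [0.9] * 8,
    [0.8, 0.2, 0.7, 0.3] + [0.8] * 8,
    [0.2, 0.9, 0.6, 0.1] + [0.9] * 8,   # a badly decoded frame
]
_, bits = track_id(detections)
assert bits[:4] == [1, 0, 1, 0]
```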
3. Performance Evaluation
BeesBook achieves substantial accuracy gains over baseline per-frame decoding by integrating information temporally and spatially:
| Pipeline Stage | ID Error Rate | Complete Tracks |
|---|---|---|
| Raw CNN (single frame) | 13.3% | 10.2% |
| Tracklet building and median-ID (Step 1) | 3.9% | 26.5% |
| Tracklet merging and final median-ID (Step 2) | 1.9% | 70.4% |
| Lower bound ("perfect tracking" + median-ID) | 0.6% | 77.6% |
Deletions (missed detections) are sharply reduced from 32.2% (baseline) to 1.4% (Step 1) and 2.4% (Step 2). Insertions (false positives) remain below 1% at both stages. The system was benchmarked on 71 days of data (approximately 68 million frames, yielding 3.6 billion detections). Processing is parallelized by hour and merged by ID; a 100-core compute cluster processes ten weeks of data in under one week, and peak decoding throughput reaches real-time rates on consumer GPUs.
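A quick consistency check of the quoted figures: four cameras recording continuously at 3 Hz for 71 days would yield at most about 73.6 million frames, so the reported ~68 million is consistent with near-continuous recording, and the 3.6 billion detections average roughly 53 decoded markers per frame:

```python
# Sanity-checking the benchmark figures quoted above (a back-of-envelope
# check, not reported values).
DAYS, FPS, CAMERAS = 71, 3, 4
theoretical_frames = DAYS * 86400 * FPS * CAMERAS   # upper bound with no downtime
detections_per_frame = 3.6e9 / 68e6                 # reported totals

print(theoretical_frames, round(detections_per_frame))  # 73612800, ~53
```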
4. Dataset Characteristics and Accessibility
The tracking campaign monitored approximately 2,000 bees concurrently (2,775 bees marked over the course of the season) during a 10-week interval, sampled continuously at 3 Hz. The full dataset comprises 68 million frames; a three-day, 3-million-frame subset is publicly available. Output data include:
- Trajectory files (CSV/HDF5): (bee ID, timestamp, x, y, orientation, tracklet ID)
- Coordinates in millimeter-scale, unified onto a common comb plane
- Bit-probability vectors for offline confidence analysis
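A loading sketch for the trajectory files; the column names below are illustrative assumptions, not the dataset's actual schema:

```python
import csv
import io

# Hypothetical loading sketch for the trajectory CSVs. The column names
# are assumptions for illustration; consult the published data for the
# real schema.
sample = io.StringIO(
    "bee_id,timestamp,x_mm,y_mm,orientation,tracklet_id\n"
    "1427,2016-08-01T12:00:00.000Z,153.2,88.7,1.57,900341\n"
    "1427,2016-08-01T12:00:00.333Z,153.5,88.9,1.60,900341\n"
)
rows = list(csv.DictReader(sample))
trajectory = [(float(r["x_mm"]), float(r["y_mm"])) for r in rows]
assert len(trajectory) == 2
```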
All codebases (tracking, computer vision pipeline) and data subsets are available from public repositories and Zenodo archives (Boenisch et al., 2018).
5. Downstream Applications and Research Directions
Comprehensive, lifetime-resolved motion data enables a spectrum of studies:
- Behavioral analyses: Tracking all bees enables quantification of age-polyethism (transition from in-hive tasks to foraging), spatial learning, and social interaction topologies (e.g., trophallaxis, antennation).
- Social-network inference: Motion trajectories facilitate derivation of interaction events from proximity, supporting network-based investigations of information or pathogen flow.
- Foraging ecology: Lifetime path analysis enables correlation of individual experience with recruitment efficacy (e.g., success in waggle-dance acquisition and deployment).
- Open-source extensibility: The modular software stack and data formats invite extension, such as adopting color channels or error-correcting marker overlays, and integration of behavioral classifiers (e.g., automated grooming or waggle-dance detection pipelines).
- Community research: Dataset openness supports collective research on disease transmission, resource allocation, or distributed decision-making.
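As one concrete example of proximity-based interaction inference, the sketch below flags bee pairs closer than a contact radius within a single frame; the radius value is a hypothetical parameter, and real analyses typically also gate on relative orientation and contact duration:

```python
# Hedged sketch of proximity-based interaction inference from trajectory
# data: two bees closer than a contact radius in the same frame count as a
# candidate interaction. The 14 mm radius is an illustrative assumption.
def interaction_events(positions, contact_radius_mm=14.0):
    """positions: {bee_id: (x_mm, y_mm)} for a single frame."""
    ids = sorted(positions)
    events = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            (xa, ya), (xb, yb) = positions[a], positions[b]
            if ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5 <= contact_radius_mm:
                events.append((a, b))
    return events

frame = {101: (10.0, 10.0), 102: (18.0, 10.0), 103: (60.0, 60.0)}
assert interaction_events(frame) == [(101, 102)]
```

Accumulating such events across frames yields the weighted interaction networks used in pathogen- and information-flow studies.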
A plausible implication is that the BeesBook pipeline, by providing scalable, high-fidelity, long-term tracking without error-correcting marker codes, sets a benchmark for future high-throughput studies of collective animal behavior (Boenisch et al., 2018).
6. Limitations and Prospective Enhancements
Despite robust performance, BeesBook's decoding error rate approaches the lower bound (0.6%) set by "perfect" tracking plus median aggregation. System accuracy is ultimately limited by marker visibility, detection recall/precision, and the absence of error-correction bits. Possible research directions include new marker designs (e.g., incorporating color or built-in error-correcting redundancy) and more sophisticated gap-bridging or long-gap trajectory inference methods. The published system encourages such advances by providing an extensible, open-source foundation and by sharing all empirical datasets for benchmarking and comparative methodological development (Boenisch et al., 2018).