WiFlow: Device-Free Crowd Counting Dataset
- The WiFlow dataset is a device-free, WiFi CSI-based resource designed for privacy-preserving, domain-adaptive crowd counting in IoT sensing.
- It uses minimal hardware (two ESP32 boards and a Raspberry Pi) to capture dense, event-resolved CSI data in laboratory and classroom settings.
- The dataset supports rigorous benchmarking for self-supervised pretraining, domain adaptation, and few-shot learning with strict event-level annotations.
WiFlow is a device-free crowd-counting dataset based on WiFi Channel State Information (CSI), specifically designed to address the challenges of cross-domain generalisation in privacy-preserving Internet of Things (IoT) sensing systems. The dataset provides dense, timestamped CSI captured under controlled, multi-person occupancy scenarios across distinct physical environments and occupancy regimes, with comprehensive event-level ground-truth annotations. Structured experimental protocols and pre-processing recommendations enable reproducible benchmarking of domain adaptation, self-supervised learning, and parameter-efficient transfer learning approaches using wireless signal data (Custance et al., 5 Jan 2026).
1. Data Collection Setup
WiFlow employs a minimal wireless hardware configuration to maximise reproducibility and deployment relevance:
- Hardware: Two ESP32-WROOM-32U boards, each with a single external antenna, function as transmitter and receiver. The receiver streams raw CSI to a Raspberry Pi using the CSI tool of Hernandez & Bulut (WoWMoM 2020). Transmission uses 802.11n OFDM at 20 MHz bandwidth, exposing 52 effective subcarriers per packet.
- Environments: Data are collected in two representative settings:
  - A 7 m × 7 m laboratory (concrete/drywall, office furniture)
  - A 5.5 m × 9 m classroom (linoleum floor, whiteboard wall, rows of tables and chairs)
- Placement: In both rooms, the devices are mounted at 1.2 m height and separated by 4 m, with line-of-sight between them.
- Acquisition Parameters: CSI is sampled at 100 Hz (one CSI frame every 10 ms). For each timestamp, the dataset records the per-subcarrier channel frequency response (CFR) $H_k(t)$, $k = 1, \dots, 52$, from which amplitudes $|H_k(t)|$ and, optionally, phases $\angle H_k(t)$ are extracted. The amplitude data therefore form a 52-dimensional real vector per time step, corresponding to a 1×1 MIMO (single-antenna) configuration.
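The amplitude/phase extraction above can be sketched in NumPy. This is a minimal sketch assuming the CFR frames have already been parsed into a complex array of shape (T, 52); the function name `cfr_to_features` is hypothetical, and phases are left as raw wrapped angles with no sanitisation (such as unwrapping or linear-offset removal).

```python
import numpy as np

N_SUBCARRIERS = 52  # effective 802.11n subcarriers at 20 MHz

def cfr_to_features(cfr: np.ndarray, with_phase: bool = False) -> np.ndarray:
    """Turn complex CFR frames of shape (T, 52) into real-valued features.

    Returns |H| with shape (T, 52), or [|H|, angle(H)] with shape (T, 104)
    when with_phase is True. Phases are raw wrapped angles in radians.
    """
    cfr = np.asarray(cfr)
    if cfr.ndim != 2 or cfr.shape[1] != N_SUBCARRIERS:
        raise ValueError(f"expected shape (T, {N_SUBCARRIERS}), got {cfr.shape}")
    amplitude = np.abs(cfr)
    if not with_phase:
        return amplitude
    phase = np.angle(cfr)  # no unwrapping or offset correction applied
    return np.concatenate([amplitude, phase], axis=1)

# One second of synthetic frames at 100 Hz, every subcarrier equal to 3 + 4j
frames = np.full((100, N_SUBCARRIERS), 3 + 4j)
amp = cfr_to_features(frames)                          # shape (100, 52), all 5.0
amp_phase = cfr_to_features(frames, with_phase=True)   # shape (100, 104)
```

All baselines described below use only the amplitude half of this output.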
2. Dataset Composition
WiFlow comprises scenario-rich, event-resolved CSI trace data:
- Sessions and Participants: Ten unique volunteers take part in controlled groups of 2, 5, and 9. Each group performs a sequence of timed “enter” and “exit” events cued by an audible beep every 10 s, with exactly one person entering or exiting per beep. A full enter+exit cycle therefore spans about $2N \times 10\,\text{s} = 20N$ s for group size $N$.
- Duration: Approximately 6 hours of raw CSI are recorded (≈1 h per room per group size).
- Ground Truth: Beep timestamps (UNIX time) are logged as the sole ground-truth source; each corresponds to exactly one entry or exit. After segmentation into 1 s windows, windows that fall entirely within a single event interval are labeled “enter,” “exit,” or “no_event” accordingly, while windows overlapping multiple events are discarded from the supervised sets. This strict event purity improves annotation reliability.
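A sketch of the event-purity labeling rule, under explicitly assumed details: each beep opens an event interval of a fixed, hypothetical `duration_s`; one cycle for group size N is N entries followed by N exits (2N beeps × 10 s = 20N s); a window with no overlapping event is `no_event`; and a window that only partially overlaps an event is discarded in the same way as a multi-event window. The helper names are illustrative, not part of the dataset tooling.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, float, str]  # (start_s, end_s, label)

BEEP_PERIOD_S = 10.0  # one occupancy change per beep

def events_from_beeps(beeps: List[float], group_size: int,
                      duration_s: float) -> List[Event]:
    """Expand beep times into labeled event intervals.

    Assumes each cycle is `group_size` entries followed by `group_size`
    exits, and that each event lasts `duration_s` after its beep.
    """
    cycle = ["enter"] * group_size + ["exit"] * group_size
    return [(b, b + duration_s, cycle[i % len(cycle)]) for i, b in enumerate(beeps)]

def label_window(start: float, end: float, events: List[Event]) -> Optional[str]:
    """Return 'enter'/'exit'/'no_event', or None if the window is impure."""
    overlapping = [e for e in events if e[0] < end and start < e[1]]
    if not overlapping:
        return "no_event"
    if len(overlapping) == 1:
        ev_start, ev_end, label = overlapping[0]
        if ev_start <= start and end <= ev_end:
            return label
    return None  # partial or multi-event overlap: excluded from supervised sets

def label_windows(total_s: float, events: List[Event], win_s: float = 1.0,
                  step_s: float = 0.5) -> List[Tuple[float, Optional[str]]]:
    """Label every sliding window that fits inside [0, total_s]."""
    n = int((total_s - win_s) // step_s) + 1
    return [(i * step_s, label_window(i * step_s, i * step_s + win_s, events))
            for i in range(n)]

# Group size N = 2: one cycle is 2N beeps, i.e. 20N = 40 s
beeps = [i * BEEP_PERIOD_S for i in range(4)]
events = events_from_beeps(beeps, group_size=2, duration_s=2.0)
labels = dict(label_windows(40.0, events))  # window start (s) -> label or None
```

Windows mapped to `None` would simply be dropped before building the supervised sets.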
3. Domain Variability and Evaluation Splits
WiFlow explicitly encodes multidimensional domain variation:
- Physical Domains: Laboratory (Lab) versus Classroom with distinct multipath characteristics.
- Occupancy-Size Domains: Recordings with 2, 5, or 9 people per session.
- Domain Combinations: These axes yield six principal domain transfer conditions (e.g., Lab-2 → Classroom-5).
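The two axes give 2 × 3 = 6 room-occupancy domains. The sketch below enumerates them together with every ordered source → target pair, which is where conditions such as Lab-2 → Classroom-5 come from; the `Room-N` naming is an illustrative convention, and which pairs make up the principal conditions is not specified here.

```python
from itertools import permutations, product

ROOMS = ("Lab", "Classroom")
GROUP_SIZES = (2, 5, 9)

# Each domain is one (room, occupancy) combination, named e.g. "Lab-2".
DOMAINS = [f"{room}-{n}" for room, n in product(ROOMS, GROUP_SIZES)]

# Every ordered source -> target pair with source != target.
ALL_PAIRS = list(permutations(DOMAINS, 2))

def room_of(domain: str) -> str:
    return domain.split("-")[0]

# Pairs that change the physical room (occupancy may or may not change).
CROSS_ROOM = [(s, t) for s, t in ALL_PAIRS if room_of(s) != room_of(t)]

print(len(DOMAINS), len(ALL_PAIRS), len(CROSS_ROOM))  # 6 30 18
```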
Recommended Splits:
- In-Domain Supervised: Standard 70/10/20 train/validation/test split on labeled windows within a single room and occupancy.
- Cross-Domain (Zero-Shot): Train on all windows from one domain, test on another without adaptation.
- Few-Shot Adaptation: Starting from the zero-shot model, fine-tune on a small number $k$ of labeled target-domain windows ($k$-shot); the remaining target windows are reserved for evaluation.
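A minimal NumPy sketch of the split protocols. Two choices here are assumptions rather than part of the stated protocol: the 70/10/20 split is chronological by default (adjacent windows overlap by 50%, so a random shuffle would put near-duplicate windows in both train and test), and the few-shot support set draws `k` windows per class. Function names are hypothetical.

```python
import numpy as np

def in_domain_split(n_windows: int, fractions=(0.7, 0.1, 0.2),
                    shuffle: bool = False, seed: int = 0):
    """Return (train, val, test) index arrays for one room/occupancy domain.

    With shuffle=False the split is chronological, so overlapping neighbours
    can only straddle the two cut points rather than the whole split.
    """
    idx = np.arange(n_windows)
    if shuffle:
        idx = np.random.default_rng(seed).permutation(n_windows)
    n_train = int(round(fractions[0] * n_windows))
    n_val = int(round(fractions[1] * n_windows))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def few_shot_split(labels, k: int, seed: int = 0):
    """Draw k labeled target windows per class; the rest are for evaluation."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    support = np.concatenate([
        rng.choice(np.flatnonzero(labels == cls), size=k, replace=False)
        for cls in np.unique(labels)
    ])
    support = np.sort(support)
    query = np.setdiff1d(np.arange(len(labels)), support)
    return support, query

train, val, test = in_domain_split(100)      # 70 / 10 / 20 indices
target_labels = ["enter"] * 10 + ["exit"] * 10 + ["no_event"] * 10
support, query = few_shot_split(target_labels, k=3)  # 9 support, 21 query
```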
4. Data Preprocessing and Augmentation
WiFlow standardises the following procedures:
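- Denoising: Apply a 4th-order Butterworth low-pass filter with cutoff frequency $f_c$ to each subcarrier amplitude; at the 100 Hz sample rate this corresponds to a normalised cutoff $W_n = f_c / (f_s/2) = f_c / 50\,\text{Hz}$.
- Windowing: Segment CSI amplitude streams into 1 s sliding windows ($W = 100$ samples at 100 Hz, 50% overlap, step $S = 50$ samples).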
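- Data Augmentation: For self-supervised pretraining, employ:
  - Additive Gaussian noise (jitter): $\tilde{x} = x + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$
  - Multiplicative scaling: $\tilde{x} = \alpha x$ with a randomly drawn scale factor $\alpha$
  - Permutation: split the window into $M$ segments and randomly reorder them
- Feature Extraction: All baseline results use only the amplitude $|H|$, but phase extraction is also feasible.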
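The preprocessing and augmentation steps can be sketched with NumPy and SciPy. Assumptions: amplitudes arrive as a (T, 52) array; the cutoff is a required argument because its value is not given here (the 5 Hz in the example is purely illustrative); `filtfilt` is used for zero-phase filtering, which is our choice rather than a stated requirement; and the scaling factor is drawn per subcarrier from a normal distribution around 1. Function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 100.0  # CSI sample rate
WIN = 100      # 1 s window at 100 Hz
STEP = 50      # 50% overlap

def denoise(amp: np.ndarray, cutoff_hz: float, order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth low-pass along time for a (T, 52) amplitude array."""
    b, a = butter(order, cutoff_hz, btype="low", fs=FS_HZ)
    return filtfilt(b, a, amp, axis=0)

def make_windows(amp: np.ndarray, win: int = WIN, step: int = STEP) -> np.ndarray:
    """Slice (T, C) into overlapping windows of shape (n_windows, win, C)."""
    if amp.shape[0] < win:
        raise ValueError("stream shorter than one window")
    starts = range(0, amp.shape[0] - win + 1, step)
    return np.stack([amp[s:s + win] for s in starts])

def jitter(x: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    # Additive Gaussian noise with standard deviation sigma.
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    # One random factor per subcarrier, drawn around 1.0 (assumed distribution).
    return x * rng.normal(1.0, sigma, size=(1, x.shape[1]))

def permute(x: np.ndarray, n_segments: int, rng: np.random.Generator) -> np.ndarray:
    # Split the window along time into n_segments pieces and shuffle their order.
    segments = np.array_split(x, n_segments, axis=0)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order], axis=0)

rng = np.random.default_rng(0)
t = np.arange(1000) / FS_HZ
raw = 1.0 + np.sin(2 * np.pi * 40 * t)[:, None] * np.ones((1, 52))  # DC + 40 Hz
clean = denoise(raw, cutoff_hz=5.0)    # illustrative cutoff only
windows = make_windows(clean)          # (19, 100, 52)
view = permute(scale(jitter(windows[0], 0.01, rng), 0.1, rng), 4, rng)
```

Augmentations are applied per window, so two independent calls on the same window yield the two views a contrastive encoder compares.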
5. Dataset Statistics and Organization
Statistical and structural aspects of WiFlow support rigorous experimental reproducibility:
| Metric/Field | Description | Typical Value |
|---|---|---|
| Sessions | Total duration across rooms/occupancies | ~6 hours |
| Raw Windows | Number per domain before event filtering (100 Hz, 50% overlap) | ~36,000 |
| Labeled Windows | After purity filtering for single event per window | ~24,000 evenly split |
| Occupancy Dist. | Time spent at each level (0...N) per group/domain | Uniform |
| Storage Format | Directory structure separates Lab/Classroom and occupancy group; raw files + per-window labels | See below |
Directory Structure:
- WiFlow/
  - Lab/
    - group_2/
      - raw/ (`*.csi` or `*.mat`)
      - `labels.csv` (window_start, window_end, event_label)
    - group_5/ …
    - group_9/ …
  - Classroom/ (identical structure)
Raw CSI records use the column layout `timestamp_ms, csi_amp_1, ..., csi_amp_52`, optionally followed by phase columns.
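A loader sketch for this layout, assuming the raw files are comma-separated text with a header row matching these column names and that `labels.csv` has the three columns shown in the tree; `.mat` files would instead need `scipy.io.loadmat`, with variable names that are not documented here. Both hypothetical functions take an open text file, so they work with `open(path, newline="")` or an in-memory buffer.

```python
import csv
import io

import numpy as np

N_SUBCARRIERS = 52
AMP_COLUMNS = [f"csi_amp_{i}" for i in range(1, N_SUBCARRIERS + 1)]

def load_raw_amplitudes(f) -> tuple[np.ndarray, np.ndarray]:
    """Read a raw CSV file; return (timestamps_ms, amplitudes of shape (T, 52)).

    Any extra columns (e.g. optional phases) are ignored.
    """
    timestamps, amplitudes = [], []
    for row in csv.DictReader(f):
        timestamps.append(int(row["timestamp_ms"]))
        amplitudes.append([float(row[c]) for c in AMP_COLUMNS])
    return np.asarray(timestamps, dtype=np.int64), np.asarray(amplitudes, dtype=np.float64)

def load_labels(f) -> list[tuple[float, float, str]]:
    """Read labels.csv rows as (window_start, window_end, event_label)."""
    return [(float(r["window_start"]), float(r["window_end"]), r["event_label"])
            for r in csv.DictReader(f)]

# In-memory example with two frames 10 ms apart
header = ",".join(["timestamp_ms", *AMP_COLUMNS])
raw_text = (header + "\n"
            + "1000," + ",".join(["1.5"] * N_SUBCARRIERS) + "\n"
            + "1010," + ",".join(["2.0"] * N_SUBCARRIERS) + "\n")
ts, amp = load_raw_amplitudes(io.StringIO(raw_text))
window_labels = load_labels(io.StringIO("window_start,window_end,event_label\n0.0,1.0,enter\n"))
```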
Access and Licensing: WiFlow is proprietary but can be made available to academic researchers on request, under the LaTeX Project Public License (LPPL) v1.3.
6. Usage Protocols and Evaluation Practices
To ensure robust and comparable results using WiFlow data:
- Pre-training: Always incorporate Butterworth filtering and standardized segmentation. Self-supervised contrastive encoders should use jitter, scaling, and permutation augmentations on unlabeled data.
- Domain Adaptation: For efficient cross-domain transfer, freeze the self-supervised encoder and insert 1×1 convolutional Adapter modules in each residual block for fine-tuning. In few-shot (k-shot) settings, update only the adapters and a new classification head with labeled target windows.
- Occupancy Counting Pipeline: Classify 1 s windows as enter/exit/no_event, then process the resulting sequence via a simple state machine with the following logic:
  - `EVENT_THRESHOLD = 5`: consecutive detections required to confirm an event
  - `COOLDOWN_PERIOD = 10`: consecutive no_event windows required before re-arming
  This event debouncing yields stable, real-time counts.
- Evaluation Metrics: Report per-window classification accuracy and weighted F1, as well as count errors (MAE, RMSE). To quantify cross-domain robustness, use the Generalisation Index (GI), which compares target-domain performance with source-domain performance: $\mathrm{GI} \approx 1$ signifies minimal loss under domain transfer, while $\mathrm{GI} < 1$ for MAE indicates that the target domain is intrinsically easier.
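The Domain Adaptation recipe (frozen encoder, 1×1 convolutional adapters in each residual block, new classification head) can be sketched in PyTorch. This is a toy architecture, not the encoder from the source: it assumes a 1-D convolutional residual encoder over windows shaped (batch, 52 subcarriers, 100 samples), places a bottleneck adapter of two 1×1 convolutions on each residual branch, and zero-initialises the adapter's up-projection so it starts as an identity map. `Adapter`, `ResBlock`, `Encoder` and `freeze_all_but_adapters` are hypothetical names.

```python
import torch
from torch import nn

class Adapter(nn.Module):
    """Bottleneck adapter made of two 1x1 convolutions with a skip connection."""

    def __init__(self, channels: int, bottleneck: int = 8) -> None:
        super().__init__()
        self.down = nn.Conv1d(channels, bottleneck, kernel_size=1)
        self.up = nn.Conv1d(bottleneck, channels, kernel_size=1)
        nn.init.zeros_(self.up.weight)  # zero init: adapter starts as identity
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))

class ResBlock(nn.Module):
    def __init__(self, channels: int) -> None:
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.adapter = Adapter(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv2(torch.relu(self.conv1(x)))
        return torch.relu(x + self.adapter(h))

class Encoder(nn.Module):
    def __init__(self, in_channels: int = 52, width: int = 32, n_blocks: int = 2) -> None:
        super().__init__()
        self.stem = nn.Conv1d(in_channels, width, kernel_size=1)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(n_blocks)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 52, 100) -> globally pooled features (batch, width)
        return self.blocks(self.stem(x)).mean(dim=-1)

def freeze_all_but_adapters(encoder: nn.Module) -> None:
    for name, param in encoder.named_parameters():
        param.requires_grad = ".adapter." in name

encoder = Encoder()  # in practice, load the self-supervised weights here
freeze_all_but_adapters(encoder)
head = nn.Linear(32, 3)  # new head: enter / exit / no_event
trainable = [p for p in encoder.parameters() if p.requires_grad] + list(head.parameters())
optimizer = torch.optim.Adam(trainable, lr=1e-3)

x = torch.randn(8, 52, 100)       # a batch of k-shot target windows
y = torch.randint(0, 3, (8,))
loss = nn.functional.cross_entropy(head(encoder(x)), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Only the adapter and head parameters reach the optimizer, so the pretrained weights stay fixed. If the real encoder contains batch normalisation, keeping those layers in eval mode during adaptation would be a reasonable extra precaution.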
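The occupancy-counting state machine from the pipeline bullet can be sketched as a small debouncer, together with the MAE and RMSE count errors. Interpretation choices, all assumptions: a detection run must consist of identical consecutive labels, a confirmed event disarms the counter until `COOLDOWN_PERIOD` consecutive `no_event` windows arrive, and the count is clamped at zero. `OccupancyCounter`, `run_counter`, `mae` and `rmse` are hypothetical names.

```python
from typing import Iterable, List, Optional

import numpy as np

EVENT_THRESHOLD = 5   # consecutive identical enter/exit detections to confirm an event
COOLDOWN_PERIOD = 10  # consecutive no_event windows required before re-arming
VALID_LABELS = {"enter", "exit", "no_event"}

class OccupancyCounter:
    """Debounced occupancy counter driven by per-window event predictions."""

    def __init__(self) -> None:
        self.count = 0
        self.armed = True
        self._run_label: Optional[str] = None
        self._run_length = 0
        self._quiet = 0

    def update(self, label: str) -> int:
        if label not in VALID_LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if label == "no_event":
            self._run_label, self._run_length = None, 0
            self._quiet += 1
            if not self.armed and self._quiet >= COOLDOWN_PERIOD:
                self.armed = True  # enough quiet windows: accept the next event
            return self.count
        self._quiet = 0
        if label == self._run_label:
            self._run_length += 1
        else:
            self._run_label, self._run_length = label, 1
        if self.armed and self._run_length >= EVENT_THRESHOLD:
            self.count = max(0, self.count + (1 if label == "enter" else -1))
            self.armed = False  # ignore further detections until cooldown
        return self.count

def run_counter(labels: Iterable[str]) -> List[int]:
    counter = OccupancyCounter()
    return [counter.update(label) for label in labels]

def mae(true_counts, pred_counts) -> float:
    err = np.asarray(pred_counts, dtype=float) - np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(err)))

def rmse(true_counts, pred_counts) -> float:
    err = np.asarray(pred_counts, dtype=float) - np.asarray(true_counts, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

counts = run_counter(["enter"] * 8 + ["no_event"] * 10 + ["exit"] * 5)
```

With 1 s windows and 0.5 s steps, five consecutive windows cover 3 s of signal and ten cover 5.5 s, which sets how quickly the counter reacts and re-arms.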
7. Significance and Research Applications
WiFlow addresses critical barriers to real-world deployment of device-free CSI crowd counting, particularly domain shift and sample efficiency. It enables benchmarking of self-supervised pretraining, parameter-efficient adapter fine-tuning, domain adaptation, and robust occupancy estimation pipelines across varied physical and occupancy contexts. The dataset supports evaluation in zero-shot and few-shot transfer scenarios, and its strict event-level annotation and standardized pre-processing facilitate reproducibility. As demonstrated by Custance et al. (5 Jan 2026), WiFlow has been used to establish state-of-the-art results for domain-adaptive crowd counting, supporting both methodological innovation and deployment-readiness studies for robust IoT sensing systems.