WiFlow: Device-Free Crowd Counting Dataset

Updated 12 January 2026

The WiFlow dataset is a device-free, WiFi CSI-based resource designed for privacy-preserving, domain-adaptive crowd counting in IoT sensing.
It uses minimal hardware (two ESP32 boards and a Raspberry Pi) to capture dense, event-resolved CSI data in laboratory and classroom settings.
The dataset supports rigorous benchmarking for self-supervised pretraining, domain adaptation, and few-shot learning with strict event-level annotations.

WiFlow is a device-free crowd-counting dataset based on WiFi Channel State Information (CSI), specifically designed to address the challenges of cross-domain generalisation in privacy-preserving Internet of Things (IoT) sensing systems. The dataset provides dense, timestamped CSI captured under controlled, multi-person occupancy scenarios across distinct physical environments and occupancy regimes, with comprehensive event-level ground-truth annotations. Structured experimental protocols and pre-processing recommendations enable reproducible benchmarking of domain adaptation, self-supervised learning, and parameter-efficient transfer learning approaches using wireless signal data (Custance et al., 5 Jan 2026).

1. Data Collection Setup

WiFlow employs a minimal wireless hardware configuration to maximise reproducibility and deployment relevance:

Hardware: Two ESP32-WROOM-32U boards, each with a single external antenna, function as transmitter and receiver. The receiver streams raw CSI to a Raspberry Pi using the CSI tool of Hernandez & Bulut (WoWMoM 2020). Transmission uses 802.11n OFDM at 20 MHz bandwidth, exposing 52 effective subcarriers per packet.
Environments: Data are obtained in two representative settings:
- A 7 m × 7 m laboratory (concrete/drywall, office furniture)
- A 5.5 m × 9 m classroom (linoleum floor, whiteboard wall, rows of tables/chairs)
- In both cases, devices are mounted at 1.2 m height and separated by 4 m (line-of-sight).
Acquisition Parameters: CSI is sampled at 100 Hz (one CSI frame every 10 ms). For each timestamp, the dataset records the per-subcarrier channel frequency response (CFR) $h_k$, $k = 1, \ldots, 52$, from which amplitudes ($|h_k|$) and, optionally, phases ($\angle h_k$) are extracted. Because the link uses one transmit and one receive antenna (a 1×1 configuration), the amplitude representation is a 52-dimensional real vector per time step.
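The following is a minimal sketch of amplitude and phase extraction. It assumes the CFR arrives in the ESP-IDF CSI buffer layout, with two signed 8-bit values per subcarrier and the imaginary part first. The helper names are hypothetical. Mapping the buffer onto the 52 effective subcarriers is omitted because the exact index selection is not specified here.

```python
import numpy as np


def interleaved_to_complex(raw):
    """Convert an ESP32-style CSI buffer into complex CFR values.

    Assumption: two signed 8-bit values per subcarrier, imaginary part
    first, then real part (the layout documented by ESP-IDF).
    """
    pairs = np.asarray(raw, dtype=np.int8).astype(np.float64).reshape(-1, 2)
    return pairs[:, 1] + 1j * pairs[:, 0]


def cfr_features(h):
    """Return amplitude |h_k| and phase angle(h_k), both shaped like h.

    h: complex array, e.g. (T, 52) for T time steps of a 1x1 link.
    """
    h = np.asarray(h)
    return np.abs(h), np.angle(h)


# Two subcarriers encoded as (imag, real) pairs.
h = interleaved_to_complex([4, 3, -4, 0])  # -> [3+4j, -4j]
amplitude, phase = cfr_features(h)          # amplitude -> [5., 4.]
```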

2. Dataset Composition

WiFlow comprises scenario-rich, event-resolved CSI trace data:

Sessions and Participants: Ten volunteers in total take part, in controlled groups of 2, 5, and 9. Each group performs a sequence of timed “enter” and “exit” events cued by an audible beep every 10 s, so that occupancy changes by exactly one person per beep. A group of size $N$ needs $N$ entries and $N$ exits, so one full enter+exit cycle spans about $2N \times 10 = 20N$ s.
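The beep protocol implies a staircase occupancy trace that peaks at $N$. The sketch below generates one cycle of that trace. For illustration, it assumes that all $N$ entries come before all $N$ exits. The function names are hypothetical.

```python
BEEP_INTERVAL_S = 10  # one single-person occupancy change per beep


def occupancy_schedule(group_size, t0=0.0):
    """List (beep_time_s, event, occupancy_after_event) for one enter+exit cycle.

    Assumption: participants enter one by one on the first N beeps and
    leave one by one on the following N beeps.
    """
    schedule, occupancy = [], 0
    for i in range(2 * group_size):
        event = "enter" if i < group_size else "exit"
        occupancy += 1 if event == "enter" else -1
        schedule.append((t0 + i * BEEP_INTERVAL_S, event, occupancy))
    return schedule


def cycle_duration_s(group_size):
    """N entries plus N exits, one per beep: 2 * N * 10 s = 20 * N s."""
    return 2 * group_size * BEEP_INTERVAL_S


print(cycle_duration_s(9))  # 180
for beep in occupancy_schedule(2):
    print(beep)
```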
Duration: Approximately 6 hours of raw CSI are recorded (≈1 h per room per group size).
Ground Truth: Beep timestamps (UNIX time) are logged as the sole ground-truth source, and each beep corresponds to exactly one entry or exit. After segmentation into 1 s windows, a window that falls entirely within a single event interval receives that interval's label (“enter”, “exit”, or “no_event”). Windows that overlap more than one event interval are discarded from the supervised sets. This strict event purity improves annotation reliability.
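The purity rule can be sketched as a containment test. This sketch assumes the event intervals have already been derived from the beep timestamps as non-overlapping `(start_s, end_s, label)` tuples. How the dataset builds those intervals is not detailed here, so `label_windows` is a hypothetical helper.

```python
def label_windows(window_starts_s, intervals, window_s=1.0):
    """Label each window by the single interval that fully contains it.

    intervals: non-overlapping (start_s, end_s, label) tuples with label in
    {"enter", "exit", "no_event"}. Windows not fully inside one interval
    get None, meaning "discard from supervised sets".
    """
    labels = []
    for w0 in window_starts_s:
        w1 = w0 + window_s
        label = None
        for start, end, lab in intervals:
            if start <= w0 and w1 <= end:
                label = lab
                break
        labels.append(label)
    return labels


intervals = [(0.0, 3.0, "no_event"), (3.0, 5.0, "enter")]
starts = [0.5 * i for i in range(10)]  # 1 s windows with a 0.5 s step
labels = label_windows(starts, intervals)
kept = [(s, lab) for s, lab in zip(starts, labels) if lab is not None]
```

The window starting at 2.5 s straddles the `no_event`/`enter` boundary, so it is dropped rather than given a guessed label.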

3. Domain Variability and Evaluation Splits

WiFlow explicitly encodes multidimensional domain variation:

Physical Domains: Laboratory (Lab) versus Classroom with distinct multipath characteristics.
Occupancy-Size Domains: Recordings with 2, 5, or 9 people per session.
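Domain Combinations: The two axes combine into six room/occupancy domains (Lab-2, Lab-5, Lab-9, Classroom-2, Classroom-5, Classroom-9), which define the principal domain transfer conditions (e.g., Lab-2 → Classroom-5).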

Recommended Splits:

In-Domain Supervised: A standard 70/10/20 train/validation/test split of the labeled windows within a single room and occupancy group.
Cross-Domain (Zero-Shot): Train on all windows from one domain and test on another without adaptation.
Few-Shot Adaptation: Starting from the zero-shot model, fine-tune on $k$ labeled target-domain windows ($k \in \{1, 5, 10\}$). The remaining target windows are reserved for evaluation.
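One way to enumerate the domains and draw these splits is sketched below. The source does not state how WiFlow's splits are actually sampled. The chronological 70/10/20 blocks are an assumption, chosen because adjacent windows overlap by 50%. Here $k$ is read as a total number of target windows, as written above.

```python
import itertools

import numpy as np

ROOMS = ("Lab", "Classroom")
GROUP_SIZES = (2, 5, 9)

# Six room/occupancy domains such as "Lab-2" or "Classroom-9".
DOMAINS = [f"{room}-{n}" for room, n in itertools.product(ROOMS, GROUP_SIZES)]
# Every ordered (source, target) pair of distinct domains: 6 * 5 = 30.
TRANSFER_PAIRS = list(itertools.permutations(DOMAINS, 2))


def chronological_split(n_windows, fractions=(0.7, 0.1, 0.2)):
    """Split window indices 70/10/20 into train/val/test blocks in time order.

    Contiguous blocks confine overlap between 50%-overlapping neighbours
    to the block boundaries; a shuffled split is the simpler alternative.
    """
    n_train = int(round(fractions[0] * n_windows))
    n_val = int(round(fractions[1] * n_windows))
    idx = np.arange(n_windows)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def few_shot_split(n_target_windows, k, seed=0):
    """Draw k target-domain windows for adaptation; the rest are for evaluation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_target_windows)
    return idx[:k], idx[k:]
```

In use, `chronological_split(n)` would run within one domain. `few_shot_split(n, k)` would run on the target domain of a `TRANSFER_PAIRS` entry after zero-shot training on its source.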

4. Data Preprocessing and Augmentation

WiFlow standardises the following procedures:

Denoising: Apply a 4th-order Butterworth low-pass filter with cutoff $f_c = 8$ Hz to each subcarrier amplitude stream. At a 100 Hz sampling rate the Nyquist frequency is 50 Hz, so the normalised cutoff is $W_n = f_c / (f_s/2) = 8/50 = 0.16$.
Windowing: Segment the CSI amplitude streams into 1 s sliding windows ($W = 100$ samples) with 50% overlap (step $S = 50$ samples).
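The denoising and windowing steps can be sketched with SciPy, assuming amplitudes are stored as a `(T, 52)` array. The zero-phase `sosfiltfilt` call is our choice for offline processing. Because it filters forward and then backward, it squares the magnitude response. A single causal `sosfilt` pass is closer to what a streaming deployment would run.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 100.0   # CSI sampling rate
FC_HZ = 8.0     # low-pass cutoff
ORDER = 4       # Butterworth order
W, S = 100, 50  # 1 s windows, 0.5 s step (50% overlap)


def lowpass_amplitudes(amp):
    """Zero-phase Butterworth low-pass of each subcarrier in a (T, 52) array."""
    # Passing fs lets SciPy normalise the cutoff: Wn = 8 / (100 / 2) = 0.16.
    sos = butter(ORDER, FC_HZ, btype="low", output="sos", fs=FS_HZ)
    return sosfiltfilt(sos, amp, axis=0)


def num_windows(t, w=W, s=S):
    """Number of full windows in a stream of t >= w samples."""
    return (t - w) // s + 1


def sliding_windows(x, w=W, s=S):
    """Cut a (T, C) stream into an (n, w, C) array of overlapping windows."""
    return np.stack([x[i:i + w] for i in range(0, x.shape[0] - w + 1, s)])


t = np.arange(1000) / FS_HZ
amp = 1.0 + np.sin(2 * np.pi * 40 * t)[:, None] * np.ones((1, 52))  # DC + 40 Hz
windows = sliding_windows(lowpass_amplitudes(amp))
print(windows.shape)  # (19, 100, 52)
```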
Data Augmentation: For self-supervised pretraining, employ:
- Jitter (additive Gaussian noise): $\tilde{x} = x + \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma_j^2)$ and $\sigma_j = 0.03$
- Scaling (multiplicative): $\tilde{x} = \alpha x$, with $\alpha \sim \mathcal{N}(1, \sigma_s^2)$ and $\sigma_s = 0.1$
- Permutation: split the window into $k \in \{2, \ldots, 5\}$ segments and randomly reorder them
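The three augmentations might be implemented as follows for a `(100, 52)` window. Two details are assumptions: one scaling factor is drawn per subcarrier, and the transforms are composed as jitter, then scaling, then permutation. Applying `augment` twice to the same window yields a positive pair for a contrastive encoder.

```python
import numpy as np


def jitter(x, rng, sigma_j=0.03):
    """Additive Gaussian noise: x + eps, eps ~ N(0, sigma_j^2) per element."""
    return x + rng.normal(0.0, sigma_j, size=x.shape)


def scale(x, rng, sigma_s=0.1):
    """Multiplicative scaling: alpha * x, alpha ~ N(1, sigma_s^2).

    Assumption: one alpha per subcarrier (column) of a (W, 52) window.
    """
    alpha = rng.normal(1.0, sigma_s, size=(1, x.shape[1]))
    return x * alpha


def permute(x, rng, max_segments=5):
    """Split along time into k in {2, ..., max_segments} segments and shuffle them."""
    k = int(rng.integers(2, max_segments + 1))
    segments = np.array_split(x, k, axis=0)
    return np.concatenate([segments[i] for i in rng.permutation(k)], axis=0)


def augment(x, rng):
    """One stochastic view; the jitter -> scale -> permute order is an assumption."""
    return permute(scale(jitter(x, rng), rng), rng)


rng = np.random.default_rng(0)
window = rng.random((100, 52))  # one filtered 1 s window
view_a, view_b = augment(window, rng), augment(window, rng)  # a positive pair
```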
Feature Extraction: All baseline results use amplitude only ($A_k = |h_k|$), although phase ($\varphi_k = \angle h_k$) can also be extracted.

5. Dataset Statistics and Organization

Statistical and structural aspects of WiFlow support rigorous experimental reproducibility:

| Metric/Field | Description | Typical value |
|---|---|---|
| Sessions | Total recording duration across rooms and occupancy groups | ~6 hours |
| Raw windows | Number per domain before event filtering (100 Hz, 50% overlap) | ~36,000 |
| Labeled windows | Remaining after purity filtering (single event per window) | ~24,000, evenly split |
| Occupancy distribution | Time spent at each occupancy level (0…N) per group/domain | Uniform |
| Storage format | Directory structure separating Lab/Classroom and occupancy group; raw files plus per-window labels | See below |

Directory Structure:

WiFlow/
├── Lab/
│   ├── group_2/
│   │    ├── raw/ (*.csi or *.mat)
│   │    └── labels.csv (window_start, window_end, event_label)
│   ├── group_5/ …
│   └── group_9/ …
└── Classroom/ (identical structure)

Raw file format: timestamp_ms, csi_amp_1, ..., csi_amp_52 (optionally phases).
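Below is a loader sketch for the layout above. It assumes the raw traces are comma-separated text with a header row naming `timestamp_ms` and `csi_amp_1` to `csi_amp_52`. The function names are hypothetical. `.mat` files would instead need `scipy.io.loadmat`, and the variable names inside WiFlow's `.mat` files are not given here.

```python
import csv

import numpy as np

N_SUBCARRIERS = 52


def load_raw_amplitudes(path):
    """Return (timestamps_ms, amplitudes) with amplitudes shaped (T, 52).

    Assumption: comma-separated text with a header row; any extra
    (e.g. phase) columns are ignored.
    """
    amp_cols = [f"csi_amp_{k}" for k in range(1, N_SUBCARRIERS + 1)]
    timestamps, rows = [], []
    with open(path, newline="") as f:
        for record in csv.DictReader(f):
            timestamps.append(float(record["timestamp_ms"]))
            rows.append([float(record[c]) for c in amp_cols])
    amplitudes = np.array(rows, dtype=float).reshape(-1, N_SUBCARRIERS)
    return np.array(timestamps), amplitudes


def load_labels(path):
    """Return labels.csv rows as (window_start, window_end, event_label) tuples."""
    with open(path, newline="") as f:
        return [
            (float(r["window_start"]), float(r["window_end"]), r["event_label"])
            for r in csv.DictReader(f)
        ]
```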

Access and Licensing: WiFlow is proprietary but can be provided to academic researchers upon request, under the LaTeX Project Public License (LPPL) v1.3.

6. Usage Protocols and Evaluation Practices

To ensure robust and comparable results using WiFlow data:

Pre-training: Always incorporate Butterworth filtering and standardized segmentation. Self-supervised contrastive encoders should use jitter, scaling, and permutation augmentations on unlabeled data.
Domain Adaptation: For efficient cross-domain transfer, freeze the self-supervised encoder and insert 1×1 convolutional adapter modules into each residual block for fine-tuning. In few-shot ($k$-shot) settings, update only the adapters and a new classification head using the $k$ labeled target windows.
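The following is a NumPy sketch of a 1×1 convolutional adapter's forward pass. Several design details are assumptions in the style of common adapter designs, not details confirmed by the source:

- the bottleneck structure (down-projection, ReLU, up-projection, skip connection);
- the zero-initialised up-projection;
- applying the adapter to each residual block's output.

During $k$-shot fine-tuning only these weights and the new head would be updated.

```python
import numpy as np


class Conv1x1Adapter:
    """Bottleneck adapter made of two 1x1 convolutions plus a skip connection.

    On a (channels, time) feature map a 1x1 convolution mixes channels
    independently at each time step, so it reduces to a matrix product.
    """

    def __init__(self, channels, bottleneck, rng):
        self.w_down = rng.normal(0.0, 0.02, size=(bottleneck, channels))
        self.b_down = np.zeros((bottleneck, 1))
        # Zero-initialised up-projection: the adapter starts as an identity
        # map, so inserting it leaves the frozen encoder's output unchanged.
        self.w_up = np.zeros((channels, bottleneck))
        self.b_up = np.zeros((channels, 1))

    def __call__(self, h):
        z = np.maximum(self.w_down @ h + self.b_down, 0.0)  # down-projection + ReLU
        return h + self.w_up @ z + self.b_up                 # up-projection + skip

    def num_params(self):
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


rng = np.random.default_rng(0)
adapter = Conv1x1Adapter(channels=64, bottleneck=16, rng=rng)
features = rng.normal(size=(64, 100))  # output of one frozen residual block
adapted = adapter(features)
print(adapter.num_params())  # 2128
```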
Occupancy Counting Pipeline: Classify 1 s windows as enter/exit/no_event, then process the resulting sequence via a simple state machine with the following logic:
- EVENT_THRESHOLD = 5 consecutive detections required to confirm an event
- COOLDOWN_PERIOD = 10 consecutive no_event windows before re-arming
- This event debouncing yields stable, real-time counts.
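The debouncing logic above can be sketched as a small state machine. Three details are our assumptions: the five detections must share the same label, any `no_event` resets a run, and the count is clamped at zero. `OccupancyCounter` is a hypothetical name.

```python
EVENT_THRESHOLD = 5   # consecutive identical detections needed to confirm an event
COOLDOWN_PERIOD = 10  # consecutive no_event windows needed before re-arming


class OccupancyCounter:
    """Debounce per-window enter/exit/no_event predictions into a running count."""

    def __init__(self):
        self.count = 0
        self.armed = True
        self.run_label, self.run_length = None, 0
        self.quiet = 0

    def update(self, label):
        if not self.armed:
            # Disarmed after an event: wait for a sustained quiet period.
            self.quiet = self.quiet + 1 if label == "no_event" else 0
            if self.quiet >= COOLDOWN_PERIOD:
                self.armed, self.quiet = True, 0
            return self.count
        if label == "no_event":
            self.run_label, self.run_length = None, 0  # any no_event breaks a run
            return self.count
        if label == self.run_label:
            self.run_length += 1
        else:
            self.run_label, self.run_length = label, 1
        if self.run_length >= EVENT_THRESHOLD:
            self.count = max(self.count + (1 if label == "enter" else -1), 0)
            self.armed = False
            self.run_label, self.run_length = None, 0
        return self.count


counter = OccupancyCounter()
stream = ["enter"] * 5 + ["enter"] * 5 + ["no_event"] * 10 + ["exit"] * 5
counts = [counter.update(label) for label in stream]
```

Suppose inference windows advance by 0.5 s, as in the preprocessing. Then five consecutive detections span 3 s of signal, and the ten-window cooldown spans 5.5 s.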
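Evaluation Metrics: Report per-window classification accuracy and weighted F1, as well as count errors (MAE, RMSE). To quantify cross-domain robustness, use the Generalisation Index, $GI = {\rm target\_performance}/{\rm source\_performance}$, where $GI \approx 1$ signifies minimal loss under domain transfer. For higher-is-better metrics such as accuracy, $GI > 1$ means the target outperforms the source, suggesting an intrinsically easier target. For error metrics such as MAE the reading inverts: $GI \gg 1$ signals substantial degradation on the target, and $GI < 1$ suggests an easier target.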
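The window-level and count-level metrics can be computed directly with NumPy, as sketched below. `weighted_f1` follows the usual support-weighted average of per-class F1. The function names are illustrative.

```python
import numpy as np


def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0.0
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator >= 1 because c occurs in y_true
        total += f1 * np.sum(y_true == c)
    return float(total / len(y_true))


def mae(true_counts, pred_counts):
    err = np.asarray(pred_counts, float) - np.asarray(true_counts, float)
    return float(np.mean(np.abs(err)))


def rmse(true_counts, pred_counts):
    err = np.asarray(pred_counts, float) - np.asarray(true_counts, float)
    return float(np.sqrt(np.mean(err ** 2)))


def generalisation_index(target_performance, source_performance):
    """GI = target / source, both computed with the same metric."""
    return target_performance / source_performance


y_true = ["enter", "exit", "no_event", "no_event"]
y_pred = ["enter", "no_event", "no_event", "no_event"]
print(accuracy(y_true, y_pred))     # 0.75
print(weighted_f1(y_true, y_pred))  # 0.65
```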

7. Significance and Research Applications

WiFlow addresses critical barriers to real-world deployment of device-free CSI crowd counting, particularly domain shift and sample efficiency. It enables benchmarking of self-supervised pretraining, parameter-efficient adapter fine-tuning, domain adaptation, and robust occupancy estimation pipelines across varied physical and occupancy contexts. The dataset supports evaluation in zero-shot and few-shot transfer scenarios, and its strict event-level annotation and standardised preprocessing facilitate reproducibility. Custance et al. (5 Jan 2026) use WiFlow to report state-of-the-art results for domain-adaptive crowd counting. The dataset supports both methodological innovation and deployment-readiness studies for robust IoT sensing systems.


References

Custance et al. (5 Jan 2026). Parameter-Efficient Domain Adaption for CSI Crowd-Counting via Self-Supervised Learning with Adapter Modules.
