Publish-On-Ping (POP) Algorithm
- Publish-On-Ping (POP) is a memory reclamation technique that defers pointer reservation publication until reclamation, cutting down the overhead from frequent memory fences.
- It leverages POSIX signals to synchronize threads only on reclamation events, optimizing performance in read-intensive concurrent systems.
- Empirical results show POP achieves 1.2× to 4× speedups over traditional hazard pointer methods, enhancing throughput and maintaining strict safety.
The Publish-On-Ping (POP) algorithm is a memory reclamation technique designed to mitigate the performance penalties inherent in pointer-reservation strategies for concurrent data structures. By combining delayed reservation publishing with POSIX signals, POP minimizes the cost of enforcing pointer safety during linked structure traversals, particularly in read-intensive workloads. Unlike hazard pointers and hazard eras, which require global visibility and ordering for every reservation, POP defers publishing until memory reclamation is actually needed, yielding significant throughput improvements while retaining strong safety guarantees.
1. Architectural Principles
Traditional memory reclamation schemes employed in lock-free, concurrent data structures—principally hazard pointers (HP) and hazard eras (HE)—require threads to publish reservations before each pointer access. This typically involves issuing a memory fence to globally announce the pointer (or epoch) a thread is accessing. In scenarios with frequent reads, especially traversals over linked structures, the resulting fence overhead substantially degrades performance, regardless of the actual frequency of memory reclamation.
Publish-On-Ping inverts this model: each thread tracks its reservations locally throughout the traversal. Only when reclamation is requested does the reclaiming thread signal all other participants to publish their private reservations. Decoupling reservation announcement from pointer access removes fences and similar synchronization barriers from the fast path.
A publish-on-ping cycle proceeds as follows:
- Local Reservation: Each thread saves accessed pointers (or epochs) in a private array (localReservations) during structure traversal, with no fence or inter-thread communication.
- Reclamation Trigger: Upon reaching a retire-list threshold (determined by reclaimFreq), the retiring thread initiates a reclamation protocol.
- Ping via Signal: The reclaimer sends a POSIX signal (using pthread_kill) to all other threads, activating a publish handler.
- Publication Handler: Each signaled thread writes its localReservations into a shared array (sharedReservations) and increments its own publishCounter.
- Reclaim Wait: The reclaimer waits until all publishCounters indicate publication completion.
- Safe Reclamation: Objects not appearing in any published reservations are freed.
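The ping and publication steps above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the array sizes, the choice of SIGUSR1, the `threadHandle` and `registered` tables, and the function names `pop_register_thread`, `pop_publish_handler`, and `pop_ping_and_wait` are all hypothetical. It also assumes that `std::atomic` operations on pointers and 64-bit counters are lock-free on the target, and that reading a `thread_local` integer inside a signal handler is acceptable on the platform.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <pthread.h>
#include <signal.h>

// All names and sizes below are illustrative assumptions.
constexpr int kMaxThreads = 4;
constexpr int kSlots = 2;
constexpr int kPingSignal = SIGUSR1;

// Private reservations: written on the fast path without any fence.
std::atomic<void*> localReservations[kMaxThreads][kSlots];
// Shared reservations: written only when a thread is pinged.
std::atomic<void*> sharedReservations[kMaxThreads][kSlots];
std::atomic<std::uint64_t> publishCounter[kMaxThreads];
pthread_t threadHandle[kMaxThreads];
std::atomic<bool> registered[kMaxThreads];
thread_local int myTid = -1;

// Publication handler: copy private slots to the shared array,
// then bump this thread's counter so the reclaimer can see completion.
extern "C" void pop_publish_handler(int) {
    const int tid = myTid;
    if (tid < 0) return;  // thread never registered
    for (int s = 0; s < kSlots; ++s) {
        void* p = localReservations[tid][s].load(std::memory_order_relaxed);
        sharedReservations[tid][s].store(p, std::memory_order_relaxed);
    }
    publishCounter[tid].fetch_add(1, std::memory_order_seq_cst);
}

// Install the handler (process-wide) and record this thread's handle.
void pop_register_thread(int tid) {
    assert(tid >= 0 && tid < kMaxThreads);
    struct sigaction sa {};
    sa.sa_handler = pop_publish_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(kPingSignal, &sa, nullptr);
    myTid = tid;
    threadHandle[tid] = pthread_self();
    registered[tid].store(true, std::memory_order_release);
}

// Reclaimer side: ping every other registered thread, then wait until
// each pinged thread's counter has advanced past its earlier value.
// This sketch does not handle threads that exit while registered.
void pop_ping_and_wait() {
    std::uint64_t before[kMaxThreads] = {};
    bool pinged[kMaxThreads] = {};
    for (int t = 0; t < kMaxThreads; ++t) {
        if (t == myTid || !registered[t].load(std::memory_order_acquire)) continue;
        before[t] = publishCounter[t].load(std::memory_order_acquire);
        pinged[t] = (pthread_kill(threadHandle[t], kPingSignal) == 0);
    }
    for (int t = 0; t < kMaxThreads; ++t) {
        while (pinged[t] &&
               publishCounter[t].load(std::memory_order_acquire) == before[t]) {
            // spin until thread t has published
        }
    }
    // The reclaimer publishes its own reservations directly, without a signal.
    pop_publish_handler(kPingSignal);
}
```

After `pop_ping_and_wait` returns, `sharedReservations` holds a snapshot of every participant's reservations, which the reclaimer can scan before freeing retired objects.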
2. Comparison with Hazard Pointers and Hazard Eras
Hazard pointers and hazard eras demand global publication on every read, imposing memory barriers. This ensures that no thread can reclaim a node while another might still access it. POP, by contrast, only requires publication at reclamation time, massively reducing fence traffic.
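For contrast, a classic hazard-pointer read can be sketched as below. This is a generic textbook-style rendering rather than code from the paper or from any particular library; the names `hazardSlots` and `hp_protect` and the array sizes are assumptions made for illustration. The sequentially consistent fence inside the loop is the per-read cost that POP aims to remove.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sizes for this sketch; not taken from the source.
constexpr std::size_t kMaxThreads = 8;
constexpr std::size_t kSlotsPerThread = 2;

// Globally visible hazard slots, one row per thread.
std::atomic<void*> hazardSlots[kMaxThreads][kSlotsPerThread];

// Classic hazard-pointer read: publish the pointer, fence, then re-validate.
template <typename T>
T* hp_protect(const std::atomic<T*>& src, std::size_t tid, std::size_t slot) {
    assert(tid < kMaxThreads && slot < kSlotsPerThread);
    T* p = src.load(std::memory_order_acquire);
    for (;;) {
        hazardSlots[tid][slot].store(p, std::memory_order_relaxed);
        // Paid on every read: the reservation must be globally visible
        // before the source pointer is read again.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        T* again = src.load(std::memory_order_acquire);
        if (again == p) return p;  // reservation was published in time
        p = again;                 // pointer changed; retry with the new value
    }
}
```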
The table below summarizes key distinctions:
Technique | Reservation Publication | Per-Read Overhead | Reclamation Safety |
---|---|---|---|
Hazard Pointers | Immediate (per read) | High (fences) | Strict |
Hazard Eras | Immediate (per read) | High (fences) | Strict |
POP | Delayed (on ping) | Minimal | Strict (on demand) |
Like HP and HE, POP also bounds the number of unreclaimed objects by a value proportional to the number of threads and the maximum reservations per thread, so a delayed thread cannot cause unbounded memory growth.
Performance evaluations in the paper reveal:
- HazardPtrPOP yields speedups from 1.2× to 4× compared to classical HP, and up to 20% over highly optimized HP (Folly library).
- HazardEraPOP achieves up to 3× improvement over standard hazard eras.
- The memory footprint in POP is strictly lower than classical epoch-based reclamation under similar workloads.
3. Implementation Technique
The POP algorithm is realized primarily through modifications to the hazard pointer interface. Threads maintain an array of localReservations and expose its contents only on explicit request, via a signal handler. Representative pseudocode for the "read" procedure is as follows:
```
repeat
    readPtr ← *ptrAddr
    localReservations[tid][slot] ← readPtr   // no fence needed
until readPtr equals *ptrAddr
return readPtr
```
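The read loop can be rendered in C++ roughly as follows. This is a sketch under assumptions: the function name `pop_read` and the array sizes are hypothetical, and the use of `std::atomic_signal_fence` is this example's choice, not something the source specifies. That call only constrains compiler reordering with respect to a signal handler running on the same thread and emits no hardware fence, which is what keeps the fast path cheap.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sizes for this sketch.
constexpr std::size_t kMaxThreads = 8;
constexpr std::size_t kSlots = 4;

// Per-thread private reservations; only the owning thread (and its own
// signal handler) touches a given row.
std::atomic<void*> localReservations[kMaxThreads][kSlots];

// POP-style read: record the pointer locally, then re-validate the source.
template <typename T>
T* pop_read(const std::atomic<T*>& ptrAddr, std::size_t tid, std::size_t slot) {
    assert(tid < kMaxThreads && slot < kSlots);
    T* readPtr;
    do {
        readPtr = ptrAddr.load(std::memory_order_acquire);
        localReservations[tid][slot].store(readPtr, std::memory_order_relaxed);
        // Compiler-only barrier (an assumption of this sketch): keeps the
        // local store ahead of the re-read as observed by a handler that
        // interrupts this same thread. No CPU fence is issued.
        std::atomic_signal_fence(std::memory_order_seq_cst);
    } while (readPtr != ptrAddr.load(std::memory_order_acquire));
    return readPtr;
}
```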
Upon retirement:
- Objects are queued in a per-thread retire list.
- Once the threshold is reached:
  - The reclaimer invokes pthread_kill to ping all threads.
  - Each thread publishes its localReservations to sharedReservations and increments publishCounter.
  - The reclaimer waits on publishCounters.
  - Objects not present in sharedReservations are freed.
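The retirement steps can be illustrated with the following C++ sketch. It is an assumption-laden outline rather than the paper's code: `Node`, `pop_scan`, `pop_retire`, and the value chosen for `reclaimFreq` are hypothetical, and the `pingAll` callback stands in for the pthread_kill ping plus the wait on publishCounters so that the scanning logic can be shown in isolation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical sizes and threshold for this sketch.
constexpr std::size_t kMaxThreads = 4;
constexpr std::size_t kSlots = 2;
constexpr std::size_t reclaimFreq = 3;

struct Node { int key; };

// Filled in by the publication handlers of pinged threads.
std::atomic<void*> sharedReservations[kMaxThreads][kSlots];

// Free every retired node that no thread has published a reservation for.
// Reserved nodes stay on the list for a later attempt. Returns nodes freed.
std::size_t pop_scan(std::vector<Node*>& retired) {
    std::unordered_set<void*> reserved;
    for (auto& row : sharedReservations)
        for (auto& slot : row)
            if (void* p = slot.load(std::memory_order_acquire)) reserved.insert(p);

    std::size_t freed = 0;
    auto keep = retired.begin();
    for (Node* n : retired) {
        if (reserved.count(n)) {
            *keep++ = n;   // still reserved: keep it
        } else {
            delete n;      // unreserved: safe to free
            ++freed;
        }
    }
    retired.erase(keep, retired.end());
    return freed;
}

// Queue a node on the per-thread retire list; once the list reaches
// reclaimFreq entries, ping all threads and scan. Returns nodes freed.
std::size_t pop_retire(std::vector<Node*>& retired, Node* n,
                       const std::function<void()>& pingAll) {
    retired.push_back(n);
    if (retired.size() < reclaimFreq) return 0;
    pingAll();  // placeholder for: pthread_kill each thread, wait on counters
    return pop_scan(retired);
}
```

Collecting the published reservations into a hash set first keeps the scan close to linear in the retire-list length instead of comparing every retired node against every slot.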
The signal handler is lightweight: it copies the thread's reservation slots to the shared array and increments a counter, after which the interrupted computation resumes. Publication and reclamation are orthogonal to the data structure implementation and can be incorporated as a drop-in replacement for HP/HE mechanisms.
4. EpochPOP: Hybrid Reclamation Variant
Epoch-based reclamation is efficient when threads advance epochs in step, reclaiming nodes once every thread has observed a newer epoch, but it can block reclamation if even a single thread is delayed. EpochPOP addresses this limitation by combining epoch-based reclamation with POP's delayed reservation mechanics.
EpochPOP operates in two modes:
- Fast Path: Threads announce operation epochs; object reclamation proceeds once all threads reach a later epoch. Reservations are still batched locally (as per the POP protocol) with negligible overhead.
- Slow Path (Fallback): If a retire list grows large (indicating a potential thread delay), the reclaimer triggers the POP protocol, forcing all threads to publish reservations (via signals). Objects are then reclaimed using published data.
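The choice between the two modes can be sketched as a small decision function. This is only an illustration of the idea described above: the two thresholds in `EpochPopConfig`, the `ReclaimPath` enum, and the helper names are hypothetical, and the source does not state how the paper sizes the fallback trigger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ReclaimPath { None, EpochFastPath, PopSlowPath };

// Hypothetical tuning knobs for this sketch.
struct EpochPopConfig {
    std::size_t epochThreshold;  // retire-list size at which to try epoch reclaim
    std::size_t popThreshold;    // larger size that suggests a delayed thread
};

// Pick the reclamation path from the current retire-list size.
ReclaimPath choose_path(std::size_t retireListSize, const EpochPopConfig& cfg) {
    if (retireListSize >= cfg.popThreshold) return ReclaimPath::PopSlowPath;
    if (retireListSize >= cfg.epochThreshold) return ReclaimPath::EpochFastPath;
    return ReclaimPath::None;
}

// Fast-path check: a node retired in retireEpoch may be freed once every
// thread's announced epoch is strictly later.
bool epoch_safe(std::uint64_t retireEpoch,
                const std::vector<std::uint64_t>& announcedEpochs) {
    for (std::uint64_t e : announcedEpochs)
        if (e <= retireEpoch) return false;  // some thread may still hold it
    return true;
}
```

When `choose_path` returns `PopSlowPath`, the reclaimer would run the ping-and-publish protocol and free objects based on published reservations instead of waiting for the lagging thread's epoch to advance.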
EpochPOP thus approaches epoch-based performance in typical cases, but offers the robust guarantees of hazard pointers—never indefinitely blocking reclamation due to delayed threads.
5. Empirical Results and Performance
Evaluations in the paper benchmark POP and its variants against traditional HP/HE on a suite of concurrent data structures:
- Harris–Michael lists and lazy lists
- Binary search trees (ABT, DGT)
- Hash tables built atop these primitives
Key quantitative findings:
- HazardPtrPOP: Throughput improved by 1.2× to 4× over standard HP, and up to 20% over optimized HP.
- HazardEraPOP: Up to 3× faster than hazard era implementations.
- EpochPOP: Delivers throughput comparable to classical epoch-based reclamation, with strictly lower unreclaimed memory and better worst-case guarantees.
Performance graphs indicate reduced reclamation cycles and better scalability, especially for read-dominated workloads where fence overheads previously dominated latency.
6. Applicability in System Design
POP is suitable as a direct replacement for hazard pointer interfaces in C/C++ concurrent data structures, including:
- In-memory databases
- Lock-free indexing structures
- High-performance cache subsystems
- Kernel/OS components that demand precise memory reclamation
Compatibility with existing interfaces (HP/HE) facilitates easy integration. By reducing pointer reservation costs on the read path, POP enables higher throughput and lower latency in multi-threaded environments. The use of POSIX signals (e.g., pthread_kill) is language agnostic and supported on most Unix-like operating systems.
A plausible implication is that POP is particularly advantageous in high-core-count environments and data structures with substantial traversal locality, where the performance gap between fence-heavy and fence-light protocols is most pronounced.
7. Conclusions
Publish-On-Ping (POP) advances memory reclamation in concurrent data structures by deferring publication of reservations until explicit demand. Utilizing POSIX signals for cross-thread coordination, POP maintains strict safety while excising the major source of overhead—frequent fencing and publication on every read. Its compatibility with hazard pointer interfaces, empirical throughput improvements, and robust design make it a compelling choice for footprint-sensitive, performance-intensive systems programming. EpochPOP, as a hybrid variant, offers the dual benefits of epoch-based speed and the resistance to delayed-thread starvation found in pointer-based schemes.