
Group Testing: An Information Theory Perspective (1902.06002v3)

Published 15 Feb 2019 in cs.IT, cs.DM, math.IT, math.PR, math.ST, and stat.TH

Abstract: The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the *rate* of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms.

Citations (266)

Summary

  • The paper provides an in-depth survey of group testing, establishing a theoretical lower bound using information-theoretic arguments.
  • It evaluates both adaptive and nonadaptive testing strategies, highlighting methods like COMP, DD, SCOMP, and SSS with quantifiable performance rates.
  • The study extends its analysis to noisy models, demonstrating that techniques such as belief propagation and linear programming deliver robust defect identification.

Essay: Group Testing: An Information Theory Perspective

The paper "Group Testing: An Information Theory Perspective" by Aldridge, Johnson, and Scarlett provides a comprehensive survey of the group testing problem, analyzed through the lens of information theory. Group testing is a combinatorial problem where the objective is to identify a small number of defective items within a larger population using the fewest number of tests. Each test can pool multiple items, with the test result indicating whether at least one defective item is present. The paper leverages information-theoretic techniques to address fundamental limits, algorithmic design, and rate analysis within this problem space.

Problem Setup and Model Variants

Group testing, initially devised during World War II for syphilis detection among soldiers, is presented as an efficient method for identifying defectives in various domains such as medical testing, communications, and data science. The authors differentiate between adaptive and nonadaptive testing strategies: adaptive algorithms test in sequential stages using feedback from previous stages, whereas nonadaptive algorithms plan all tests in advance, facilitating parallel processing.

The survey predominantly focuses on the nonadaptive strategy, exploring both noiseless and noisy models. In noiseless models, test outcomes perfectly reflect the presence of defectives, while noisy models involve random errors, requiring robust algorithms to ensure correct defect identification.
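To make the setup concrete, here is a minimal NumPy sketch of the noiseless nonadaptive model under illustrative, assumed parameters (the inclusion probability p = 1/k is one common Bernoulli design choice; the paper also analyzes other designs):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, T = 500, 10, 120   # illustrative sizes: items, defectives, tests
p = 1 / k                # per-item inclusion probability for a Bernoulli design

# Nonadaptive design: all T tests fixed in advance as a T x n boolean matrix,
# where X[t, i] = True means item i is pooled into test t.
X = rng.random((T, n)) < p

# Ground truth: k defective items chosen uniformly at random.
defective = np.zeros(n, dtype=bool)
defective[rng.choice(n, size=k, replace=False)] = True

# Noiseless outcomes: a test is positive iff its pool contains a defective.
y = (X & defective).any(axis=1)
print(f"{int(y.sum())} of {T} tests are positive")
```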

Theoretical Bounds and Information-Theoretic Insights

A cornerstone of the paper is the theoretical framework provided by information theory. The central counting bound, established via information-theoretic arguments, shows that at least $\log_2 \binom{n}{k}$ tests are necessary to identify the defectives with high probability, where $n$ is the number of items and $k$ is the number of defectives. This bound serves as a benchmark for evaluating the efficiency of any group testing algorithm. The authors introduce the concept of the rate of group testing, defined as the ratio of the information learned, $\log_2 \binom{n}{k}$, to the number of tests $T$, capturing the efficiency of information acquisition per test.
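As a worked example of the counting bound and the rate (the numbers are illustrative, not taken from the paper):

```python
from math import comb, log2

n, k = 500, 10               # illustrative population size and defective count
bits = log2(comb(n, k))      # information content: log2 C(n, k) ~ 67.7 bits
T = 120                      # a hypothetical test budget
rate = bits / T              # bits of information learned per test
print(f"counting bound: T >= {bits:.1f}; rate at T = {T}: {rate:.3f}")
```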

Algorithmic Developments

The paper systematically evaluates several algorithms, both known and novel, assessing their performance in terms of achievable rate. Noteworthy algorithms include:

  • Combinatorial Orthogonal Matching Pursuit (COMP): A simple algorithm that marks an item as nondefective if it appears in any negative test and declares every remaining item defective, achieving rates up to 0.531 in the noiseless setting (see the decoding sketch following this list).
  • Definite Defectives (DD): An improvement over COMP that first discards items appearing in negative tests, then declares an item defective only if some positive test contains it and no other remaining candidate; all other items are declared nondefective.
  • Sequential COMP (SCOMP): Builds on DD by greedily adding possibly defective items to the declared set until every positive test is explained by at least one declared defective.
  • Smallest Satisfying Set (SSS): A method that finds the smallest set of items consistent with all test outcomes; it is near-optimal in theory but computationally expensive in general.
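As referenced above, here is a minimal sketch of the COMP and DD decoding rules, reusing the X (T x n boolean test matrix) and y (outcome vector) conventions from the earlier snippet; this is a direct transcription of the stated rules, not the authors' code:

```python
import numpy as np

def comp_decode(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """COMP: any item appearing in a negative test is nondefective;
    every remaining item is declared defective."""
    in_negative = X[~y].any(axis=0)   # item appears in some negative test
    return ~in_negative               # the "possibly defective" set

def dd_decode(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """DD: among COMP's survivors, declare an item defective only if some
    positive test contains it and no other survivor."""
    pd = comp_decode(X, y)            # possibly defective items
    declared = np.zeros(X.shape[1], dtype=bool)
    for t in np.flatnonzero(y):       # scan each positive test
        members = np.flatnonzero(X[t] & pd)
        if len(members) == 1:         # a lone survivor must be defective
            declared[members[0]] = True
    return declared
```

On noiseless data, COMP can produce false positives but never false negatives, while DD can produce false negatives but never false positives; SCOMP starts from DD's output and greedily closes the gap.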

The authors also explore linear programming relaxations of the SSS problem, which offer practical approaches while maintaining near-optimal performance in certain regimes.
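One way this relaxation can be set up: SSS is an integer program (minimize the number of declared defectives, subject to every negative test containing none of them and every positive test at least one), and dropping the integrality constraint yields an LP. A sketch using scipy.optimize.linprog, with a simple 0.5 rounding step chosen here for illustration rather than taken from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def sss_lp_decode(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """LP relaxation of Smallest Satisfying Set (assumes at least one
    positive test). Each z_i in [0, 1] relaxes the 0/1 indicator
    'item i is defective'."""
    T, n = X.shape
    c = np.ones(n)                      # objective: minimize sum_i z_i
    # Positive tests need total mass >= 1: sum_{i in test} z_i >= 1,
    # rewritten as -X[t] @ z <= -1 to fit linprog's A_ub @ z <= b_ub form.
    A_ub = -X[y].astype(float)
    b_ub = -np.ones(int(y.sum()))
    # Items appearing in any negative test are pinned to zero via their bound.
    upper = np.ones(n)
    upper[X[~y].any(axis=0)] = 0.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=list(zip(np.zeros(n), upper)), method="highs")
    return res.x > 0.5                  # illustrative rounding rule
```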

Noisy Group Testing and Robust Algorithms

To accommodate realistic scenarios with test errors, the paper extends group testing to noisy models, such as binary symmetric noise, in which each outcome is independently flipped with some probability, or erasure channels. Techniques such as belief propagation and linear programming relaxations adapt well to these settings, demonstrating the robustness required for practical applications.
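For instance, under binary symmetric noise each noiseless outcome is flipped independently with some probability rho; a minimal sketch of that outcome model (the value rho = 0.05 is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_outcomes(X: np.ndarray, defective: np.ndarray,
                   rho: float = 0.05) -> np.ndarray:
    """Binary symmetric noise: flip each test outcome independently w.p. rho."""
    y = (X & defective).any(axis=1)   # noiseless outcomes first
    flips = rng.random(y.shape[0]) < rho
    return y ^ flips                  # XOR applies the random flips
```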

Achievability and Capacity

Both theoretical and simulation results reveal that nonadaptive group testing can achieve optimal rates for $k = O(n^{1/3})$, where the rate approaches 1. Beyond this sparse regime, practical algorithms like DD and SCOMP offer competitive rates, especially when coupled with well-chosen test designs, such as near-constant column weight matrices.
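The near-constant column weight design mentioned above places each item in roughly nu*T/k tests drawn uniformly at random with replacement; a sketch, with nu = ln 2 as a common parameter choice in this literature:

```python
import numpy as np

rng = np.random.default_rng(2)

def near_constant_column_weight(n: int, T: int, k: int,
                                nu: float = float(np.log(2))) -> np.ndarray:
    """Each item joins L ~ nu*T/k tests sampled with replacement, so every
    column has weight at most L (slightly less when two draws collide)."""
    L = max(1, round(nu * T / k))
    X = np.zeros((T, n), dtype=bool)
    for i in range(n):
        X[rng.integers(0, T, size=L), i] = True   # column i gets up to L ones
    return X
```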

Extensions and Applications

The survey further discusses extensions to partial recovery criteria, sublinear-time algorithms, and adaptive strategies with a limited number of stages. The framework's generality also accommodates new constraints, such as graph-based test designs in network tomography or scenarios with heterogeneous item defectivity probabilities.

Conclusion

The integration of information theory into group testing provides profound insights into its fundamental limits and potential improvements, showcasing the power of combinatorial designs and probabilistic reasoning. By illustrating both theoretical benchmarks and practical algorithms, this comprehensive survey paves the way for future advancements in efficiently solving sparse recovery problems across domains. The implications of this work extend beyond group testing, influencing fields from signal processing to machine learning, where efficient information acquisition from sparse signals is paramount.
