On a Near-Optimal & Efficient Algorithm for the Sparse Pooled Data Problem (2312.14588v1)
Abstract: The pooled data problem asks to identify the unknown labels of a set of items from condensed measurements. More precisely, given $n$ items, assume that each item has a label in $\{0,1,\ldots,d\}$, encoded via the ground truth $\boldsymbol{\sigma}$. We call the pooled data problem sparse if the number of non-zero entries of $\boldsymbol{\sigma}$ scales as $k \sim n^{\theta}$ for $\theta \in (0,1)$. The information that is revealed about $\boldsymbol{\sigma}$ comes from pooled measurements, each indicating how many items of each label are contained in the pool. The most basic question is to design a pooling scheme that uses as few pools as possible while reconstructing $\boldsymbol{\sigma}$ with high probability. Variants of the problem and its combinatorial ramifications have been studied for at least 35 years. However, the study of the modern question of \emph{efficient} inference of the labels has suggested a statistical-to-computational gap of order $\log n$ between the minimum number of pools needed for information-theoretically possible inference and the number needed for efficient inference. In this article, we resolve the question of whether this $\log n$-gap is artificial or of a fundamental nature by designing an efficient algorithm, called \algoname, based upon a novel pooling scheme that uses a number of pools very close to the information-theoretic threshold.
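The measurement model described in the abstract can be illustrated with a minimal Python sketch. It only shows what a single pooled measurement reveals; it does not implement the paper's pooling scheme or its reconstruction algorithm. The function names, the parameter values, and the choice of a uniformly random pool containing half of the items are all assumptions made for illustration.

```python
import random


def sample_ground_truth(n, d, k, rng):
    """Return a length-n label vector with exactly k non-zero labels in {1, ..., d}."""
    sigma = [0] * n
    for i in rng.sample(range(n), k):
        sigma[i] = rng.randint(1, d)  # randint is inclusive on both ends
    return sigma


def pooled_measurement(sigma, pool, d):
    """Count how many items of each label 0, ..., d are contained in the pool."""
    counts = [0] * (d + 1)
    for i in pool:
        counts[sigma[i]] += 1
    return counts


# Hypothetical parameters: n items, labels in {0, ..., d}, sparsity k ~ n^theta.
rng = random.Random(0)
n, d, theta = 1000, 3, 0.5
k = round(n ** theta)

sigma = sample_ground_truth(n, d, k, rng)

# Assumed pool design for illustration only: a uniformly random half of the items.
pool = rng.sample(range(n), n // 2)
counts = pooled_measurement(sigma, pool, d)
```

Each measurement is a vector of $d+1$ counts that sums to the pool size, so in the sparse regime most of the mass sits in the entry for label 0.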