Tight Bounds for Sorting Under Partial Information (2404.08468v3)
Abstract: Sorting has a natural generalization where the input consists of: (1) a ground set $X$ of size $n$, (2) a partial oracle $O_P$ specifying some fixed partial order $P$ on $X$, and (3) a linear oracle $O_L$ specifying a linear order $L$ that extends $P$. The goal is to recover the linear order $L$ on $X$ using the fewest linear oracle queries. In this problem, we measure algorithmic complexity through three metrics: oracle queries to $O_L$, oracle queries to $O_P$, and the time spent. Any algorithm requires, in the worst case, $\log_2 e(P)$ linear oracle queries to recover the linear order on $X$, where $e(P)$ denotes the number of linear extensions of $P$. Kahn and Saks presented the first algorithm that uses $\Theta(\log e(P))$ linear oracle queries (using $O(n^2)$ partial oracle queries and exponential time). The state of the art for the general problem is by Cardinal, Fiorini, Joret, Jungers and Munro, who at STOC'10 managed to separate the linear and partial oracle queries into a preprocessing phase and a query phase. They can preprocess $P$ using $O(n^2)$ partial oracle queries and $O(n^{2.5})$ time. Then, given $O_L$, they uncover the linear order on $X$ in $\Theta(\log e(P))$ linear oracle queries and $O(n + \log e(P))$ time, which is worst-case optimal in the number of linear oracle queries but not in the time spent. For $c \geq 1$, our algorithm can preprocess $O_P$ using $O(n^{1 + \frac{1}{c}})$ queries and time. Given $O_L$, we uncover $L$ using $\Theta(c \log e(P))$ queries and time. We show a matching lower bound, as there exist positive constants $(\alpha, \beta)$ where for any constant $c \geq 1$, any algorithm that uses at most $\alpha \cdot n^{1 + \frac{1}{c}}$ preprocessing must use, in the worst case, at least $\beta \cdot c \log e(P)$ linear oracle queries. Thus, we solve the problem of sorting under partial information through an algorithm that is asymptotically tight across all three metrics.
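To make the information-theoretic bound $\log_2 e(P)$ concrete, the following is a minimal sketch that counts the linear extensions of a small partial order by brute force and derives the resulting lower bound on linear oracle queries. It is not the algorithm from the paper: the function names `count_linear_extensions` and `query_lower_bound` and the encoding of $P$ as a list of pairs `(a, b)` meaning $a < b$ are assumptions made for illustration, and the counting runs in exponential time, so it is only suitable for tiny inputs.

```python
import math
from functools import lru_cache


def count_linear_extensions(n, relations):
    """Count e(P) for a partial order P on {0, ..., n-1}.

    `relations` is an iterable of pairs (a, b) meaning a < b in P
    (hypothetical encoding; it need not be transitively closed).
    Uses a dynamic program over subsets, so it takes exponential time.
    """
    # preds[x] is a bitmask of elements that must come before x.
    preds = [0] * n
    for a, b in relations:
        preds[b] |= 1 << a

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(placed):
        # `placed` is the bitmask of elements already put in the order.
        if placed == full:
            return 1
        total = 0
        for x in range(n):
            not_placed = not placed & (1 << x)
            preds_done = preds[x] & ~placed == 0
            if not_placed and preds_done:
                total += count(placed | (1 << x))
        return total

    return count(0)


def query_lower_bound(n, relations):
    """Worst-case lower bound ceil(log2 e(P)) on yes/no linear oracle queries."""
    return math.ceil(math.log2(count_linear_extensions(n, relations)))


if __name__ == "__main__":
    # Antichain on 3 elements: every permutation is a linear extension.
    print(count_linear_extensions(3, []))                # 6
    # One relation 0 < 1, element 2 unconstrained.
    print(count_linear_extensions(3, [(0, 1)]))          # 3
    # A chain is already sorted, so no queries are needed.
    print(query_lower_bound(3, [(0, 1), (1, 2)]))        # 0
```

The more relations $P$ already contains, the smaller $e(P)$ becomes and the fewer linear oracle queries are needed: for a total order, $e(P) = 1$ and the bound is zero, while for an antichain $e(P) = n!$ and the bound matches the classical $\Omega(n \log n)$ sorting lower bound.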
- Counting linear extensions. Order, 8(3):225–242, September 1991.
- Optimal finger search trees in the pointer machine. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, STOC ’02, pages 583–591, New York, NY, USA, May 2002. Association for Computing Machinery.
- An Efficient Algorithm for Partial Order Production. SIAM Journal on Computing, 39(7):2927–2940, January 2010. Publisher: Society for Industrial and Applied Mathematics.
- Sorting under partial information (without the ellipsoid algorithm). In Proceedings of the forty-second ACM symposium on Theory of computing, STOC ’10, pages 359–368, New York, NY, USA, June 2010. Association for Computing Machinery.
- Sorting and selection in posets. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms, SODA ’09, pages 392–401, USA, January 2009. Society for Industrial and Applied Mathematics.
- Michael L. Fredman. How good is the information theory bound in sorting? Theoretical Computer Science, 1(4):355–361, April 1976.
- Entropy and sorting. In Proceedings of the twenty-fourth annual ACM symposium on Theory of Computing, STOC ’92, pages 178–187, New York, NY, USA, July 1992. Association for Computing Machinery.
- Balancing poset extensions. Order, 1(2):113–126, June 1984.
- János Körner. Coding of an information source having ambiguous alphabet and the entropy of graphs. In 6th Prague conference on information theory, pages 411–425, 1973.
- Efficient algorithms for sorting in trees. arXiv preprint arXiv:2205.15912, 2022.
- Robert E. Tarjan and Christopher J. Van Wyk. O(n log log n)-time algorithm for triangulating a simple polygon. SIAM Journal on Computing, 17(1):143–178, 1988. Publisher: Society for Industrial and Applied Mathematics Publications.
- Preprocessing Ambiguous Imprecise Points. In Gill Barequet and Yusu Wang, editors, 35th International Symposium on Computational Geometry (SoCG 2019), volume 129 of Leibniz International Proceedings in Informatics (LIPIcs), pages 42:1–42:16, Dagstuhl, Germany, 2019. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.