Parameterized Complexity of Categorical Clustering with Size Constraints (2104.07974v1)
Abstract: In the Categorical Clustering problem, we are given a set of vectors (a matrix) A=\{a_1,\ldots,a_n\} over \Sigma^m, where \Sigma is a finite alphabet, and integers k and B. The task is to partition A into k clusters such that the median objective of the clustering in the Hamming norm is at most B. That is, we seek a partition \{I_1,\ldots,I_k\} of \{1,\ldots,n\} and vectors c_1,\ldots,c_k\in\Sigma^m such that \sum_{i=1}^{k}\sum_{j\in I_i}d_H(c_i,a_j)\leq B, where d_H(a,b) is the Hamming distance between vectors a and b. Fomin, Golovach, and Panolan [ICALP 2018] proved that the problem is fixed-parameter tractable (for the binary case \Sigma=\{0,1\}) by giving an algorithm that solves the problem in time 2^{O(B\log B)}(mn)^{O(1)}. We extend this algorithmic result to a popular capacitated clustering model, where the sizes of the clusters must additionally satisfy certain constraints. More precisely, in Capacitated Clustering we are also given two non-negative integers p and q, and seek a clustering with p\leq |I_i|\leq q for all i\in\{1,\ldots,k\}. Our main theorem is that Capacitated Clustering is solvable in time 2^{O(B\log B)}|\Sigma|^B(mn)^{O(1)}. The theorem not only extends the previous algorithmic results to a significantly more general model but also implies algorithms for several other variants of Categorical Clustering with constraints on cluster sizes.
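To make the objective and the size constraints concrete, here is a minimal brute-force sketch of Capacitated Clustering for tiny instances. It is not the 2^{O(B\log B)}|\Sigma|^B(mn)^{O(1)} algorithm from the paper; it simply tries all k^n assignments of vectors to clusters. It relies on one standard fact: for a fixed cluster, a center minimizing the sum of Hamming distances takes, in every coordinate, a most frequent symbol of that coordinate. The function names (`hamming`, `best_center`, `cluster_cost`, `brute_force_capacitated`) are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    """Hamming distance d_H(a, b) between two equal-length vectors."""
    return sum(x != y for x, y in zip(a, b))


def best_center(cluster):
    """Optimal median center: a most frequent symbol in each coordinate.
    Ties are broken by first occurrence (Counter.most_common ordering)."""
    m = len(cluster[0])
    return tuple(Counter(v[j] for v in cluster).most_common(1)[0][0]
                 for j in range(m))


def cluster_cost(cluster):
    """Sum of Hamming distances from the cluster to its optimal center."""
    if not cluster:  # an empty cluster (allowed when p = 0) costs nothing
        return 0
    c = best_center(cluster)
    return sum(hamming(c, v) for v in cluster)


def brute_force_capacitated(A, k, B, p, q):
    """Exhaustive search (exponential in n, for illustration only).

    Returns (labels, cost) for a minimum-cost clustering with
    p <= |I_i| <= q for every cluster and cost <= B, or None if none exists.
    labels[j] is the index of the cluster that contains A[j].
    """
    n = len(A)
    best = None
    for labels in product(range(k), repeat=n):
        clusters = [[A[j] for j in range(n) if labels[j] == i]
                    for i in range(k)]
        # Enforce the capacity constraints on every cluster size.
        if any(not (p <= len(C) <= q) for C in clusters):
            continue
        cost = sum(cluster_cost(C) for C in clusters)
        if cost <= B and (best is None or cost < best[1]):
            best = (labels, cost)
    return best


if __name__ == "__main__":
    A = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
    # Two clusters of exactly two vectors each, budget B = 2.
    print(brute_force_capacitated(A, k=2, B=2, p=2, q=2))
```

On this toy input, pairing \{a_1,a_2\} and \{a_3,a_4\} costs 1 + 1 = 2, while every other balanced split costs at least 4, so the call returns ((0, 0, 1, 1), 2); lowering the budget to B = 1 makes it return None.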