A Semidefinite Programming-Based Branch-and-Cut Algorithm for Biclustering (2403.11351v3)
Abstract: Biclustering, also called co-clustering, block clustering, or two-way clustering, involves the simultaneous clustering of both the rows and columns of a data matrix into distinct groups, such that the rows and columns within a group display similar patterns. As a model problem for biclustering, we consider the $k$-densest-disjoint biclique problem, whose goal is to identify $k$ disjoint complete bipartite subgraphs (called bicliques) of a given weighted complete bipartite graph such that the sum of their densities is maximized. To address this problem, we present a tailored branch-and-cut algorithm. For the upper bound routine, we consider a semidefinite programming relaxation and propose valid inequalities to strengthen the bound. We solve this relaxation in a cutting-plane fashion using a first-order method. For the lower bound, we design a maximum weight matching rounding procedure that exploits the solution of the relaxation solved at each node. Computational results on both synthetic and real-world instances show that the proposed algorithm can solve instances approximately 20 times larger than those handled by general-purpose solvers.
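The objective described in the abstract, the sum of the densities of $k$ disjoint bicliques, can be illustrated with a small sketch. The abstract does not state how density is defined, so the code below assumes one common convention: the total edge weight of a biclique divided by its number of vertices. The names `biclique_density`, `sum_of_densities`, and the toy matrix `W` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A biclique is given as (row indices, column indices) of the weight matrix.
Biclique = Tuple[Sequence[int], Sequence[int]]


def biclique_density(weights: List[List[float]],
                     rows: Sequence[int],
                     cols: Sequence[int]) -> float:
    """Total edge weight of the biclique divided by its vertex count.

    ASSUMPTION: normalizing by len(rows) + len(cols) is one common
    convention; the abstract does not give the paper's exact definition.
    Both index sets are expected to be non-empty.
    """
    total = sum(weights[i][j] for i in rows for j in cols)
    return total / (len(rows) + len(cols))


def sum_of_densities(weights: List[List[float]],
                     bicliques: Sequence[Biclique]) -> float:
    """Objective value for a family of pairwise disjoint bicliques."""
    seen_rows, seen_cols = set(), set()
    for rows, cols in bicliques:
        # Disjointness: no row or column may belong to two bicliques.
        if seen_rows & set(rows) or seen_cols & set(cols):
            raise ValueError("bicliques must be pairwise disjoint")
        seen_rows |= set(rows)
        seen_cols |= set(cols)
    return sum(biclique_density(weights, r, c) for r, c in bicliques)


# Toy 3x3 weight matrix with an obvious two-block structure (made-up data).
W = [
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.0],
    [0.0, 0.0, 6.0],
]

block_aligned = [([0, 1], [0, 1]), ([2], [2])]
misaligned = [([0], [0]), ([1, 2], [1, 2])]

print(sum_of_densities(W, block_aligned))
print(sum_of_densities(W, misaligned))
```

Under this assumed definition, the partition that follows the block structure of `W` scores higher than the misaligned one, which is the behavior a biclustering objective of this kind is meant to reward. An exact method such as the branch-and-cut algorithm of the paper searches over all such partitions rather than comparing two candidates by hand.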