
Proxy-Based Biclustering Model Trees

Updated 23 November 2025
  • The paper introduces Oxytrees, which employ proxy-based compression and Kronecker product kernel regression to overcome scalability and generalization challenges in bipartite learning.
  • Proxy-based biclustering model trees use an impurity function with proxy matrices to optimize split selection and enable efficient batch leaf-assignment for rapid predictions.
  • Empirical evaluations show up to 30× faster training and 10× quicker predictions compared to traditional methods, with competitive performance across various biological interaction datasets.

Proxy-based biclustering model trees, exemplified by the Oxytrees algorithm, are a class of machine learning models designed to efficiently learn and predict interactions in bipartite learning scenarios. Bipartite learning involves estimating or predicting values in a large, partially observed interaction matrix $Y \in \mathbb{R}^{n_1 \times n_2}$, where each entry depends on a pair of feature vectors associated with distinct types of entities—for example, drug–target or RNA–disease interactions. Oxytrees address the scalability and generalization limitations of previous biclustering and model tree approaches by introducing proxy-based compression for split optimization, model-tree leaf learning via Kronecker product kernel regression, and efficient batch inference for large prediction tasks (Ilídio et al., 16 Nov 2025).

1. Problem Setting and Motivations

Bipartite learning is characterized by the presence of two separate feature matrices, $X_1 \in \mathbb{R}^{n_1 \times m_1}$ and $X_2 \in \mathbb{R}^{n_2 \times m_2}$, corresponding to the rows and columns of a large, sparse interaction matrix $Y$. Many rows or columns may have only limited observed interactions, and the matrix is typically too large for direct modeling. Oxytrees aim to:

  • Discover a biclustering structure in $Y$.
  • Fit a simple (linear-in-features) regression or classification model per bicluster.
  • Enable fast prediction on novel $(X_1^i, X_2^j)$ dyads, critical for inductive and semi-inductive learning.
  • Overcome the domain specificity and scalability bottlenecks of prior biclustering forests and constant-leaf model trees.

2. Impurity Function and Proxy Matrix Construction

Oxytrees utilize an impurity function $I(\cdot)$ that can be efficiently computed from sufficient statistics. The core form is:

$$I(Y_\text{node}) = \rho\left( \sum_{ij} \mu\left(Y_\text{node}^{ij}\right) \right)$$

For variance-based impurity:

  • $\mu(y) = [1,\, y,\, y^2]$
  • $\rho(a, b, c) = c/a - (b/a)^2$

This enables the construction of two proxy matrices at each node:

| Proxy Matrix | Dimension | Content Description |
|---|---|---|
| $\tilde Y_1$ | $n_{\text{row}} \times 3$ | Row aggregations: $[n_i,\ \sum_j y_{ij},\ \sum_j y_{ij}^2]$ |
| $\tilde Y_2$ | $n_{\text{col}} \times 3$ | Column aggregations: $[n_j,\ \sum_i y_{ij},\ \sum_i y_{ij}^2]$ |

These proxies allow the impurity of any bicluster resulting from a candidate row or column split to be computed as a function of partial sums over the proxies, without full recomputation over $Y$.
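To make the proxy construction concrete, the following minimal NumPy sketch builds the two proxy matrices for a dense submatrix and recovers the variance impurity from pooled sufficient statistics. The function names are illustrative, not taken from the paper's implementation, and the sketch assumes a fully observed submatrix rather than a sparse one:

```python
import numpy as np

def proxy_matrices(Y):
    """Build row and column proxy matrices of sufficient statistics
    [count, sum, sum of squares] for a dense submatrix Y."""
    n_row, n_col = Y.shape
    Y1 = np.column_stack([np.full(n_row, float(n_col)),
                          Y.sum(axis=1), (Y ** 2).sum(axis=1)])
    Y2 = np.column_stack([np.full(n_col, float(n_row)),
                          Y.sum(axis=0), (Y ** 2).sum(axis=0)])
    return Y1, Y2

def impurity_from_stats(a, b, c):
    """Variance impurity rho(a, b, c) = c/a - (b/a)^2 computed from
    pooled statistics a = count, b = sum, c = sum of squares."""
    return c / a - (b / a) ** 2
```

Pooling either proxy over its rows yields the node's total sufficient statistics, from which the variance impurity follows without touching $Y$ again.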

3. Efficient Split Search and Biclustering Criterion

Oxytrees condition splits on either row or column features. For a given node holding a submatrix $Y_\text{node}$:

  • Splitting on a subset of rows or columns yields submatrices $Y_A$ and $Y_B$.
  • The split is chosen to maximize impurity reduction:

$$\Delta I(s) = I(Y_\text{node}) - \left(\frac{n_A}{n} I(Y_A) + \frac{n_B}{n} I(Y_B)\right)$$

where $n_A$ and $n_B$ are the dyad counts in $Y_A$ and $Y_B$.

Proxy-based statistics permit evaluating each candidate split for impurity reduction in $O(n_\text{row} + n_\text{col})$ time per proxy build, rather than $O(n_\text{row} \cdot n_\text{col} \cdot m_\text{splits})$ for direct computation. Split selection sweeps through sorted feature values of $X_1$ and $X_2$, with per-split costs of $O(n_\text{row} \log n_\text{row})$ or $O(n_\text{col} \log n_\text{col})$ for numeric features.
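A sweep over one numeric row feature using only the row proxy can be sketched as follows. This is a simplified illustration under the variance impurity; the function and variable names are assumptions, and the paper's implementation generalizes this to both axes and multiple features:

```python
import numpy as np

def best_row_split(Y1_proxy, feature):
    """Sweep candidate row splits along one sorted numeric feature,
    scoring each by the weighted impurity reduction computed entirely
    from the row proxy matrix [n_i, sum_j y_ij, sum_j y_ij^2]."""
    order = np.argsort(feature)
    stats = Y1_proxy[order]            # sufficient statistics, sorted by feature
    prefix = np.cumsum(stats, axis=0)  # running statistics of the "left" side
    total = prefix[-1]

    def imp(s):                        # variance impurity rho(a, b, c)
        a, b, c = s[0], s[1], s[2]
        return c / a - (b / a) ** 2

    best_thr, best_gain = None, -np.inf
    for k in range(1, len(stats)):     # split between sorted rows k-1 and k
        left = prefix[k - 1]
        right = total - left
        gain = imp(total) - (left[0] / total[0]) * imp(left) \
                          - (right[0] / total[0]) * imp(right)
        if gain > best_gain:
            best_gain = gain
            best_thr = (feature[order[k - 1]] + feature[order[k]]) / 2
    return best_thr, best_gain
```

The cumulative sums play the role of the partial sums over the proxy described above: each of the $n_\text{row} - 1$ candidate splits is scored in constant time after the single $O(n_\text{row} \log n_\text{row})$ sort.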

4. Model-Tree Construction and Leaf Fitting with Kronecker Ridge Regression

The model tree structure recursively partitions $Y$ by alternating vertical and horizontal splits. Each leaf node receives all dyads falling into its corresponding bicluster and fits a regularized least-squares (RLS) model with a Kronecker product kernel (RLS-Kron):

  • Kernels $k_X$ on $X_1$ and $k_Y$ on $X_2$ define the joint kernel between dyads: $k((x_i, y_j), (x_{i'}, y_{j'})) = k_X(x_i, x_{i'}) \, k_Y(y_j, y_{j'})$.
  • For a leaf with $r$ row entities and $c$ column entities:
    • $K_1 \in \mathbb{R}^{r \times r}$ with entries $k_X(X_1^p, X_1^q)$
    • $K_2 \in \mathbb{R}^{c \times c}$ with entries $k_Y(X_2^u, X_2^v)$
  • RLS-Kron optimizes:

$$\min_W \, \|Y_\text{leaf} - K_2 W K_1\|_F^2 + \lambda \|W\|_F^2$$

with a closed-form solution based on eigendecomposition and elementwise operations. Prediction for new dyads $(x_i^*, y_j^*)$ uses:

$$\hat Y = \Phi_2^* W {\Phi_1^*}^{T}$$

where $\Phi_1^*$ and $\Phi_2^*$ are kernel feature matrices for the new $X_1^*$ and $X_2^*$.
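The closed form can be sketched with two eigendecompositions, after which the regularized objective decouples elementwise in the rotated basis. This is a minimal sketch assuming symmetric positive semidefinite kernel matrices, not the paper's actual solver:

```python
import numpy as np

def rls_kron_fit(Y, K1, K2, lam=1.0):
    """Closed-form minimizer of ||Y - K2 W K1||_F^2 + lam ||W||_F^2.

    With K1 = U1 diag(l1) U1^T and K2 = U2 diag(l2) U2^T, rotating
    Y into the eigenbases turns the problem into independent scalar
    ridge problems, solved by elementwise shrinkage."""
    l1, U1 = np.linalg.eigh(K1)
    l2, U2 = np.linalg.eigh(K2)
    Yt = U2.T @ Y @ U1                    # targets in the rotated basis
    scale = np.outer(l2, l1)              # eigenvalue products l2_i * l1_j
    Wt = scale * Yt / (scale ** 2 + lam)  # elementwise closed form
    return U2 @ Wt @ U1.T

def rls_kron_predict(W, Phi1, Phi2):
    """Predict Y_hat = Phi2 W Phi1^T for new kernel feature matrices."""
    return Phi2 @ W @ Phi1.T
```

Setting the gradient $K_2 (K_2 W K_1 - Y) K_1 + \lambda W$ to zero in the rotated basis confirms the elementwise formula; no linear system of size $rc \times rc$ is ever materialized.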

5. Batch Leaf-Assignment and Fast Inference

Naïve application of a tree model to all test pairs would require $O(n_\text{test}^2 \cdot \text{depth})$ traversals. Oxytrees introduce an optimized batch leaf-assignment algorithm:

  • At each split, the relevant test set ($X_1^{\text{test}}$ or $X_2^{\text{test}}$) is partitioned according to the split condition, and both branches receive the appropriate tuples.
  • At a leaf $\ell$, all pairs from $X_{1,\ell} \times X_{2,\ell}$ receive predictions from the corresponding RLS-Kron model.
  • The overall assignment and prediction cost is $O(n_\text{test}^2 + \text{depth} \cdot n_\text{test})$, which is substantially faster than per-pair traversal for large test sets.
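The batch routing idea can be illustrated with a small recursive sketch over a hypothetical dict-based tree, in which whole index sets of test rows and columns travel down the tree together instead of one pair at a time. The tree encoding and field names here are assumptions for illustration only:

```python
import numpy as np

def batch_assign(node, rows, cols, out):
    """Route row/column index sets down the tree jointly; each leaf
    predicts an entire block of the output matrix at once."""
    if node["leaf"]:
        # one block prediction for every pair in rows x cols
        out[np.ix_(rows, cols)] = node["model"](rows, cols)
        return
    if node["axis"] == 0:                       # split on a row feature
        mask = node["feature"][rows] <= node["threshold"]
        batch_assign(node["left"], rows[mask], cols, out)
        batch_assign(node["right"], rows[~mask], cols, out)
    else:                                       # split on a column feature
        mask = node["feature"][cols] <= node["threshold"]
        batch_assign(node["left"], rows, cols[mask], out)
        batch_assign(node["right"], rows, cols[~mask], out)
```

Because each entity index passes through a given split exactly once, the traversal cost scales with the number of test entities times the depth, while the quadratic term is confined to the unavoidable block predictions at the leaves.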

6. Empirical Results and Evaluation

Empirical evaluation on 15 biological interaction datasets demonstrates:

  • Training speed: up to $30\times$ faster training compared to BICTR biclustering forests for large matrices ($n > 1000$, $m \geq 10$), with observed complexity $\Theta(n^2 \log n)$ versus $\Theta(n^2 m \log n)$ for BICTR.
  • Prediction speed: batch inference is up to $10\times$ faster than BICTR per batch.
  • Predictive performance:
    • In the inductive (TT) setting, ensembles of Oxytrees yield superior or statistically tied AUPRC/AUROC versus BICTR and other baselines (RLS-Kron, NRLMF, WkNNIR), based on Friedman–Nemenyi tests at $p < 0.05$.
    • Advantages are especially pronounced when using RLS-Kron leaf models relative to constant-leaf alternatives.
    • Competitive performance persists in semi-inductive (TL, LT), transductive (TD), and partially unlabeled (PU) scenarios.
  • Ablation studies confirm the necessity of proxy-based split search, RLS-Kron leaf fitting, and batch inference to attain these improvements (Ilídio et al., 16 Nov 2025).

7. Software Implementation and Reproducibility

Oxytrees are provided with a Python API compatible with Scikit-Learn, enabling:

  • Access to all 15 benchmark datasets and evaluation metrics used in the study.
  • Reproducibility of experimental results.
  • Practical deployment for large bipartite learning tasks in computational biology and beyond.

The code and datasets are available at https://github.com/pedroilidio/oxytrees2025.
