Proxy-Based Biclustering Model Trees
- The paper introduces Oxytrees, which employ proxy-based compression and Kronecker product kernel regression to overcome scalability and generalization challenges in bipartite learning.
- Proxy-based biclustering model trees use an impurity function with proxy matrices to optimize split selection and enable efficient batch leaf-assignment for rapid predictions.
- Empirical evaluations show up to 30× faster training and 10× quicker predictions compared to traditional methods, with competitive performance across various biological interaction datasets.
Proxy-based biclustering model trees, exemplified by the Oxytrees algorithm, are a class of machine learning models designed to efficiently learn and predict interactions in bipartite learning scenarios. Bipartite learning involves estimating or predicting values in a large, partially observed interaction matrix, where each entry depends on a pair of feature vectors associated with distinct types of entities, such as drug–target or RNA–disease interactions. Oxytrees address the scalability and generalization limitations of previous biclustering and model-tree approaches by introducing proxy-based compression for split optimization, model-tree leaf learning via Kronecker product kernel regression, and efficient batch inference for large prediction tasks (Ilídio et al., 16 Nov 2025).
1. Problem Setting and Motivations
Bipartite learning is characterized by the presence of two separate feature matrices, one describing the row entities and one the column entities of a large, sparse interaction matrix. Many rows or columns may have only limited observed interactions, and the matrix is typically too large for direct modeling. Oxytrees aim to:
- Discover a biclustering structure in the interaction matrix.
- Fit a simple (linear-in-features) regression or classification model per bicluster.
- Enable fast prediction on novel dyads, critical for inductive and semi-inductive learning.
- Overcome the domain specificity and scalability bottlenecks of prior biclustering forests and constant-leaf model trees.
2. Impurity Function and Proxy Matrix Construction
Oxytrees utilize an impurity function that can be efficiently computed from sufficient statistics of the labels; for variance-based impurity, these statistics are per-row and per-column counts, sums, and sums of squares.
- This enables the construction of two proxy matrices at each node:
| Proxy Matrix | Content Description |
|---|---|
| Row proxy | Per-row aggregations (sufficient statistics) of the node's submatrix |
| Column proxy | Per-column aggregations (sufficient statistics) of the node's submatrix |
These proxies allow the impurity of any bicluster resulting from a candidate row or column split to be computed as a function of partial sums over the proxies, without full recomputation over the node's submatrix.
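The proxy construction for variance-based impurity can be sketched as follows. This is a minimal sketch assuming a dense label submatrix; `build_proxies` and `variance_impurity` are illustrative names, not the paper's API, and the actual implementation also handles missing entries and other impurity choices.

```python
import numpy as np

def build_proxies(Y):
    """Build row/column proxy matrices of sufficient statistics for a
    dense label submatrix Y (illustrative sketch).

    Each proxy stores, per row (or column), the count, sum, and sum of
    squares of the labels: enough to compute the variance impurity of
    any group of rows or columns from partial sums alone.
    """
    row_proxy = np.stack([
        np.full(Y.shape[0], Y.shape[1], dtype=float),  # dyad counts per row
        Y.sum(axis=1),                                  # label sums per row
        (Y ** 2).sum(axis=1),                           # squared-label sums per row
    ], axis=1)
    col_proxy = np.stack([
        np.full(Y.shape[1], Y.shape[0], dtype=float),
        Y.sum(axis=0),
        (Y ** 2).sum(axis=0),
    ], axis=1)
    return row_proxy, col_proxy

def variance_impurity(proxy_rows):
    """Variance of all labels in the bicluster spanned by the given
    proxy rows, computed only from the aggregated statistics."""
    n, s, s2 = proxy_rows.sum(axis=0)
    return s2 / n - (s / n) ** 2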
3. Efficient Split Search and Biclustering Criterion
Oxytrees condition splits on either row or column features. For a given node collecting a submatrix of the interaction matrix:
- Splitting on a subset of rows or columns yields two complementary child submatrices.
- The split is chosen to maximize the impurity reduction: the impurity of the parent submatrix minus the weighted average of the impurities of the two children, where the weights are the fractions of the parent's dyads falling into each child.

Proxy-based statistics permit evaluating every candidate split's impurity reduction from partial sums over the proxies after a single proxy build per node, rather than recomputing statistics over the full submatrix for each candidate. Split selection sweeps through the sorted feature values of the row and column feature matrices, updating the running partial sums incrementally at each candidate threshold.
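The sweep over sorted feature values can be sketched as follows, assuming variance impurity and a per-row proxy holding counts, sums, and sums of squares of the labels; all names are illustrative rather than the paper's actual implementation.

```python
import numpy as np

def best_row_split(X_rows, row_proxy):
    """Sweep sorted values of each row feature, scoring candidate splits
    by weighted-variance impurity reduction using cumulative proxy sums
    (illustrative sketch).

    X_rows    : (n_rows, n_features) row feature matrix
    row_proxy : (n_rows, 3) per-row [count, sum, sum-of-squares]
    Returns (feature index, threshold, impurity reduction).
    """
    def impurity(n, s, s2):
        return s2 / n - (s / n) ** 2

    n_tot, s_tot, s2_tot = row_proxy.sum(axis=0)
    parent = impurity(n_tot, s_tot, s2_tot)
    best = (None, None, 0.0)
    for j in range(X_rows.shape[1]):
        order = np.argsort(X_rows[:, j])
        stats = np.cumsum(row_proxy[order], axis=0)  # left-partition statistics
        for i in range(len(order) - 1):              # split between ranks i and i+1
            if X_rows[order[i], j] == X_rows[order[i + 1], j]:
                continue  # identical feature values cannot be separated
            nl, sl, s2l = stats[i]
            nr, sr, s2r = n_tot - nl, s_tot - sl, s2_tot - s2l
            gain = parent - (nl * impurity(nl, sl, s2l)
                             + nr * impurity(nr, sr, s2r)) / n_tot
            if gain > best[2]:
                thr = (X_rows[order[i], j] + X_rows[order[i + 1], j]) / 2
                best = (j, thr, gain)
    return best
```

Note that the cumulative sums make each candidate threshold a constant-time update once the proxy is built, which is the point of the proxy compression.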
4. Model-Tree Construction and Leaf Fitting with Kronecker Ridge Regression
The model tree structure recursively partitions the interaction matrix by alternating vertical (column) and horizontal (row) splits. Each leaf node receives all dyads falling into its corresponding bicluster and fits a regularized least-squares (RLS) model with a Kronecker product kernel (RLS-Kron):
- A kernel on the row features and a kernel on the column features define the joint kernel between dyads as the product of the two single-domain kernel values; across all dyads, this corresponds to the Kronecker product of the two kernel matrices.
- For a leaf with a set of row entities and a set of column entities, a row Gram matrix is formed by evaluating the row kernel on all pairs of the leaf's row feature vectors, and a column Gram matrix analogously on the column feature vectors.
- RLS-Kron optimizes a regularized least-squares objective over the leaf's dyads, with a ridge penalty induced by the Kronecker product kernel.

This objective admits a closed-form solution based on eigendecomposition of the two Gram matrices and elementwise operations, without materializing the Kronecker product. Prediction for new dyads multiplies the learned coefficient matrix on both sides by kernel feature matrices holding the kernel evaluations between the new row and column entities and the leaf's training entities.
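The closed-form solution can be sketched with the standard Kronecker RLS eigendecomposition shortcut, assuming a fully observed leaf label matrix; function names are illustrative, not the library's API.

```python
import numpy as np

def kron_rls_fit(Kr, Kc, Y, lam=1.0):
    """Closed-form Kronecker ridge regression (standard KronRLS sketch).

    Solves the regularized least-squares problem whose kernel matrix is
    the Kronecker product of Kc and Kr, using the eigendecompositions of
    the two Gram matrices and an elementwise division, never forming the
    Kronecker product explicitly. Returns the coefficient matrix A.
    """
    lr, Ur = np.linalg.eigh(Kr)   # row Gram eigendecomposition
    lc, Uc = np.linalg.eigh(Kc)   # column Gram eigendecomposition
    # Transform labels into the joint eigenbasis, rescale elementwise
    # by the regularized Kronecker eigenvalues, and map back.
    C = (Ur.T @ Y @ Uc) / (np.outer(lr, lc) + lam)
    return Ur @ C @ Uc.T

def kron_rls_predict(Kr_new, Kc_new, A):
    """Predict for new dyads: rows of Kr_new / Kc_new hold kernel values
    between new entities and the leaf's training entities."""
    return Kr_new @ A @ Kc_new.T
```

With a vanishing regularization weight and strictly positive-definite Gram matrices, the fitted values interpolate the training labels, which is a quick sanity check on the eigendecomposition algebra.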
5. Batch Leaf-Assignment and Fast Inference
Naïve application of a tree model to all test pairs would require one root-to-leaf traversal per test pair. Oxytrees introduce an optimized batch leaf-assignment algorithm:
- At each split, the relevant index set (rows or columns, depending on the split axis) is partitioned according to the split condition, and each branch receives the appropriate subset of row–column index tuples.
- At a leaf, all pairs reaching it receive predictions from the corresponding RLS-Kron model.
- Because each row or column index is routed through a split only once, the overall assignment and prediction cost grows far more slowly than one traversal per test pair, which is substantially faster for large test sets.
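The batch routing can be sketched as follows, assuming an illustrative dict-based tree representation rather than the paper's actual data structures.

```python
import numpy as np

def batch_assign(node, rows, cols, leaf_out):
    """Route all (row, column) test pairs through the tree at once
    (illustrative sketch of batch leaf-assignment).

    rows/cols : arrays of row and column feature vectors.
    node      : dict tree; internal nodes carry keys 'axis' ('row' or
                'col'), 'feature', 'threshold', 'left', 'right';
                leaves carry 'leaf' (an id).
    Each split partitions only one axis's index set, so every row or
    column index is routed once rather than once per pair.
    leaf_out maps leaf id -> (row indices, column indices) reaching it.
    """
    stack = [(node, np.arange(len(rows)), np.arange(len(cols)))]
    while stack:
        nd, ri, ci = stack.pop()
        if 'leaf' in nd:
            leaf_out[nd['leaf']] = (ri, ci)
            continue
        if nd['axis'] == 'row':
            mask = rows[ri, nd['feature']] <= nd['threshold']
            stack.append((nd['left'], ri[mask], ci))
            stack.append((nd['right'], ri[~mask], ci))
        else:
            mask = cols[ci, nd['feature']] <= nd['threshold']
            stack.append((nd['left'], ri, ci[mask]))
            stack.append((nd['right'], ri, ci[~mask]))
    return leaf_out
```

At each leaf, the full cross product of the arriving row and column indices would then be fed to that leaf's RLS-Kron model in one batched prediction.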
6. Empirical Results and Evaluation
Empirical evaluation on 15 biological interaction datasets demonstrates:
- Training speed: Up to 30× faster training than BICTR biclustering forests on large interaction matrices, with a lower observed empirical time complexity than BICTR.
- Prediction speed: Batch inference is up to 10× faster than BICTR per batch.
- Predictive performance:
- In the inductive (TT) setting, ensembles of Oxytrees yield superior or statistically tied AUPRC/AUROC versus BICTR and other baselines (RLS-Kron, NRLMF, WkNNIR), based on Friedman–Nemenyi tests.
- Advantages are especially pronounced when using RLS-Kron leaf models relative to constant-leaf alternatives.
- Competitive performance persists in semi-inductive (TL, LT), transductive (TD), and partially unlabeled (PU) scenarios.
- Ablation studies confirm the necessity of proxy-based split search, RLS-Kron leaf fitting, and batch inference to attain these improvements (Ilídio et al., 16 Nov 2025).
7. Software Implementation and Reproducibility
Oxytrees are provided with a Python API compatible with Scikit-Learn, enabling:
- Access to all 15 benchmark datasets and evaluation metrics used in the study.
- Reproducibility of experimental results.
- Practical deployment for large bipartite learning tasks in computational biology and beyond.
The code and datasets are available at https://github.com/pedroilidio/oxytrees2025.