Online Non-Centroid Clustering with Delays

Updated 29 January 2026
  • The paper introduces a delayed assignment framework to partition online data, balancing intra-cluster distance and delay costs under stochastic arrivals.
  • It applies a greedy algorithm that forms clusters by merging pending points using delay balls, ensuring prescribed cluster sizes with irrevocable assignments.
  • Theoretical analysis demonstrates a constant-factor competitive ratio, contrasting stochastic arrival results with worst-case adversarial scenarios.

Online non-centroid clustering with delays addresses the problem of partitioning sequentially arriving data points into clusters, while allowing a controlled delay before irrevocable assignment. Key objectives include minimizing both the intra-cluster distance costs—determined by a given metric space—and explicit delay costs incurred for postponing decisions. Notably, the task generalizes beyond centroid-based clustering paradigms, instead enforcing prescribed cluster sizes and handling irrevocable assignments in the presence of online, stochastic arrivals. Recent theoretical advances have resolved competitive guarantees for this setting under random (i.i.d.) arrival models, in contrast to the known worst-case impossibility results.

1. Formal Model and Cost Structure

Let the space of locations be $\mathcal{X}$ with $|\mathcal{X}| = m$, endowed with a metric $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_{\ge 0}$ (satisfying symmetry, identity of indiscernibles, and the triangle inequality). During $T$ discrete rounds, at each time $t$, either no point arrives or a point indexed $i$ appears at location $\ell_i \in \mathcal{X}$. Assignment proceeds as follows:

  • The $n$ observed points must eventually be partitioned into $k$ clusters of prescribed sizes $n_1 \ge n_2 \ge \ldots \ge n_k$, with $n_m \ge 2$ for every cluster index $m$ and $\sum_{m=1}^k n_m = n$.
  • Upon each arrival at time $t_i$, the assignment of point $i$ may be postponed until some later time $s_i \ge t_i$, incurring a unit delay penalty per timestep: $w_i = s_i - t_i$ and $c_{\rm delay}(w_i) = w_i$.
  • Once assignment is made, it is irrevocable: a point is either inserted into an existing cluster not yet at capacity, or paired with another pending point to initiate a new cluster.
  • The total cost is the sum of intra-cluster pairwise distances and delay costs:

$\mathrm{TC}(\mathcal{C},\mathbf{w}) = \sum_{C \in \mathcal{C}} \sum_{i \ne j \in C} \left[ d(\ell_i, \ell_j) + w_i + w_j \right].$

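Read as a sum over ordered pairs, the expression counts each unordered pair twice, and each delay $w_i$ recurs in every pair that contains $i$, so a double-counting correction is required for precise computation.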

This formulation captures the trade-off between waiting for a better clustering (reducing intra-cluster distances) and paying growing delay costs for postponed assignments (Cohen, 22 Jan 2026).
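As a small illustration of the cost formula, the following sketch evaluates the total cost of a given clustering. The function name `total_cost`, its argument layout, and the example points on a line are assumptions made for illustration; they do not come from the paper. The sum is taken over ordered pairs, exactly as the formula is written, so halving the result gives the unordered-pair version.

```python
from itertools import permutations


def total_cost(clusters, locations, delays, dist):
    """Sum d(l_i, l_j) + w_i + w_j over ordered pairs i != j in each cluster.

    clusters:  list of lists of point indices (hypothetical layout)
    locations: mapping point index -> location
    delays:    mapping point index -> delay w_i
    dist:      metric on locations
    """
    cost = 0.0
    for cluster in clusters:
        # permutations(cluster, 2) yields every ordered pair (i, j) with i != j
        for i, j in permutations(cluster, 2):
            cost += dist(locations[i], locations[j]) + delays[i] + delays[j]
    return cost


# Illustrative data: four points on the real line, two clusters of size 2.
locations = {0: 0.0, 1: 1.0, 2: 5.0, 3: 6.0}
delays = {0: 1, 1: 0, 2: 2, 3: 0}
clusters = [[0, 1], [2, 3]]
cost = total_cost(clusters, locations, delays, lambda a, b: abs(a - b))
print(cost)      # ordered-pair total
print(cost / 2)  # each unordered pair counted once
```

Here the first cluster contributes 2 per ordered pair and the second contributes 3, so the ordered-pair total is 10 and the unordered total is 5.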

2. Stochastic Arrival Model and Performance Metric

Classical online clustering assumes data arrives in adversarial order, precluding any constant-factor competitive algorithm. To circumvent this impossibility, the stochastic model assumes:

  • Each round, with probability $p_x > 0$, a point arrives at location $x \in \mathcal{X}$, independently across rounds; $\sum_{x \in \mathcal{X}} p_x \le 1$.
  • The sequence of arrivals is thus i.i.d. according to an unknown, fixed distribution over $\mathcal{X}$.
  • The online algorithm does not know $\{ p_x \}_x$ a priori.
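The arrival process above can be simulated with a few lines of code. This is a minimal sketch, assuming the probabilities are given as a dictionary; the name `sample_arrivals` and the seed parameter are illustrative choices, not part of the paper. Any probability mass left over after summing the $p_x$ is treated as the chance that no point arrives in a round.

```python
import random


def sample_arrivals(probs, rounds, seed=0):
    """Draw one i.i.d. arrival sequence.

    probs:  mapping location -> arrival probability p_x (sum at most 1)
    rounds: number of discrete rounds T
    Returns a list of (time, location) pairs, at most one per round.
    """
    rng = random.Random(seed)
    locations = list(probs)
    weights = [probs[x] for x in locations]
    # Remaining mass is the probability of an empty round; clamp float error.
    idle = max(1.0 - sum(weights), 0.0)
    arrivals = []
    for t in range(rounds):
        choice = rng.choices(locations + [None], weights=weights + [idle])[0]
        if choice is not None:
            arrivals.append((t, choice))
    return arrivals


arrivals = sample_arrivals({"a": 0.3, "b": 0.2}, rounds=20, seed=1)
print(arrivals)
```

An online algorithm in this model would see the output one round at a time and would not be given `probs`.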

Performance is measured by the ratio-of-expectations (RoE),

$\mathrm{RoE}(\mathcal{A}) = \limsup_{n \to \infty} \frac{ \mathbb{E}[\text{cost of } \mathcal{A}] }{ \mathbb{E}[\mathrm{OPT}] },$

where OPT denotes the optimal offline algorithm with full knowledge of the arrival sequence. This provides a strict benchmark for stochastic online algorithms (Cohen, 22 Jan 2026).

3. The DelayedGreedy Algorithm

For this setting, the DelayedGreedy algorithm constructs partial clusterings and dynamically assigns pending points as follows:

  • For each point $i$ not yet assigned, maintain its arrival time $t_i$ and current age $(t - t_i)$.
  • Each unassigned (pending) point grows a "delay ball" (an editor's term) of radius $(t - t_i)$. Two types of assignments may occur:

    1. Inserting into existing clusters: For pending point $i$, if any cluster $C_m$ is not full and all its members $j$ satisfy $d(\ell_i, \ell_j) \le (t - t_i) + w_j$, then $i$ can be inserted into $C_m$ (choosing the minimizer of incremental total cost).
    2. Initiating new clusters: If there exists another pending point $j$ and some not-yet-opened cluster $C_m$ (currently empty) such that $d(\ell_i, \ell_j) \le (t - t_i) + (t - t_j)$, assign $i$ and $j$ jointly to form $C_m$ (again, minimizing incremental cost).
  • If neither applies, $i$ remains pending at this timestep.

  • All updates are performed iteratively for each pending point in arbitrary order.

This local greedy mechanism merges pending points as soon as their "delay balls" meet, either directly (starting a new cluster) or by accumulation into existing partially filled clusters. The algorithm maintains the invariant of prescribed cluster sizes and irrevocable assignments (Cohen, 22 Jan 2026).
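The steps above can be sketched in code. This is a simplified illustration under stated assumptions, not the paper's pseudocode: rule 1 (insertion) is tried before rule 2 (pairing), the incremental cost of an insertion is taken as the sum of $d(\ell_i, \ell_j) + w_i + w_j$ over current members, a new pair uses the closest eligible partner and the first unopened cluster, and rounds continue after the last arrival until every point is assigned. The function name `delayed_greedy` and the `max_rounds` guard are hypothetical.

```python
def delayed_greedy(arrivals, sizes, dist, max_rounds=10_000):
    """arrivals: list of (time, location) sorted by time.
    sizes: prescribed cluster sizes, with sum(sizes) == len(arrivals).
    Returns (clusters, delays): lists of point indices and a dict i -> w_i."""
    loc = [x for _, x in arrivals]
    t_arr = [t for t, _ in arrivals]
    clusters = [[] for _ in sizes]   # cluster m holds at most sizes[m] points
    delays = {}                      # set once, at the irrevocable assignment
    pending, nxt, t = [], 0, 0
    while (nxt < len(arrivals) or pending) and t <= max_rounds:
        while nxt < len(arrivals) and t_arr[nxt] <= t:
            pending.append(nxt)
            nxt += 1
        for i in list(pending):
            if i in delays:          # already paired earlier in this round
                continue
            age_i = t - t_arr[i]
            # Rule 1: insert into an open, non-full cluster whose members
            # all lie within the combined delay radii.
            best = None
            for m, members in enumerate(clusters):
                if not members or len(members) >= sizes[m]:
                    continue
                if all(dist(loc[i], loc[j]) <= age_i + delays[j] for j in members):
                    inc = sum(dist(loc[i], loc[j]) + age_i + delays[j]
                              for j in members)
                    if best is None or inc < best[0]:
                        best = (inc, m)
            if best is not None:
                clusters[best[1]].append(i)
                delays[i] = age_i
                continue
            # Rule 2: open a new cluster with another pending point whose
            # delay ball meets the ball of i.
            empty = next((m for m, c in enumerate(clusters) if not c), None)
            if empty is None:
                continue
            partners = [j for j in pending
                        if j != i and j not in delays
                        and dist(loc[i], loc[j]) <= age_i + (t - t_arr[j])]
            if partners:
                j = min(partners, key=lambda q: dist(loc[i], loc[q]))
                clusters[empty] = [i, j]
                delays[i], delays[j] = age_i, t - t_arr[j]
        pending = [i for i in pending if i not in delays]
        t += 1
    return clusters, delays


# Illustrative run: four points on a line, two clusters of size 2.
arrivals = [(0, 0.0), (1, 1.0), (2, 10.0), (3, 11.0)]
clusters, delays = delayed_greedy(arrivals, [2, 2], lambda a, b: abs(a - b))
print(clusters, delays)
```

In this run the first point waits one round until the second arrives within reach, and the same happens for the far pair, so each cluster is formed as soon as two delay balls touch.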

4. Theoretical Guarantees and Analysis

The main result for DelayedGreedy establishes a constant-factor competitive ratio under stochastic arrivals:

  • Let $n_1 = \max_m n_m$ and $n_k = \min_m n_m$. Then

$\mathrm{RoE}(\mathrm{DelayedGreedy}) \le \frac{8(n_1-1)}{(n_k-1)(1-e^{-2})},$

and in the case of equal cluster sizes, $\mathrm{RoE} \le \frac{8}{1-e^{-2}} \approx 9.25$ as $n \to \infty$.

  • The analysis is based on two central lemmas:

    1. For any produced clustering and delay profile $\mathbf{w}$, total cost is bounded by $2(n_1-1)\sum_i w_i$.
    2. Every point $i$'s minimum pairwise cost (considering optimal delays) admits an expected lower bound, related to the probability mass in metric balls around its location.
  • Assignment radii $r_x$ for each $x \in \mathcal{X}$ are set by solving $r_x = \min\{ r : 1/(\sum_{y: d(x, y) \le r} p_y) \le r \}$, balancing expected wait time against intra-cluster distances.

  • The final expected total cost for DelayedGreedy admits the bound $2(n_1-1)[n\sum_x p_x r_x + O(1)]$.
  • The offline optimum's cost is lower-bounded as $n(n_k-1)(1-e^{-2})\sum_x p_x/q_x$, where $q_x = \sum_{y: d(x, y) < r_x} p_y$.
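The definition of the assignment radius can be evaluated directly on a finite metric space. The sketch below is one possible way to do so, assuming the arrival probabilities are known (they are used only in the analysis, not by the online algorithm); the function name `assignment_radius` is hypothetical. It relies on the fact that the ball mass is a step function of $r$ that only changes at the distinct distances from $x$, so on each step the smallest feasible radius is the larger of the step's left endpoint and the reciprocal of its mass.

```python
def assignment_radius(x, probs, dist):
    """Smallest r with 1 / (sum of p_y over d(x, y) <= r) <= r.

    probs: mapping location -> arrival probability p_y, with x in probs
    dist:  metric on locations
    """
    # Distinct distances from x, in increasing order (0 for x itself).
    radii = sorted({dist(x, y) for y in probs})
    for k, d_k in enumerate(radii):
        # Mass of the closed ball of radius d_k; constant until the next distance.
        mass = sum(p for y, p in probs.items() if dist(x, y) <= d_k)
        candidate = max(d_k, 1.0 / mass)
        upper = radii[k + 1] if k + 1 < len(radii) else float("inf")
        if candidate < upper:
            return candidate


# Illustrative distribution on three points of the real line.
probs = {0.0: 0.25, 1.0: 0.25, 10.0: 0.5}
metric = lambda a, b: abs(a - b)
radii = {x: assignment_radius(x, probs, metric) for x in probs}
print(radii)
```

For the point at 0, the ball of radius 0 has mass 0.25 and would need $r \ge 4$, which overshoots the next distance 1; the ball of radius 1 has mass 0.5 and needs $r \ge 2$, which is feasible, so $r_x = 2$. The other two points also obtain radius 2 in this example.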

A direct consequence is that, in sharp contrast to the worst-case adversarial order (where no $O(1)$-competitive algorithm exists), a simple greedy delay-driven protocol achieves constant-factor optimality under i.i.d. arrivals (Cohen, 22 Jan 2026).
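To see how the stated guarantee depends on the cluster sizes, the bound can be evaluated numerically. This is a small helper written for illustration; the name `roe_bound` and its argument names are assumptions, and it only computes the closed-form expression from this section.

```python
import math


def roe_bound(n_max, n_min):
    """Evaluate 8 (n_max - 1) / ((n_min - 1) (1 - e^{-2})).

    n_max: largest prescribed cluster size (n_1)
    n_min: smallest prescribed cluster size (n_k), at least 2
    """
    if n_min < 2 or n_max < n_min:
        raise ValueError("need n_max >= n_min >= 2")
    return 8 * (n_max - 1) / ((n_min - 1) * (1 - math.exp(-2)))


for n_max, n_min in [(2, 2), (5, 5), (4, 2), (10, 2)]:
    print(n_max, n_min, round(roe_bound(n_max, n_min), 3))
```

With equal sizes the size-dependent factor cancels, so the value is the same for every common size; with unequal sizes the bound grows linearly in $(n_1 - 1)/(n_k - 1)$.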

5. Delay Trade-Offs and Parameter Choices

The delay penalty function $c_{\rm delay}(\Delta)$ is taken as linear in the canonical setup, but the proof extends immediately if $c_{\rm delay}(\Delta) = \lambda \Delta$ with $\lambda > 0$, simply scaling the cost and competitive ratio by $\lambda$. The assignment radii $r_x$ encode the fundamental trade-off: larger $r_x$ permit longer expected waits (fostering tighter clusters), but amplify the risk of increasing delay costs; conversely, high delay penalties necessitate small $r_x$ and earlier assignments, potentially increasing intra-cluster distances. The algorithm's flexibility is thus governed via the prescribed cluster sizes, metric geometry, and delay penalty parameter.

6. Absence of Empirical Results

No empirical or simulation evaluation is reported for this framework—its contributions are exclusively theoretical, focusing on competitive analysis and structural bounds for stochastic online clustering (Cohen, 22 Jan 2026). A plausible implication is that future research may seek to instantiate or test these theoretical guarantees in practical environments and real-world metric spaces.


For a rigorous derivation, algorithm pseudocode, and the detailed proofs of cost bounds and lower bounds therein, see "Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals" (Cohen, 22 Jan 2026).
