PowerBin: Fast Adaptive Data Binning with Centroidal Power Diagrams

Published 8 Sep 2025 in astro-ph.IM | (2509.06903v2)

Abstract: Adaptive binning is a crucial step in the analysis of large astronomical datasets, such as those from integral-field spectroscopy, to ensure a sufficient signal-to-noise ratio (S/N) for reliable model fitting. However, the widely used Voronoi-binning method and its variants suffer from two key limitations: they scale poorly with data size, often as O(N^2), creating a computational bottleneck for modern surveys, and they can produce undesirable non-convex or disconnected bins. I introduce PowerBin, a new algorithm that overcomes these issues. I frame the binning problem within the theory of optimal transport, for which the solution is a Centroidal Power Diagram (CPD), guaranteeing convex bins. Instead of formal CPD solvers, which are unstable with real data, I develop a fast and robust heuristic based on a physical analogy of packed soap bubbles. This method reliably enforces capacity constraints even for non-additive measures like S/N with correlated noise. I also present a new bin-accretion algorithm with O(N log N) complexity, removing the previous bottleneck. The combined PowerBin algorithm scales as O(N log N), making it about two orders of magnitude faster than previous methods on million-pixel datasets. I demonstrate its performance on a range of simulated and real data, showing it produces high-quality, convex tessellations with excellent S/N uniformity. The public Python implementation provides a fast, robust, and scalable tool for the analysis of modern astronomical data.

Authors (1)

Michele Cappellari

Summary

The paper introduces PowerBin, a novel adaptive binning algorithm that recasts binning as a capacity-constrained optimal transport problem using CPDs.
It employs a heuristic based on area–radius correspondence and iterative generator updates to achieve convex bins with near O(N log N) scaling.
Empirical evaluations demonstrate its superior S/N uniformity, robustness to noise, and applicability to both astronomical surveys and artistic imaging.

Introduction and Motivation

Adaptive binning remains essential in the analysis of modern astronomical datasets, especially in integral-field spectroscopy (IFS), where sufficient S/N is a prerequisite for robust physical inference from spatially resolved spectra. As demonstrated by parameter recovery experiments, low S/N leads to highly non-Gaussian, multimodal posteriors, producing unreliable and biased results even in ensemble aggregation (Figure 1).

Figure 1: Posterior probability distributions for four kinematic parameters at different S/N illustrating poor inference quality at low S/N and stability at high S/N.

The widely adopted Voronoi-binning algorithms satisfy spatial compactness and uniform S/N constraints. However, they often scale as $\mathcal{O}(N^2)$, which is prohibitive for datasets with $\sim10^6$ pixels, and the multiplicatively weighted Voronoi tessellations they rely on can produce non-convex or disconnected bins. The need for a scalable, robust, and morphologically constrained binning framework has become acute with the advent of next-generation IFS surveys.

Theoretical Framework: From Voronoi Diagrams to Optimal Transport

The PowerBin algorithm recasts the binning task as a capacity-constrained optimal transport problem. Instead of ordinary, multiplicatively weighted, or additively weighted Voronoi diagrams, PowerBin employs Centroidal Power Diagrams (CPDs), which guarantee convex bin shapes and can enforce per-bin capacity constraints efficiently. The power diagram differs from other generalized Voronoi diagrams in two ways: its cell boundaries are straight lines, so every cell is a convex polygon, and each generator weight corresponds directly to the capacity of its bin (Figure 2).
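For reference, the textbook definition of a power diagram makes the convexity claim easy to check. The symbols $x_i$ (generator position), $w_i$ (generator weight), and $r_i$ (radius) are notation chosen here for illustration and are not taken from the paper. The cell of generator $i$ is

$$
C_i = \left\{ x : \|x - x_i\|^2 - w_i \le \|x - x_j\|^2 - w_j \ \text{ for all } j \ne i \right\}, \qquad w_i = r_i^2 .
$$

Expanding both sides cancels the $\|x\|^2$ term, so the boundary between cells $i$ and $j$ satisfies

$$
2\, x \cdot (x_j - x_i) = \|x_j\|^2 - \|x_i\|^2 + w_i - w_j ,
$$

which is linear in $x$. Each cell is therefore an intersection of half-planes and hence convex. Increasing $w_i$ shifts the boundary away from generator $i$, enlarging its cell without bending the boundary.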

Figure 2: Physical and geometric analogies for adaptive binning; optimal packing of equal-area cells (left), and soap bubble foam as a physical model for capacity-constrained tessellations (right).

The solution to the semi-discrete optimal transport problem with prescribed capacities is a CPD, in which each generator weight effectively controls the volume (pixel count or capacity) of its corresponding cell, while the generator position centers the bin.
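To make the role of the weights concrete, the following minimal sketch assigns pixels to generators by the smallest power distance, $|x - g|^2 - w$. The function name `power_assign` and the brute-force loop over all generators are assumptions made for illustration; this is not the paper's implementation, and the loop costs $\mathcal{O}(NK)$ for $N$ pixels and $K$ generators.

```python
def power_assign(pixels, generators, weights):
    """Label each pixel with the index of the generator that has the
    smallest power distance |x - g|^2 - w (w acts as a squared radius)."""
    labels = []
    for px, py in pixels:
        dists = [(px - gx) ** 2 + (py - gy) ** 2 - w
                 for (gx, gy), w in zip(generators, weights)]
        labels.append(dists.index(min(dists)))
    return labels


# Ten pixels on a line and two generators (hypothetical toy data).
pixels = [(float(x), 0.0) for x in range(10)]
generators = [(2.0, 0.0), (7.0, 0.0)]

equal = power_assign(pixels, generators, [0.0, 0.0])   # ordinary Voronoi
grown = power_assign(pixels, generators, [10.0, 0.0])  # larger weight on cell 0

print(equal.count(0), grown.count(0))
```

With equal weights the boundary sits midway between the generators; raising the weight of the first generator moves the boundary toward the second, so the first cell gains a pixel while both cells remain contiguous intervals.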

Algorithmic Design: PowerBin Heuristic and Scalability

Direct optimization of the dual (Lagrangian) energy functional for CPDs is computationally prohibitive and numerically unstable when the capacity measure is discrete, non-additive, or affected by correlated noise, which are exactly the conditions encountered in real astronomical data. PowerBin instead uses a physically motivated heuristic based on the area–radius correspondence in the soap bubble analogy. Through iterative updates of generator positions (geometric centroids) and weights (squared radii), the algorithm drives bin capacities toward target values while maintaining convexity and compactness (Figure 2).
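The iteration described above can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: the function names, the brute-force assignment, and in particular the weight rule are hypothetical. The rule used here sets each squared radius to that of a disc whose area would give the target capacity at the cell's current mean capacity density, $w_k = (A_k / \pi)\,(T / c_k)$, with $A_k$ the pixel count, $c_k$ the cell capacity, and $T$ the target.

```python
import math


def power_assign(pixels, generators, weights):
    """Brute-force power-diagram labelling: argmin of |x - g|^2 - w."""
    labels = []
    for px, py in pixels:
        dists = [(px - gx) ** 2 + (py - gy) ** 2 - w
                 for (gx, gy), w in zip(generators, weights)]
        labels.append(dists.index(min(dists)))
    return labels


def regularize(pixels, capacity, generators, target, n_iter=10):
    """Alternate between assigning pixels and updating generators.

    Positions move to the geometric (unweighted) centroid of each cell.
    Weights follow the assumed area-radius rule described in the lead-in.
    """
    generators = list(generators)
    weights = [0.0] * len(generators)
    for _ in range(n_iter):
        labels = power_assign(pixels, generators, weights)
        for k in range(len(generators)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                continue  # leave an empty cell's generator unchanged
            area = len(members)
            cap = sum(capacity[i] for i in members)
            cx = sum(pixels[i][0] for i in members) / area
            cy = sum(pixels[i][1] for i in members) / area
            generators[k] = (cx, cy)
            if cap > 0:
                weights[k] = (area / math.pi) * (target / cap)
    labels = power_assign(pixels, generators, weights)
    return generators, weights, labels


# Uniform 20 x 20 field, four bins, target capacity of 100 pixels each.
pixels = [(float(x), float(y)) for x in range(20) for y in range(20)]
capacity = [1.0] * len(pixels)
start = [(4.0, 4.0), (15.0, 4.0), (4.0, 15.0), (15.0, 15.0)]
gens, w, labels = regularize(pixels, capacity, start, target=100.0)
counts = [labels.count(k) for k in range(4)]
print(counts, gens[0])
```

On a uniform field the assumed rule gives every cell the same weight, so the iteration reduces to Lloyd's algorithm and the four cells settle on the four quadrants. On a non-uniform field the weights differ, and cells in high-capacity regions shrink.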

A crucial advance is the introduction of a bin-accretion algorithm with $\mathcal{O}(N \log N)$ scaling, leveraging precomputed Delaunay triangulations and heap-prioritized accretion of spaxels by local brightness or S/N. Both initialization and the ensuing CPD-based regularization operate efficiently, yielding dramatic efficiency gains over previous approaches (Figure 3).
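A heap-prioritized accretion of this kind can be illustrated with a simplified sketch. Everything here is an assumption made for illustration rather than the paper's procedure: the function `accrete`, the additive capacity (a plain sum of `signal`), the choice to grow each bin by the candidate closest to its seed, and the generic `neighbors` list that stands in for the adjacency a Delaunay triangulation would provide. Each pixel is assigned once and each adjacency pushes at most one heap entry per direction, so the work is $\mathcal{O}(N \log N)$ when the number of neighbours per pixel is bounded.

```python
import heapq


def accrete(signal, neighbors, coords, target):
    """Grow bins from the brightest unbinned pixel until each reaches `target`.

    signal    : per-pixel additive capacity
    neighbors : neighbors[i] lists the pixels adjacent to pixel i
    coords    : (x, y) position of every pixel
    """
    labels = [-1] * len(signal)
    # Min-heap on negative signal, so the brightest pixel is popped first.
    seeds = [(-s, i) for i, s in enumerate(signal)]
    heapq.heapify(seeds)
    n_bins = 0
    while seeds:
        _, seed = heapq.heappop(seeds)
        if labels[seed] != -1:
            continue  # already absorbed by an earlier bin
        sx, sy = coords[seed]
        frontier = [(0.0, seed)]  # candidates keyed by squared distance to seed
        total = 0.0
        while frontier and total < target:
            _, i = heapq.heappop(frontier)
            if labels[i] != -1:
                continue  # stale entry pushed by more than one neighbour
            labels[i] = n_bins
            total += signal[i]
            for j in neighbors[i]:
                if labels[j] == -1:
                    d = (coords[j][0] - sx) ** 2 + (coords[j][1] - sy) ** 2
                    heapq.heappush(frontier, (d, j))
        n_bins += 1
    return labels, n_bins


# Seven pixels in a row: one bright pixel followed by six faint ones.
signal = [5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
n = len(signal)
coords = [(float(i), 0.0) for i in range(n)]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

labels, n_bins = accrete(signal, neighbors, coords, target=3.0)
print(labels, n_bins)
```

The bright pixel forms a bin on its own, while the faint pixels are grouped in threes. A bin that runs out of unbinned neighbours before reaching the target is left below target in this sketch; a complete method would need to reassign such leftovers.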

Figure 3: Runtime comparison for classic VorBin and PowerBin, showing near $\mathcal{O}(N \log N)$ scaling for both accretion and regularization in PowerBin.

Empirical Evaluation: Robustness, Quality, and Generalization

PowerBin was evaluated on synthetic galaxies, background-limited mosaics, classic exemplar IFS datasets, and large-scale images with non-astronomical content.

On mock galaxy profiles with both additive (Poissonian) and non-additive (correlated noise, e.g., CALIFA-like modulation) capacity functions, PowerBin achieves high S/N uniformity and strict convexity, with bin shapes adapting smoothly to underlying signal gradients (Figure 4).
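The difference between additive and non-additive capacities can be seen in a small sketch. The function names are hypothetical, and the logarithmic correction $\beta(N) = 1 + \text{slope} \cdot \log_{10} N$ with a default slope of 1.07 is used only as an illustrative assumption for a CALIFA-like correlated-noise model; the paper's exact capacity functions may differ.

```python
import math


def sn_uncorrelated(signal, noise):
    """S/N of a bin when pixel noise is independent (adds in quadrature)."""
    return sum(signal) / math.sqrt(sum(n * n for n in noise))


def sn_correlated(signal, noise, slope=1.07):
    """S/N with an assumed correlated-noise penalty that grows with bin size."""
    beta = 1.0 + slope * math.log10(len(signal))
    return sn_uncorrelated(signal, noise) / beta


# Four Poisson-limited pixels: signal 100 counts, noise sqrt(100) = 10 each.
signal = [100.0] * 4
noise = [10.0] * 4

sn_add = sn_uncorrelated(signal, noise)
sn_cor = sn_correlated(signal, noise)
print(sn_add ** 2, sn_cor)
```

In the Poisson case the squared S/N of the bin equals the sum of the per-pixel squared S/N values (4 × 100 = 400), so the capacity is additive and can be accumulated pixel by pixel. With the correlated-noise factor, the bin's S/N depends on the number of pixels as well as their content, so it has to be re-evaluated for the bin as a whole.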

Figure 4: PowerBin results on mock galaxies with different Sersic indexes and noise models, showing uniform S/N and morphologically desirable bins.

Application to mosaics containing multiple galaxies in a noisy field demonstrates that PowerBin can initiate bins in multiple distant regions simultaneously and remains robust to negative or irregular capacities (Figure 5).

Figure 5: PowerBin tessellation for multi-object, background-limited galaxies; large bins in the background, tightly matched to target S/N elsewhere.

On canonical IFS observations, such as SAURON's NGC 2273, PowerBin slightly outperforms previous variants in S/N uniformity and delivers bins strictly adhering to convexity requirements (Figure 6).

Figure 6: PowerBin applied to SAURON IFS data of NGC 2273, reproducing and improving on earlier Voronoi-based approaches in both morphology and S/N dispersion.

PowerBin also generalizes outside astronomy, efficiently handling large-scale binning for artistic stippling, demonstrating the connection to blue-noise sampling distributions (Figure 7).

Figure 7: Distribution of $10^4$ PowerBin generators for a $512 \times 512$ binary image, illustrating the method's applicability to computer graphics and complex 2D domains.

Practical and Theoretical Implications

PowerBin is both a scalable, robust practical tool and an application of optimal transport theory to scientific binning. Its $\mathcal{O}(N\log N)$ complexity enables the routine analysis of million-pixel datasets, matching the requirements of future surveys. The strict convexity of bins removes the non-convex and disconnected shapes that can arise in weighted Voronoi binning, which matters for applications where bin shape affects downstream inference or modeling.

The method’s extensibility to non-additive, empirically-determined capacity functions directly addresses the hardest cases in real data reduction, such as correlated-noise modeling, and supports its adoption in current and future workflows across astronomy and other domains.

Moreover, the PowerBin approach—built on a physical-geometric analogy instead of gradient-based optimization—offers a generic template for implementing capacity-constrained tessellations in higher dimensions and with complex constraints, opening paths for use in spatial statistics, image analysis, graphics, and machine learning.

Conclusion

PowerBin introduces a fast, physically principled heuristic for adaptive binning that guarantees bin convexity, delivers excellent S/N uniformity, and scales as $\mathcal{O}(N \log N)$, removing the quadratic-scaling bottleneck of earlier methods. Its efficacy and versatility on both synthetic and real data position it as a new baseline for capacity-constrained spatial partitioning in astronomy and beyond. Ongoing and future adoption will likely focus on integrating PowerBin in automated analysis pipelines, exploring its performance in 3D, and leveraging its framework for advanced capacity constraints in scientific and technical disciplines.

Reference: "PowerBin: Fast Adaptive Data Binning with Centroidal Power Diagrams" (2509.06903).
