DisPerSE: Topological Structure Extraction
- DisPerSE is an open-source tool that robustly extracts topological features from noisy 2D and 3D datasets.
- It applies discrete Morse theory and persistent homology to segment scalar fields on structured and unstructured meshes, including Delaunay triangulations and HEALPix tessellations.
- DisPerSE efficiently identifies voids, walls, filaments, and peaks, ensuring topological stability and noise resilience in complex data analyses.
The DIScrete PERsistent Structures Extractor (DisPerSE) is an open-source computational tool designed for the automatic extraction and robust identification of topological structures (such as voids, peaks, walls, and filaments) from noisy 2D and 3D datasets. It is grounded in discrete Morse theory and persistent homology, enabling the segmentation of scalar fields defined on structured data (regular grids, images) or unstructured data (Delaunay triangulations, HEALPix tessellations). Originally developed with cosmological applications in mind, DisPerSE is versatile across a range of domains and manifolds, and can be applied directly to study the topology of sampled functions, including the computation of persistent Betti numbers.
1. Theoretical Underpinnings
DisPerSE is based on two critical frameworks from computational topology and geometry: discrete Morse theory and persistent homology.
1.1 Discrete Morse Theory in DisPerSE
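Given a simplicial or cubical complex $K$ representing the domain, a discrete Morse function $f$ assigns a scalar value to every simplex of $K$, respecting:
- For each simplex $\sigma$ of dimension $p$:
  - At most one coface $\tau \supset \sigma$ of dimension $p+1$ with $f(\tau) \le f(\sigma)$.
  - At most one face $\nu \subset \sigma$ of dimension $p-1$ with $f(\nu) \ge f(\sigma)$.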
A simplex is critical if neither such face nor coface exists, and its dimension gives its index (minima, various types of saddles, and maxima, depending on dimensionality). Every noncritical simplex is paired with exactly one neighbor (a face or a coface), and each pair defines a gradient arrow. Discrete integral lines are paths that follow these arrows through the cell complex; the collections of integral lines associated with a critical simplex form its ascending and descending manifolds. The domain is thereby partitioned into the cells of the Morse–Smale complex, defined by intersections of ascending and descending manifolds.
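As a concrete illustration of the two conditions, the following minimal Python sketch checks them on a filled triangle and lists the critical simplices. The representation (simplices as sorted vertex tuples, `f` as a dictionary), the toy values, and the function names `faces` and `analyse` are assumptions made for this example; they are not taken from DisPerSE.

```python
from itertools import combinations

def faces(simplex):
    """Codimension-1 faces of a simplex given as a sorted vertex tuple."""
    if len(simplex) == 1:
        return []  # a vertex has no faces
    return list(combinations(simplex, len(simplex) - 1))

def analyse(f):
    """Return (is_discrete_morse, critical) for f: dict simplex -> value."""
    is_valid, critical = True, []
    for sigma, value in f.items():
        # Cofaces of sigma whose value is not larger than f(sigma).
        low_cofaces = [tau for tau in f if sigma in faces(tau) and f[tau] <= value]
        # Faces of sigma whose value is not smaller than f(sigma).
        high_faces = [nu for nu in faces(sigma) if f[nu] >= value]
        if len(low_cofaces) > 1 or len(high_faces) > 1:
            is_valid = False
        if not low_cofaces and not high_faces:
            critical.append((sigma, len(sigma) - 1))  # (simplex, index)
    return is_valid, critical

# Toy values on a filled triangle with vertices 0, 1, 2 (illustrative only).
f = {
    (0,): 0, (1,): 2, (2,): 4,
    (0, 1): 1, (0, 2): 3, (1, 2): 6,
    (0, 1, 2): 5,
}
is_valid, critical = analyse(f)
print(is_valid, critical)
```

In this toy case only vertex 0 is critical (a minimum), which is what one expects for a contractible domain; the remaining simplices are paired as (1, 01), (2, 02) and (12, 012).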
1.2 Persistent Homology and Robustness
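Persistence assigns a measure of significance to topological features by pairing the critical simplices that correspond to the birth ($\sigma^{+}$, at value $f(\sigma^{+})$) and death ($\sigma^{-}$, at value $f(\sigma^{-})$) of each feature:
$$\Delta(\sigma^{+}, \sigma^{-}) = \left| f(\sigma^{-}) - f(\sigma^{+}) \right|.$$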
Perturbations of $f$ with amplitude smaller than a chosen threshold $\Delta_{\mathrm{th}}$ can only create or destroy features whose persistence is of that order or less, so cancelling the pairs with persistence below $\Delta_{\mathrm{th}}$ permits robust filtration of noise-induced features. This mechanism provides provable stability: the segmented structures become insensitive to fluctuations of magnitude less than $\Delta_{\mathrm{th}}$.
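The pairing and thresholding idea can be sketched in the simplest setting, a function sampled on a 1D line, where each local minimum is paired with the sample at which its basin merges into an older one. This is a minimal illustration under stated assumptions: it covers only 0-dimensional pairs, uses a union-find sweep over sublevel sets, and the names `persistence_pairs_1d`, `values` and `delta_th` are hypothetical; it is not DisPerSE's implementation.

```python
def persistence_pairs_1d(values):
    """0-dimensional persistence pairs of a 1D sampled function.

    Returns (minimum_index, merge_index, persistence) for every local
    minimum except the global one, which never dies.
    """
    def rank(i):
        return (values[i], i)  # ties are broken by sample index

    parent = {}  # union-find forest; each root is its component's minimum

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(len(values)), key=rank):
        parent[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue  # neighbour not swept yet, or out of range
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            older, younger = (ri, rj) if rank(ri) < rank(rj) else (rj, ri)
            if younger != i:
                # Two basins merge at sample i: the younger minimum dies.
                pairs.append((younger, i, values[i] - values[younger]))
            parent[younger] = older
    return pairs

values = [3.0, 1.0, 2.5, 0.0, 2.0, 1.75, 4.0]
pairs = persistence_pairs_1d(values)

delta_th = 1.0  # illustrative threshold
kept = [p for p in pairs if p[2] >= delta_th]
cancelled = [p for p in pairs if p[2] < delta_th]
print(kept, cancelled)
```

With these toy values the shallow minimum at index 5 (persistence 0.25) falls below the threshold and would be cancelled, while the minimum at index 1 (persistence 1.5) survives.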
2. Algorithmic Process
The DisPerSE algorithm proceeds through a systematic sequence:
- Input and Mesh Construction:
  - For discrete point sets: build a Delaunay triangulation, with scalar field evaluation (e.g., via DTFE).
  - For regular images: construct a cubical grid.
  - For data on the sphere: use a HEALPix tessellation.
  - Evaluate scalar field values on mesh vertices.
- Construction of the Discrete Morse Complex:
  - Sort all simplices by $f$, then pair each simplex with at most one face or coface, per the discrete Morse matching rule.
  - Unpaired simplices are marked critical, with their index recorded.
- Extraction of Ascending and Descending Manifolds:
  - For each critical simplex, flood out its ascending and descending manifolds by following gradient arrows.
- Morse–Smale Complex Assembly:
  - Form intersections of all ascending and descending manifolds to produce the Morse–Smale cells. In particular, 1-cells (arcs) connect critical saddles to maxima or minima.
- Persistence Computation and Simplification:
  - Identify persistence pairs by matching feature birth and death events during a threshold sweep.
  - Compute the persistence $\Delta$ of each pair and discard (cancel) pairs with $\Delta < \Delta_{\mathrm{th}}$; each cancellation involves a local gradient reconnection within the cell complex.
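The sorting step needs a value on every simplex, while the field is evaluated only on vertices. One common convention, assumed here because the text does not say how values are extended, is to give each simplex the maximum of its vertex values and to break ties by dimension, so that every face precedes its cofaces. The sketch below uses hypothetical names (`lower_star_order`, `vertex_value`) and toy values.

```python
from itertools import combinations

def lower_star_order(simplices, vertex_value):
    """Sort simplices by (max vertex value, dimension, vertex tuple)."""
    def key(simplex):
        return (max(vertex_value[v] for v in simplex), len(simplex), simplex)
    return sorted(simplices, key=key)

# Toy field on the three vertices of one triangle (illustrative values).
vertex_value = {0: 0.2, 1: 0.9, 2: 0.5}
triangle = [(0, 1, 2), (0, 1), (0, 2), (1, 2), (0,), (1,), (2,)]
ordered = lower_star_order(triangle, vertex_value)

# Sanity check: every face appears before the simplices that contain it.
position = {s: k for k, s in enumerate(ordered)}
faces_first = all(
    position[face] < position[s]
    for s in ordered if len(s) > 1
    for face in combinations(s, len(s) - 1)
)
print(ordered, faces_first)
```

The face-before-coface property is what makes such an ordering a valid sweep over the complex: at every step the processed simplices form a subcomplex.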
The following pseudocode outlines the process as presented:
```
input: point set or image → build mesh K, evaluate f on vertices
sort simplices of K by f
for σ in simplices in increasing f:
    attempt discrete-Morse-match(σ)    # at most one face or coface
    if unmatched then mark σ as critical
compute discrete-gradient-arrows from every noncritical simplex
for each critical σ:
    flood-descending-manifold(σ)
    flood-ascending-manifold(σ)
intersect all (ascending of σ_i) ∩ (descending of σ_j) → Morse–Smale cells
persistence_pairs = pair_critical_simplices_via_sweep()
for each (σ⁺,σ⁻) in persistence_pairs:
    Δ = |f(σ⁻) - f(σ⁺)|
    if Δ < Δ_th:
        cancel_pair(σ⁺,σ⁻)    # local reconnection of gradient arrows
output: simplified Morse–Smale complex
```
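To make the flooding step more tangible, here is a simplified Python sketch that labels each cell of a small 2D grid with the local minimum reached by steepest descent, which approximates the ascending manifold (basin) of each minimum. It is an assumption-laden stand-in: it works on grid vertices with a 4-neighbourhood rather than on the gradient arrows of a cell complex, and `ascending_basins` is a hypothetical name.

```python
def ascending_basins(grid):
    """Label every grid cell with the local minimum its steepest descent reaches."""
    rows, cols = len(grid), len(grid[0])

    def lowest_neighbour(r, c):
        best = (grid[r][c], r, c)  # include the cell itself; ties broken by position
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                best = min(best, (grid[rr][cc], rr, cc))
        return best[1], best[2]

    labels = {}
    for r in range(rows):
        for c in range(cols):
            path, cell = [], (r, c)
            while cell not in labels:
                nxt = lowest_neighbour(*cell)
                if nxt == cell:
                    labels[cell] = cell  # a local minimum labels itself
                else:
                    path.append(cell)
                    cell = nxt
            for visited in path:
                labels[visited] = labels[cell]
    return labels

# Toy field with two minima, at (0, 0) and (2, 3).
grid = [
    [1, 4, 7, 3],
    [2, 5, 8, 2],
    [3, 6, 9, 0],
]
labels = ascending_basins(grid)
minima = sorted(set(labels.values()))
print(minima)
```

Each path is followed only until it meets an already labelled cell, so every cell is visited a bounded number of times, in the spirit of the flood described in the pseudocode.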
3. Topological Structure Extraction
With the simplified Morse–Smale complex, DisPerSE identifies topological features:
- Voids: Connected components of the union of ascending 3-manifolds originating from minima (index-0 critical points), interpreted as 3D bubbles.
- Walls (Sheets): Ascending 2-manifolds from index-1 saddles, acting as 2D separatrices between voids.
- Filaments: 1-cells (arcs) connecting index-2 saddles to maxima (index-3), representing the dense ridges of the field (such as the cosmic web).
- Peaks (Clusters/Haloes): Locations of maxima.
These correspondences have homological interpretations: for example, new voids increment $\beta_0$ (connected components), while walls and filaments modulate higher Betti numbers ($\beta_1$, etc.). Practical extraction does not require explicit computation of Betti numbers; the indices and persistences assigned to critical simplices suffice for direct manifold identification.
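The index-to-structure correspondence listed above can be written as a small lookup for the 3D case. The sketch below also evaluates the alternating sum of critical-cell counts, which by the Euler–Poincaré relation equals the Euler characteristic of the domain and so offers a cheap consistency check. The input list is a made-up toy example, and the names `STRUCTURE_BY_INDEX_3D` and `summarise` are hypothetical.

```python
from collections import Counter

# Correspondence between critical index and structure type in 3D.
STRUCTURE_BY_INDEX_3D = {0: "void", 1: "wall", 2: "filament", 3: "peak"}

def summarise(critical_indices):
    """Count structures by type and return the alternating sum of the counts."""
    counts = Counter(critical_indices)
    by_structure = {STRUCTURE_BY_INDEX_3D[k]: counts.get(k, 0) for k in range(4)}
    euler = sum((-1) ** k * counts.get(k, 0) for k in range(4))
    return by_structure, euler

# Toy list of critical indices (illustrative, not real output).
critical_indices = [0] * 2 + [1] * 5 + [2] * 6 + [3] * 3
by_structure, euler = summarise(critical_indices)
print(by_structure, euler)
```

For a periodic simulation box, which is topologically a 3-torus, the Euler characteristic is zero, so an alternating sum of zero is the expected outcome there.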
4. Implementation Details and Complexity Analysis
Data Structures
- Unstructured Meshes: Delaunay triangulations (typically $O(N \log N)$ construction and $O(N)$ memory for $N$ input points).
- Regular Grids: Utilize implicit cubical complexes, where no explicit cell list beyond the array is maintained.
- HEALPix Tessellations: Implicitly defined equal-area pixelizations for spherical domains.
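One standard way to realise an implicit cubical complex is to address cells by doubled coordinates: a coordinate is even along an axis where the cell is a point and odd where it spans an interval, so the dimension, faces and cofaces follow from arithmetic alone and no cell list is stored. The sketch below illustrates that idea for a grid of 3 × 3 vertices; whether DisPerSE uses this exact layout is not stated in the text, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def cell_dim(cell):
    """Dimension = number of odd (interval) coordinates."""
    return sum(c % 2 for c in cell)

def cell_faces(cell):
    """Codimension-1 faces: collapse one odd coordinate to either end."""
    out = []
    for axis, c in enumerate(cell):
        if c % 2 == 1:
            for step in (-1, 1):
                out.append(cell[:axis] + (c + step,) + cell[axis + 1:])
    return out

def cell_cofaces(cell, shape):
    """Codimension-1 cofaces inside a grid with shape[axis] vertices per axis."""
    out = []
    for axis, c in enumerate(cell):
        if c % 2 == 0:
            for step in (-1, 1):
                n = c + step
                if 0 < n < 2 * (shape[axis] - 1):
                    out.append(cell[:axis] + (n,) + cell[axis + 1:])
    return out

shape = (3, 3)  # 3 x 3 vertices
cells = list(product(*(range(2 * n - 1) for n in shape)))
dim_counts = Counter(cell_dim(c) for c in cells)
print(dict(dim_counts))
```

For the 3 × 3 example this enumerates 9 vertices, 12 edges and 4 squares without storing any of them explicitly.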
Complexity
- Discrete Morse Matching: $O(1)$ per simplex, with overall cost dominated by sorting ($O(N \log N)$), where $N$ is the number of simplices.
- Manifold Extraction: cost scales with the number of cells traversed and with the critical cell count.
- Persistence Computation: Union-find or sweep methods yield near-linear complexity once the simplices are sorted.
- Memory: Dominated by mesh storage ($O(N)$), with a gradient pointer stored per noncritical cell.
For large astrophysical datasets, DisPerSE's memory usage and performance (tens of GB of RAM, runtimes of a few hours) are practical on modern workstations.
5. Scientific Applications
DisPerSE has seen application across diverse astrophysical and cosmological contexts:
- N-body Simulations: Applied to point sets (dark matter or galaxy particles), DisPerSE recovers 3D filaments (the "cosmic web skeleton"), walls, and voids. Maxima are identified with dark-matter haloes, and visualizations overlay filaments and critical points on 2D field slices.
- Cosmic Microwave Background (CMB) Analysis: Temperature maps on the sphere (HEALPix) yield hot-spot filaments and cold-spot basins, segmenting the full-sky signal.
- Star-Forming Regions: Noisy Herschel far-infrared data are processed to isolate interstellar filaments; low-persistence spurious features are filtered by associating the persistence threshold $\Delta_{\mathrm{th}}$ with the estimated noise amplitude.
DisPerSE outputs fully segmented topological manifolds (voids, walls, filaments, and peaks), and the topology is provably faithful to the input field up to the specified noise threshold. Persistent homology ensures that features with persistence above the threshold $\Delta_{\mathrm{th}}$ are resilient against sampling artifacts and random fluctuations; explicit numerical accuracy benchmarks are not presented, but these guarantees are a consequence of the fundamental theory.
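Tying the threshold to the noise level might look like the following minimal sketch, which estimates the noise amplitude as the standard deviation of samples from a signal-free region and scales it by a chosen factor. The factor of 3, the helper names `threshold_from_noise` and `filter_pairs`, and the sample values are all assumptions made for illustration.

```python
from statistics import pstdev

def threshold_from_noise(background_samples, n_sigma=3.0):
    """Persistence threshold as a multiple of the estimated noise amplitude."""
    return n_sigma * pstdev(background_samples)

def filter_pairs(pairs, delta_th):
    """Keep (birth, death, persistence) pairs at or above the threshold."""
    return [p for p in pairs if p[2] >= delta_th]

# Toy background patch and toy persistence pairs (illustrative values).
background = [1.0, -1.0, 1.0, -1.0]
delta_th = threshold_from_noise(background)
pairs = [("min_a", "saddle_a", 0.5), ("min_b", "saddle_b", 4.0)]
significant = filter_pairs(pairs, delta_th)
print(delta_th, significant)
```

A larger factor removes more spurious features at the risk of discarding faint real ones, which is the noise-sensitivity trade-off that the persistence threshold controls.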
6. Summary and Significance
DisPerSE integrates discrete Morse theory and persistent homology into an automatic pipeline for the segmentation of 2D and 3D scalar fields, without requiring tuning of multiple parameters. The principal control, the persistence threshold ($\Delta_{\mathrm{th}}$), offers direct adjustment of noise sensitivity versus structure fidelity. Throughout, the segmentation process maintains linear or log-linear computational complexity in the number of cells and ensures topological stability of the identified structures. This framework supports a wide variety of scientific analyses wherever topological structure extraction from scalar datasets is required.