
CUBISM Code: Astronomical & ML Regularization

Updated 21 December 2025
  • CUBISM Code is a computational framework that regularizes sparse astronomical fibre-sampling data into high-quality datacubes and enables unsupervised domain adaptation in action recognition.
  • In astronomy, it adapts the drizzle algorithm to combine dithered exposures, preserving up to 90% of covariance information for robust error analysis.
  • In machine learning, it leverages temporal and spatial permutation tasks to improve domain generalization in skeleton-based action recognition.

CUBISM Code refers to a class of computational methodologies and software systems developed for two distinct research domains: (1) the regularisation and combination of fibre-optic integral field spectroscopy (IFS) data in astronomy, as applied in the context of the Sydney-AAO Multi-object Integral-field spectrograph (SAMI) Galaxy Survey, and (2) self-supervised learning for domain adaptation in skeleton-based action recognition through temporal and spatial data permutation tasks. The common thread is the decomposition and reassembly of input data—either astronomical spectra or skeleton sequences—into regularised structures that preserve critical relationships or enhance generalisation. The following entry surveys both the original CUBISM implementation in astronomical datacube construction (Sharp et al., 2014), and its algorithmic appropriation for unsupervised domain adaptation in computer vision (Tang et al., 2022).

1. Objectives and Design Principles

In the context of astronomical IFS data, the CUBISM code was designed with strict criteria to address the complexities of sparse, irregular fibre sampling. Its primary objectives are as follows:

  • Regularise and combine SAMI’s sparse, irregularly gridded fibre–bundle IFS observations into Cartesian datacubes with minimal interpolation blur, thus preserving the native image resolution as much as possible despite the fill-factor limitations of the SAMI hexabundles and finite fibre core sizes (73% fill-factor, 1.6″ core diameter) [(Sharp et al., 2014), Sections 1 & 5].
  • Accurately quantify and record the covariance structure induced by the resampling process. The methodology is constructed to preserve approximately 90% of the underlying covariance information while keeping the survey data volume increase modest [(Sharp et al., 2014), Section 5.6].

In the machine learning context, "Cubism" tasks draw their name from the art movement: inputs are fragmented and recombined, and a network learns by recovering the rearrangement. The objectives are:

  • Design self-supervised learning strategies that enhance domain adaptation capability for skeleton-based action recognition, by breaking and recombining temporal and spatial structure in input skeleton representations and training classifiers to recover the applied permutations (Tang et al., 2022).

2. Input/Output Formats and Data Handling

Astronomical Application

CUBISM for the SAMI survey ingests and outputs the following:

  • Input: Row-Stacked Spectra (RSS) frames from 2dFdr, each containing 819 one-dimensional fibre spectra together with flux and variance arrays, tramline maps, and wavelength solutions.
  • Intermediate: flux-calibrated and telluric-corrected RSS frames, converted to per-exposure datacube “tiles”.
  • Output: mosaicked datacube C[x, y, λ] on a 0.5″ × 0.5″ spaxel grid, plus the variance cube V[x, y, λ], the weight cube W[x, y, λ], and a compressed 5×5 covariance kernel stored as COVAR[5, 5, x, y].

The COVAR extension efficiently encodes local 5×5 covariance kernels sampled at selected wavelength slices.

Machine Learning Application

ST-Cubism processes skeleton-based video data represented as 4D tensors:

  • Each skeleton video is stored as a C × T × V × M tensor, with C coordinate channels (2D/3D), T frames, V joints, and M persons [(Tang et al., 2022), §3].
  • Input data organized as .npy arrays; normalization (centering, rotation, scaling) and augmentations (random flipping, noise) are applied.
  • Output includes permutation prediction (for self-supervised tasks) and target action class probabilities.
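The normalisation step above can be illustrated with a minimal sketch that centres a C × T × V × M skeleton tensor on a root joint; the function name and root-joint index are assumptions for illustration, not the released API, and real pipelines may also rotate and scale.

```python
import numpy as np

# Illustrative pre-processing for a C x T x V x M skeleton tensor:
# subtract a chosen root joint from every joint in each frame, so that
# coordinates become person-centred.  The root-joint index is an
# assumption; released pipelines may also apply rotation and scaling.

def centre_skeleton(x, root_joint=0):
    x = np.asarray(x, dtype=np.float32)           # shape (C, T, V, M)
    root = x[:, :, root_joint:root_joint + 1, :]  # shape (C, T, 1, M)
    return x - root                               # broadcasts over the V axis
```

Because the subtraction broadcasts over the joint axis, relative joint offsets are preserved while absolute position is removed.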

3. Methodological Framework and Algorithms

Astronomical Cubism: Drizzle-based Cube Construction

  • Adopts the Drizzle algorithm (Fruchter & Hook 2002), adapted for hexabundle fibre data. Each fibre core is projected onto the target grid, with flux assigned to each output spaxel according to its fractional overlap area α_i(r).
  • For multiple dithered exposures, N cube/variance/weight triplets {C_n, V_n, W_n} are stacked by weighted mean at each (r, λ):

C_\text{out}(r,\lambda) = \frac{\sum_n C_n(r,\lambda)\,W_n(r,\lambda)}{\sum_n W_n(r,\lambda)}

V_\text{out}(r,\lambda) = \frac{\sum_n V_n(r,\lambda)\,W_n^2(r,\lambda)}{\left[\sum_n W_n(r,\lambda)\right]^2}

Outlier rejection is handled via 5σ clipped-mean statistics [(Sharp et al., 2014), Section 5.5].
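The stacking and rejection steps above can be sketched in NumPy. This is an illustrative implementation, not the exact SAMI code: the function name and the choice to clip each exposure against its own error estimate are assumptions.

```python
import numpy as np

def stack_cubes(C, V, W, clip_sigma=5.0):
    """Weighted combination of N dithered cube tiles (a sketch).

    C, V, W: arrays of shape (N, ny, nx, nlam) holding per-exposure
    flux, variance, and drizzle-weight cubes.
    """
    C = np.asarray(C, float)
    V = np.asarray(V, float)
    W = np.asarray(W, float)
    # First-pass weighted mean, then reject exposures deviating by more
    # than clip_sigma times their own per-pixel error (clipped mean).
    mean0 = np.sum(C * W, axis=0) / np.sum(W, axis=0)
    dev = np.abs(C - mean0) / np.sqrt(np.where(V > 0, V, np.inf))
    good = (dev <= clip_sigma) & (W > 0)
    Wg = np.where(good, W, 0.0)
    norm = np.sum(Wg, axis=0)
    # Weighted-mean flux and propagated variance, as in the equations above.
    C_out = np.sum(C * Wg, axis=0) / norm
    V_out = np.sum(V * Wg**2, axis=0) / norm**2
    return C_out, V_out, norm
```

The returned `norm` plays the role of the combined weight map W_out consulted in later mosaicking.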

ST-Cubism: Temporal and Spatial Permutation Self-Supervision

  • Temporal Cubism: splits the input skeleton sequence into N equal temporal segments along T, applies a random permutation π ∈ S_N, and trains an N!-way classifier to recover π; the loss is standard cross-entropy over the N! classes [(Tang et al., 2022), §4.1].
  • Spatial Cubism: joints are grouped into five primary body parts. A block-permutation matrix P_spa reorders/switches arms or legs, and a 3-class classifier must recover the applied spatial permutation [(Tang et al., 2022), §4.2].
  • The total loss combines action classification (L_cls on labelled source samples) with Cubism permutation classification (L_temp or L_spat), weighted by λ_t ≈ 0.1 or λ_s ≈ 0.1 [(Tang et al., 2022), §4.3].
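As a concrete sketch of the temporal task, the function below fragments the T axis of a skeleton tensor and returns the permutation class label that the self-supervised head would be trained to predict. The function name and label encoding are illustrative assumptions, not the released implementation.

```python
import itertools
import random
import numpy as np

def temporal_cubism(x, n_segments=3, rng=random):
    """Temporal Cubism pretext transform (a sketch).

    x: array of shape (C, T, V, M).  Splits the T axis into n_segments
    equal pieces, shuffles them with a random permutation, and returns
    the shuffled tensor plus the permutation's class index in
    [0, n_segments! - 1], which an n!-way classifier learns to recover.
    """
    C, T, V, M = x.shape
    assert T % n_segments == 0, "T must divide evenly into segments"
    perms = list(itertools.permutations(range(n_segments)))
    label = rng.randrange(len(perms))           # which permutation was applied
    segs = np.split(x, n_segments, axis=1)      # n pieces along the T axis
    shuffled = np.concatenate([segs[i] for i in perms[label]], axis=1)
    return shuffled, label
```

With n_segments = 3 this yields the 6-way (3!) classification task quoted in the hyperparameter settings below; the spatial variant permutes body-part groups along the joint axis instead.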

4. Covariance Quantification and Compression

  • In drizzled datacube construction, local covariance arises from the overlap of fibre footprints on the output grid. The covariance between two output pixels i and j at wavelength λ is

\Sigma(i, j; \lambda) = \alpha_i\,\alpha_j\,\sigma_0^2

where σ₀² is the variance of the originating fibre spectrum and α_i, α_j are the fibre's fractional overlap areas with the two pixels [(Sharp et al., 2014), Section 5.6].

  • Rather than storing the full 2048 × (50 × 50)² covariance tensor, CUBISM compresses the information into 5 × 5 covariance kernels sampled every ~100 pixels in λ and at each atmospheric-dispersion-correction pivot. These are normalised and stored compactly as the COVAR extension; on-the-fly interpolation then recovers the effective covariance matrix for scientific error propagation. This captures ≳ 80% of the covariance power for nearly all spaxels.
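The relation above can be illustrated under the simplifying assumption of a single fibre footprint; the function names and the normalisation convention (kernel divided by the central pixel's variance) are illustrative choices, not the SAMI implementation.

```python
import numpy as np

def fibre_covariance(alpha, sigma0_sq):
    """Full covariance matrix Sigma(i, j) = alpha_i * alpha_j * sigma0^2
    contributed by one fibre with variance sigma0_sq.  alpha holds the
    fibre's fractional overlap with each output spaxel."""
    a = np.asarray(alpha, float).ravel()
    return sigma0_sq * np.outer(a, a)

def covar_kernel(alpha_map, sigma0_sq, cx, cy, half=2):
    """Compressed (2*half + 1)^2 kernel: covariance of spaxel (cx, cy)
    with its neighbours, normalised by its own variance, in the spirit
    of the COVAR-style 5x5 storage described above."""
    var_c = sigma0_sq * alpha_map[cy, cx] ** 2
    window = alpha_map[cy - half:cy + half + 1, cx - half:cx + half + 1]
    cov = sigma0_sq * alpha_map[cy, cx] * window
    return cov / var_c
```

Storing only such normalised local kernels at sampled wavelength slices, rather than every pairwise element, is what keeps the data-volume increase modest.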

5. Implementation Structure and Workflow

  • rss2cube.py: reads, WCS-aligns, and drizzles the input, outputting per-exposure (C, V, W) cubes.
  • align_dithers.py: measures IFU offsets and aligns exposures via IRAF geomap/geoxytran.
  • stack_cubes.py: performs weighted combination, outlier rejection, and mosaicking; writes the C, V, W, and COVAR extensions.
  • build_covar.py: calculates and compresses the 5×5 covariance kernels and writes the COVAR headers.

CUBISM is written in Python, with dependencies on numpy, scipy, astropy (FITS I/O), and PyRAF (IRAF-based astrometric alignment) [(Sharp et al., 2014), Section 5.3].

For ST-Cubism:

  • Key modules: dataset.py (data I/O, normalization, Cubism transforms), transforms.py (Cubism permutation logic), model.py (GCN backbone and classification heads), and trainer.py (training loop, checkpointing, logging).
  • Configuration is handled via YAML files specifying data structures and hyperparameters.
  • Training and evaluation scripts support monitoring via TensorBoard, and final inference fuses prediction streams from both Cubism heads (Tang et al., 2022).

6. Performance Metrics and Best Practices

Astronomical Application

  • The 7-position hexagonal dither with 0.45× fibre-diameter pitch yields a uniform weight map, with σ(W)/⟨W⟩ ≲ 5% over the central regions.
  • A drizzle drop size of half the core diameter (0.8″) enables ≳ 90% recovery of the seeing-limited resolution; the empirical PSF FWHM degradation is consistently ≤ 0.2″ [(Sharp et al., 2014), Section 7].
  • Compressed covariance storage reduces the per-target cube size from ~1 GB (full covariance) to ~170 MB, and cuts processing time from ~100 minutes to ~10 minutes per cube (on survey hardware).

Recommended operational best practices: adopt the 7-point dither; standardise the drizzle drop size to 0.5 fibre-core diameters; propagate and consult the weight map for flux-correct mosaics; always retain the COVAR extension for error analysis; and filter poor-seeing frames before stacking (Sharp et al., 2014).

Machine Learning Application

  • Standard hyperparameter regimes are: batch size of 32, 400 epochs training, SGD with momentum 0.9, learning rate 0.01 (decayed at epochs 200 and 300), Cubism loss weight 0.1; permutation class counts of 6 (temporal) and 3 (spatial) (Tang et al., 2022).
  • Fused accuracy is computed as the mean (or weighted mean) of softmax outputs from temporal and spatial Cubism classifiers.
  • The codebase supports reproducibility and extension to new datasets and experimental regimes; ensuring correct tensor formats and normalisation is essential for reliable results.
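The score-fusion step can be sketched as below; the equal stream weighting and function names are assumptions, and a weighted mean is equally possible.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_predictions(logits_temporal, logits_spatial, w=0.5):
    """Average the softmax outputs of the temporal- and spatial-Cubism
    streams (weight w is an assumption), then take the argmax class."""
    p = w * softmax(logits_temporal) + (1 - w) * softmax(logits_spatial)
    return p.argmax(axis=-1)
```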

7. Broader Significance and Applications

The original CUBISM code has established the standard for survey-quality IFS data reduction in systems with sparse, dithered fibre sampling, demonstrating minimal loss in spatial resolution and providing a rigorous framework for error propagation via retained covariance information (Sharp et al., 2014). Its application within the SAMI survey set methodological precedents adopted by similar astronomical campaigns.

The ST-Cubism paradigm has influenced unsupervised domain adaptation by introducing Cubism-inspired permutation tasks that enable domain-robust feature learning without adversarial training mechanisms (Tang et al., 2022). By directly leveraging temporal and spatial structure in skeleton data, these strategies facilitate improved generalisation in cross-dataset action recognition scenarios, as validated through large-scale benchmarks (NTU RGB+D, PKU-MMD, Kinetics).

Both paradigms demonstrate how the Cubism principle—fragmenting and recombining data structures—can be generalized for robust regularisation and representation learning in disparate scientific and machine learning domains.
