Asteria: Multi-Domain Research Systems

Updated 3 July 2026

Asteria is a family of systems and methodologies that address challenges in astrophysics, machine learning, information systems, and security analysis.
ASTERIA has enabled innovations such as NEA thermal inertia estimation via Yarkovsky drift inversion and high-precision CubeSat exoplanet photometry.
Deep learning, runtime optimization, and semantic caching under the Asteria banner deliver cross-platform code similarity detection, efficient LLM training, and resilient semantic retrieval.

Asteria refers to a family of methods, systems, and software architectures spanning astrophysics, machine learning, information systems, and security analysis. Several research communities have independently introduced frameworks under the name “Asteria,” each defined by its own technical context and objectives. Below is a technical summary of the key Asteria systems in contemporary research.

1. Asteroid Thermal Inertia Analyzer (ASTERIA): Yarkovsky-Based Inference

ASTERIA (“Asteroid Thermal Inertia Analyzer”) is a methodology to estimate the surface thermal inertia ($\Gamma$) of small Solar System bodies, particularly near-Earth asteroids (NEAs), by quantitatively inverting the observed Yarkovsky drift in the orbital semi-major axis (Novakovic et al., 2024, Fenucci et al., 18 Feb 2025). Unlike the prevalent thermophysical modeling (TPM) approach, which fits multi-wavelength infrared fluxes to deduce surface properties, ASTERIA uses dynamical orbit solutions that include non-gravitational acceleration parameters, such as $A_2$, obtained via high-precision astrometry. The method is summarized as follows:

Physical Model: The Yarkovsky effect induces a measurable along-track (transverse) acceleration due to anisotropic thermal re-emission. For a rotating ellipsoid, this is typically parameterized by $A_2$ (in astronomical units per day squared), the transverse acceleration scaled to a reference distance $r_0 = 1$ au under an assumed $(r_0/r)^2$ dependence on heliocentric distance $r$. The orbit-averaged drift in semi-major axis ($\dot a$) then satisfies:

$\dot a = \frac{2 A_2 (1 - e^2)}{n_0} \left(\frac{r_0}{p}\right)^2$

(where $a$ is the semi-major axis, $e$ the eccentricity, $p = a(1 - e^2)$ the semilatus rectum, and $n_0$ the mean motion). $A_2$ is a function of surface thermal inertia and other physical parameters: $A_2 = f(\Gamma,\rho,C,\epsilon,a,b,c,P,\gamma,e,...)$.
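As a worked illustration of converting $A_2$ into a semi-major axis drift, the sketch below assumes the commonly used inverse-square scaling of the transverse acceleration, $A_2 (r_0/r)^2$ with $r_0 = 1$ au, for which the orbit-averaged drift is $\dot a = 2 A_2 (1 - e^2)(r_0/p)^2 / n_0$ with $p = a(1 - e^2)$. The function name and the sample inputs are illustrative assumptions, not values from the cited papers.

```python
import math

# Gaussian gravitational constant k (au^1.5 / day); k^2 is GM_sun in au^3 / day^2.
GAUSS_K = 0.01720209895
DAYS_PER_MYR = 365.25e6

def semimajor_axis_drift(a2, a_au, ecc, r0_au=1.0):
    """Orbit-averaged da/dt in au/day for a transverse acceleration a2 * (r0 / r)**2.

    a2 is in au/day^2, a_au is the semi-major axis in au, ecc is the eccentricity.
    """
    n0 = GAUSS_K / a_au ** 1.5       # mean motion in rad/day (massless body)
    p = a_au * (1.0 - ecc ** 2)      # semilatus rectum in au
    return 2.0 * a2 * (1.0 - ecc ** 2) * (r0_au / p) ** 2 / n0

# Made-up input for illustration: A2 = -1e-14 au/day^2, a = 1.2 au, e = 0.2.
drift = semimajor_axis_drift(-1e-14, 1.2, 0.2)
print(f"da/dt = {drift * DAYS_PER_MYR:.3e} au/Myr")
```

A negative $A_2$ gives a negative drift (inward migration), and for fixed $A_2$ and $a$ the drift magnitude grows with eccentricity through the $1/(1 - e^2)$ factor.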

Numerical Inversion: ASTERIA solves for $\Gamma$ by:
1. Sampling the input parameters of $f$ (e.g., $a$, $\rho$, $C$, $\epsilon$, $P$, $\gamma$, $e$) via Monte Carlo, reflecting their measurement uncertainties or population priors.
2. Computing the predicted Yarkovsky drift for a grid of $\Gamma$ values using a semi-analytical 1D heat conduction-Yarkovsky recoil model.
3. Matching the modeled drift to the observed value (e.g., from JPL solution sets).
4. Aggregating the matching values into a posterior distribution for $\Gamma$.
Model Flexibility: ASTERIA permits both constant and variable $\Gamma$ models along the heliocentric distance, e.g., a power law in heliocentric distance whose exponent is set by theory for radiatively-conducting regolith.
Applications: The method has been applied to asteroids including (65803) Didymos and (469219) Kamo`oalewa, yielding inferences of $\Gamma$ (in J m$^{-2}$ K$^{-1}$ s$^{-1/2}$) in concordance with or extending beyond TPM-derived values (Novakovic et al., 2024, Fenucci et al., 18 Feb 2025).
Advantages/Limitations: ASTERIA’s main strength is applicability to small, fast-rotating objects where classical TPM lacks sufficient IR data. It is, however, less precise than IR-based models and sensitive to the accuracy of the measured Yarkovsky drift and other inputs. The method is currently limited to NEAs; main-belt applications await further Yarkovsky detections.
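
The four inversion steps above can be sketched in a few lines. This is a minimal illustration, not the ASTERIA code. The forward model `toy_drift` is a toy, non-monotonic stand-in for the real heat conduction-Yarkovsky model. The priors, grid, units, and all names are assumptions chosen only to make the example run.

```python
import math
import random

def toy_drift(gamma, scale, theta_per_gamma):
    """Toy forward model (assumption): drift = scale * T / (1 + T + T^2 / 2),
    with a thermal parameter T = theta_per_gamma * gamma."""
    theta = theta_per_gamma * gamma
    return scale * theta / (1.0 + theta + 0.5 * theta * theta)

def invert_thermal_inertia(drift_obs, drift_sigma, n_samples=2000, seed=0):
    """Return Monte Carlo samples of Gamma whose modeled drift matches the observed drift."""
    rng = random.Random(seed)
    # Log-spaced Gamma grid from 1 to about 3162 (arbitrary toy units).
    grid = [10.0 ** (0.02 * k) for k in range(176)]
    solutions = []
    for _ in range(n_samples):
        # Step 1: draw the observed drift and the nuisance parameters.
        target = rng.gauss(drift_obs, drift_sigma)
        scale = rng.gauss(1.0, 0.1)
        theta_per_gamma = rng.lognormvariate(math.log(0.01), 0.2)
        # Step 2: evaluate the forward model on the Gamma grid.
        model = [toy_drift(g, scale, theta_per_gamma) for g in grid]
        # Step 3: find every grid interval where the model crosses the target.
        for k in range(len(grid) - 1):
            lo = model[k] - target
            hi = model[k + 1] - target
            if lo * hi < 0.0:
                t = lo / (lo - hi)  # interpolate linearly in log(Gamma)
                solutions.append(grid[k] * (grid[k + 1] / grid[k]) ** t)
    # Step 4: the pooled solutions approximate the posterior of Gamma.
    return solutions

samples = sorted(invert_thermal_inertia(drift_obs=0.3, drift_sigma=0.02))
print(len(samples), "solutions; median Gamma =", round(samples[len(samples) // 2], 1))
```

Because this toy forward model rises and then falls with $\Gamma$, a single draw can match the observed drift at two different $\Gamma$ values, so the pooled samples form a two-peaked distribution in this setup.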

2. ASTERIA in Binary Code Similarity and Vulnerability Search

2.1 ASTERIA: Deep Learning for Cross-Platform Binary Similarity

In binary code similarity detection, Asteria denotes a deep learning system designed for semantic equivalence assessment across architectures in IoT and firmware binaries (Yang et al., 2021, Yang et al., 2023). The principal features are:

Semantic Encoding via ASTs: Functions are decompiled to abstract syntax trees (ASTs) using Hex-Rays, with each node digitized and embedded. ASTs abstract away platform-specific instruction variability but retain semantic structure, offering stability across differing ISAs.
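Tree-LSTM Embedding: ASTs are encoded using a binary Tree-LSTM, which propagates vectorial hidden states bottom-up through the tree. The embedding at the root captures the function’s semantics. In the standard binary Tree-LSTM form, a node $k$ with embedding $e_k$, left child $l$, and right child $r$ is updated as:

$i_k = \sigma(W^{i} e_k + U^{i}_{l} h_{kl} + U^{i}_{r} h_{kr} + b^{i})$
$f_{kl} = \sigma(W^{f} e_k + U^{f}_{ll} h_{kl} + U^{f}_{lr} h_{kr} + b^{f})$
$f_{kr} = \sigma(W^{f} e_k + U^{f}_{rl} h_{kl} + U^{f}_{rr} h_{kr} + b^{f})$
$o_k = \sigma(W^{o} e_k + U^{o}_{l} h_{kl} + U^{o}_{r} h_{kr} + b^{o})$
$u_k = \tanh(W^{u} e_k + U^{u}_{l} h_{kl} + U^{u}_{r} h_{kr} + b^{u})$
$c_k = i_k \odot u_k + f_{kl} \odot c_{kl} + f_{kr} \odot c_{kr}$
$h_k = o_k \odot \tanh(c_k)$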

Siamese Similarity Structure: Two ASTs are passed to parameter-sharing Tree-LSTM arms; the absolute difference and Hadamard product of embeddings are passed to a classifier head, yielding softmax probabilities for similarity.
Empirical Performance: ASTERIA is reported to outperform both Diaphora and Gemini in cross-architecture ROC-AUC, and to be orders of magnitude faster in similarity search.
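
The Siamese comparison step described above (absolute difference and Hadamard product fed to a softmax classifier) can be sketched as follows. The two-dimensional embeddings, the weight matrix `W`, and the bias `b` are made-up placeholders. In the real system the embeddings come from the Tree-LSTM and the classifier weights are learned.

```python
import math

def siamese_features(e1, e2):
    """Concatenate |e1 - e2| and the element-wise (Hadamard) product e1 * e2."""
    diff = [abs(x - y) for x, y in zip(e1, e2)]
    prod = [x * y for x, y in zip(e1, e2)]
    return diff + prod

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def similarity_probs(e1, e2, W, b):
    """Linear classifier head over the Siamese features, returning [p_dissimilar, p_similar]."""
    feats = siamese_features(e1, e2)
    logits = [sum(w * f for w, f in zip(row, feats)) + bias for row, bias in zip(W, b)]
    return softmax(logits)

# Hypothetical 2-d root embeddings and hand-picked (untrained) classifier weights.
emb_a = [1.0, 2.0]
emb_b = [1.0, -2.0]
W = [[1.0, 1.0, -0.5, -0.5],   # logit for "dissimilar"
     [-1.0, -1.0, 0.5, 0.5]]   # logit for "similar"
b = [0.0, 0.0]
probs = similarity_probs(emb_a, emb_b, W, b)
print(probs)
```

Both features are symmetric in their arguments, so the score does not depend on which function is passed first.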

2.2 ASTERIA-Pro: Incorporating Domain Knowledge

Asteria-Pro augments the original pipeline with domain-knowledge-based pre-filtration (using named callees, caller relationships, and string constants) and call-graph-aware re-ranking (Yang et al., 2023). This staged approach is reported to reduce runtime and to improve both MRR and Recall@1. In experimental runs it was also used to detect vulnerable functions in firmware.
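
To make the pre-filtration idea and the two ranking metrics concrete, here is a minimal sketch. The record layout (`callees`, `strings`), the sharing rule, and the sample function names are hypothetical and are not taken from the Asteria-Pro implementation. The MRR and Recall@k definitions are the standard ones.

```python
def prefilter(query, candidates):
    """Keep candidates sharing at least one named callee or string constant with the query.

    If the query exposes no such features, nothing can be filtered, so all candidates pass.
    """
    if not query["callees"] and not query["strings"]:
        return list(candidates)
    kept = []
    for cand in candidates:
        if (query["callees"] & cand["callees"]) or (query["strings"] & cand["strings"]):
            kept.append(cand)
    return kept

def mean_reciprocal_rank(ranks):
    """MRR over 1-based ranks of the correct match, one rank per query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k=1):
    """Fraction of queries whose correct match is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical query and candidate functions.
query = {"name": "parse_hdr", "callees": {"memcpy", "strlen"}, "strings": {"Content-Length"}}
candidates = [
    {"name": "f1", "callees": {"memcpy"}, "strings": set()},
    {"name": "f2", "callees": {"printf"}, "strings": {"usage: %s"}},
    {"name": "f3", "callees": set(), "strings": {"Content-Length"}},
]
survivors = [c["name"] for c in prefilter(query, candidates)]
print(survivors)

ranks = [1, 2, 1, 4]
print(mean_reciprocal_rank(ranks), recall_at_k(ranks, k=1))
```

Only the survivors would be passed to the more expensive embedding comparison, which is where a runtime saving of this kind would come from.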

3. ASTERIA for LLM Training: Runtime-Orchestrated Second-Order Optimization

In large-scale language modeling, Asteria refers to a runtime system that enables principled, sample-efficient, second-order optimization (e.g., Shampoo, SOAP, KL-Shampoo) for LLMs by rearchitecting systems-level state management (Lu et al., 15 May 2026):

Critical Path Decoupling: Asteria dynamically stages preconditioner state (factor and inverse-root matrices) across GPU, host RAM, and NVMe based on runtime pressure, using memory hooks (register_forward_hook, register_full_backward_pre_hook) in PyTorch FSDP.
Asynchronous Inverse-Root Computation: Expensive eigen-decomposition and inverse-root operations are offloaded to CPU worker pools and staged back to GPU asynchronously, keeping their latency spikes off the GPU critical path.
Bounded-Staleness Protocol: In distributed settings, a coherence registry tracks staleness per factor block; inter-node synchronizations are invoked only when staleness exceeds a configured bound, reducing communication volume and GPU stalls.
Empirical Impact: On single GB10 GPUs (128GB UVM), enables 1B-parameter training without OOM. On GH200 clusters, achieves convergence and throughput similar to AdamW, but with improved time-to-solution and energy efficiency.
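
The bounded-staleness idea can be illustrated with a small single-process sketch. The class name, its methods, the block identifiers, and the choice to measure staleness as a count of local updates are all assumptions made for illustration. The actual registry and its staleness metric may differ.

```python
class CoherenceRegistry:
    """Tracks how many local updates each factor block has accumulated since its last sync."""

    def __init__(self, bound):
        self.bound = bound
        self.staleness = {}

    def record_update(self, block_id):
        """Note one more unsynchronized local update for this block."""
        self.staleness[block_id] = self.staleness.get(block_id, 0) + 1

    def stale_blocks(self):
        """Blocks whose staleness exceeds the bound and therefore need a sync."""
        return sorted(b for b, s in self.staleness.items() if s > self.bound)

    def mark_synced(self, block_id):
        """Reset staleness after the block has been synchronized."""
        self.staleness[block_id] = 0

registry = CoherenceRegistry(bound=2)
sync_log = []
for step in range(1, 7):
    registry.record_update("layer0")      # updated every step
    if step % 3 == 0:
        registry.record_update("layer1")  # updated every third step
    for block in registry.stale_blocks():
        sync_log.append((step, block))    # a real system would launch a collective here
        registry.mark_synced(block)
print(sync_log)
```

In this run only `layer0` is synchronized, at steps 3 and 6. `layer1` never exceeds the bound, so it causes no communication at all.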

4. ASTERIA for Semantic-Aware Caching in LLM Agents

In multi-region LLM agent workloads, Asteria denotes a semantic-aware, cross-region knowledge caching architecture (Ruan et al., 22 Sep 2025):

Semantic Elements (SEs): Each cache entry is a tuple of (query, embedding, value, performance metadata). Staticity and frequency metadata inform cache management.
Semantic Retrieval Index (Sine): Implements a two-stage retrieval process: (1) approximate nearest neighbor (ANN) vector similarity filtering, followed by (2) LLM-based semantic “judger” for precision validation.
Cost-Efficient and Latency-Aware Policies: Asteria introduces the “LCFU” eviction policy (Least Cost-Efficient & Frequently Used) and history-based Markov predictive prefetching to maximize value and reduce cold misses.
Co-Location and Scheduling: The main agent LLM and embedded semantic judger LLM share GPU with asymmetric static partitioning and priority-aware dynamic scheduling to avoid interference and optimize resource utilization.
Performance: Reported results show throughput gains on representative search/coding workloads while maintaining cache hit rates, with lower API and GPU costs than vanilla and exact-match caches.
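
The two-stage lookup and a cost-aware eviction rule can be sketched together. This is a toy under stated assumptions: brute-force cosine similarity stands in for the ANN index, the `judge` callback stands in for the LLM judger, and the eviction score `cost * (1 + hits)` is a hypothetical simplification of LCFU, not its published definition.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class SemanticCache:
    def __init__(self, capacity, judge, threshold=0.8, top_k=3):
        self.capacity = capacity
        self.judge = judge          # callable(query, cached_query) -> bool
        self.threshold = threshold
        self.top_k = top_k
        self.entries = []

    def lookup(self, query, embedding):
        # Stage 1: vector-similarity filter (brute force here, ANN in practice).
        scored = sorted(
            ((cosine(embedding, e["embedding"]), e) for e in self.entries),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Stage 2: semantic validation of the surviving candidates.
        for sim, entry in scored[: self.top_k]:
            if sim >= self.threshold and self.judge(query, entry["query"]):
                entry["hits"] += 1
                return entry["value"]
        return None

    def insert(self, query, embedding, value, cost):
        if len(self.entries) >= self.capacity:
            # Evict the entry with the lowest (hypothetical) cost-times-frequency score.
            victim = min(self.entries, key=lambda e: e["cost"] * (1 + e["hits"]))
            self.entries.remove(victim)
        self.entries.append(
            {"query": query, "embedding": embedding, "value": value, "cost": cost, "hits": 0}
        )

# Stand-in judger: accept only if both queries use the same set of words.
same_words = lambda q, cached: set(q.lower().split()) == set(cached.lower().split())

cache = SemanticCache(capacity=2, judge=same_words)
cache.insert("weather in Paris", [1.0, 0.0], "sunny", cost=5.0)
cache.insert("python sort list", [0.0, 1.0], "use sorted()", cost=1.0)
hit = cache.lookup("Paris weather in", [0.9, 0.1])
miss = cache.lookup("weather in London", [0.95, 0.05])
cache.insert("rust borrow checker", [1.0, 1.0], "see ownership rules", cost=2.0)
print(hit, miss, [e["query"] for e in cache.entries])
```

The second lookup shows why the judger stage matters: the London query is close in embedding space but is rejected, so the cache does not return a wrong answer.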

5. ASTERIA: Dark Matter Capture Rate Computation Framework

Asteria also denotes an open-source software package for multi-regime dark matter (DM) capture in celestial objects (Python and Mathematica) (Leane et al., 2023):

Unified Analytic-Numerical Framework: Incorporates both the optically thin (“single-scatter”) and thick (“multi-scatter”) regimes, covering arbitrary DM mass and scattering cross section, and accommodates Earth, Sun, Jupiter, brown dwarfs and arbitrary spherical bodies.
Gould/Press-Spergel Foundation with Modern Extensions: Supports summations over all scatter multiplicity, light-DM diffusive corrections, strong-interaction reflections, and stopping-power limits for ultra-heavy DM.
Interface and Usage: Provides simple APIs for bodies (Earth, Sun, etc.), DM parameters, and velocity distributions to compute total capture rates with sub-percent numerical accuracy.
Validation and Integration: Benchmarks against full Monte Carlo, supports indirect detection forecasting, exoplanet heating, and direct/indirect detection phenomenology.
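
To illustrate the difference between the optically thin and thick regimes, the sketch below evaluates one expression used in the multiscatter capture literature for the probability that a particle crossing a body of optical depth $\tau$ scatters exactly $N$ times: $p_N(\tau) = 2\int_0^1 y\, e^{-y\tau} (y\tau)^N / N!\, dy$. This is an independent illustration. It does not use or reproduce the Asteria package's API, and the function name and integration settings are assumptions.

```python
import math

def scatter_probability(n_scatters, tau, steps=2000):
    """p_N(tau) = 2 * integral_0^1 y * exp(-y*tau) * (y*tau)**N / N! dy, by the midpoint rule."""
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) / steps
        x = y * tau
        total += y * math.exp(-x) * x ** n_scatters / math.factorial(n_scatters)
    return 2.0 * total / steps

# Optically thin body: a single scatter dominates over multiple scatters.
thin = [scatter_probability(n, tau=0.001) for n in (1, 2)]
# Optically thick body: probability is spread over many scatter multiplicities.
thick = [scatter_probability(n, tau=20.0) for n in range(0, 120)]
print(thin, sum(thick))
```

Summed over all $N$ (including $N = 0$), the probabilities add up to one. That is a convenient sanity check when truncating the multiplicity sum.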

6. ASTERIA CubeSat: High-Precision Exoplanet Photometry

ASTERIA (“Arcsecond Space Telescope Enabling Research In Astrophysics”) is a 6U CubeSat telescope designed to demonstrate milliKelvin-level thermal and subarcsecond pointing stability for photometric observations (Knapp et al., 2020):

Bus and Payload: 6U form factor; optical payload (f/1.4, 85 mm lens), 5.5 MP CMOS sensor, two-stage attitude control with a piezoelectric stage for sub-arcsecond RMS pointing stability over 20 min, and camera plate thermal regulation at the millikelvin level.
Science Operations: Performed opportunistic photometry on 55 Cnc, detecting the transit of 55 Cancri e, the first exoplanet transit detection by a CubeSat.
Implications: Validates the feasibility of high-precision photometry and transit science from low-cost CubeSats, with implications for exoplanet detection, variable star monitoring, and future small-sat astrophysics missions.

7. Summary Table of ASTERIA Incarnations

Subfield / Domain	Purpose / Role	Primary Reference
NEA Thermal Inertia Analysis	Yarkovsky-inversion for $\Gamma$	(Novakovic et al., 2024, Fenucci et al., 18 Feb 2025)
Binary Code Similarity / Security	Semantic (AST+TreeLSTM) code embedding & search	(Yang et al., 2021, Yang et al., 2023)
LLM Optimization Runtime	Systems offload for scalable second-order training	(Lu et al., 15 May 2026)
LLM Agent Semantic Cache	Cross-region, semantically validated tool caching	(Ruan et al., 22 Sep 2025)
Dark Matter Capture (Astrophysics)	Multi-regime capture rates in celestial bodies	(Leane et al., 2023)
CubeSat Astrophysics	High-precision photometry from a 6U platform	(Knapp et al., 2020)

Each ASTERIA system is independent and context-specific, unified only by naming convention and a focus on scalable, robust, and semantically-informed analysis or computation in challenging regimes.