
PRIMRose: Computing, Astronomy, and Protein Energetics

Updated 13 December 2025
  • PRIMRose for container selection uses formal property specifications and SMT solving to filter and rank implementations based on syntactic and semantic criteria.
  • The PRIMRose survey framework synergizes data from NASA’s Roman and PRIMA telescopes to achieve unbiased galaxy property estimation at cosmic noon.
  • In protein energetics, PRIMRose employs a deep convolutional network to predict residue-level energy impacts from double InDel mutations using extensive in silico datasets.

PRIMRose is a designation used for three unrelated entities in contemporary research: (1) a solver- and verification-backed tool for selecting container data types by their properties in programming languages, (2) a multi-observatory survey framework combining NASA's Roman and the proposed PRIMA telescopes to recover galaxy physics at cosmic noon, and (3) a deep learning system for predicting per-residue energy effects of double InDel protein mutations. Each instantiation of PRIMRose targets a distinct field and employs different methodologies, but all demonstrate characteristic rigor in specification, modeling, and data analysis.

1. PRIMRose for Container Data Type Selection

Container data types (e.g., sets, stacks, lists) are foundational in modern programming languages and are generally realized via multiple concrete implementations (red-black trees, hash tables, linked lists, etc.). Application programming interfaces (APIs) force developers to select a particular implementation, even when their requirements are better understood in terms of abstract data properties—such as uniqueness, ordering, or last-in, first-out (LIFO) semantics. The PRIMRose tool addresses this misalignment by enabling property-first development: developers express desired syntactic and semantic properties as type refinements, and PRIMRose automatically filters and ranks valid implementations based on both formal conformance and empirical performance (Qin et al., 2022).

Formalization within PRIMRose distinguishes:

  • Syntactic properties: Sets of operations required by the container (e.g., methods expressed as Rust traits).
  • Semantic properties: Behavioral predicates specified as refinements on an abstract container model (i.e., predicates quantifying over all elements, pairs, or operation results).

Properties are encoded against a model where each candidate implementation is abstracted to a list-based structure, providing a uniform foundation for expressing properties such as uniqueness:

$\mathtt{property}\;\mathit{unique}\;=\;\lambda c.\;\mathit{for\text{-}all\text{-}elems}\;c\;(\lambda x.\;\mathit{unique\_count}(x, c))$
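The shape of such a property can be sketched in plain Python over the list-based abstract model. This is an illustrative rendering, not PRIMRose's actual Rust/Rosette encoding; the names `for_all_elems` and `unique_count` simply mirror the formula above.

```python
def unique_count(x, c):
    """True iff x occurs exactly once in the abstract container c."""
    return c.count(x) == 1

def for_all_elems(c, pred):
    """Universal quantification over the elements of the abstract model."""
    return all(pred(x) for x in c)

# property unique = λc. for-all-elems c (λx. unique_count(x, c))
def prop_unique(c):
    return for_all_elems(c, lambda x: unique_count(x, c))
```

Here `prop_unique([1, 2, 3])` holds, while `prop_unique([1, 1, 2])` does not, matching the intended set-like semantics.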

Verification leverages SMT solving (via Rosette): for each operation, PRIMRose checks that desired properties are observationally preserved post-operation using bounded model-checking, ensuring both soundness and practical decidability.
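The preservation check can be mimicked by brute force: enumerate every container up to a size bound and confirm the property still holds after each operation. This stand-in (with a hypothetical set-like `insert_dedup` operation) illustrates the bounded-model-checking idea; the real tool discharges the equivalent query symbolically through Rosette's SMT backend.

```python
from itertools import product

def insert_dedup(c, x):
    """Illustrative 'insert' modeling a set-like implementation."""
    return c if x in c else c + [x]

def is_unique(c):
    """Abstract uniqueness property over the list model."""
    return len(set(c)) == len(c)

def preserved_by(op, prop, universe, bound):
    """Bounded check: for every container of up to `bound` elements drawn
    from `universe` that satisfies `prop`, applying `op` must keep `prop`
    true. A brute-force stand-in for the SMT query solved via Rosette."""
    for n in range(bound + 1):
        for elems in product(universe, repeat=n):
            c = list(elems)
            if not prop(c):
                continue  # only start states satisfying the property matter
            for x in universe:
                if not prop(op(c, x)):
                    return False
    return True
```

A naive append (`lambda c, x: c + [x]`) fails this check for uniqueness, while the deduplicating insert passes, which is exactly the distinction the verification stage draws between candidate implementations.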

The selection workflow consists of syntactic trait filtering, semantic property verification, and runtime benchmarking to identify the implementation that not only satisfies correctness criteria but is also the fastest for the workload and data under consideration. Evaluation across eight Rust container types, multiple property specifications, and property-based testing shows selection times under 30 seconds for realistic libraries, with tighter properties (e.g., ascending, unique) requiring more solver effort.
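The three-stage workflow can be summarized in a short sketch. All names here are illustrative (candidates as dictionaries, a black-box `prop_check`, a user-supplied workload); the actual tool operates on Rust traits and Rosette queries rather than Python data.

```python
import time

def select_container(candidates, required_ops, prop_check, workload):
    """Sketch of the three-stage pipeline: syntactic filtering, semantic
    verification, then benchmarking, with survivors ranked by runtime."""
    # 1) Syntactic filtering: candidate must expose every required operation.
    syntactic = [c for c in candidates if required_ops <= set(c["ops"])]
    # 2) Semantic verification (a bounded property check stands in for SMT).
    semantic = [c for c in syntactic if prop_check(c)]
    # 3) Empirical ranking: time each survivor on the user's workload.
    timed = []
    for c in semantic:
        t0 = time.perf_counter()
        workload(c)
        timed.append((time.perf_counter() - t0, c["name"]))
    return [name for _, name in sorted(timed)]
```

The fastest verified candidate comes first, reflecting the paper's criterion of correctness first, performance second.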

Though the prototype is built for Rust, the property and verification architecture is language-agnostic, requiring only support for trait/interface or type class abstraction and extension with predicate annotations. Limitations include coverage only for sequential containers (with maps as an extensible future direction), a type-system assumption of total ordering where required by properties, and currently naive runtime performance metrics. Fully mechanized correctness proofs and more sophisticated ranking models represent further research directions.

2. PRIMRose as a Synergy of Roman and PRIMA Space Missions

PRIMRose also denotes the synergy between the Nancy Grace Roman Space Telescope and the PRobe far-InfraRed Mission for Astrophysics (PRIMA) in extragalactic astronomy. Roman's Wide Field Instrument (WFI) delivers deep, wide-area near-infrared photometry, while PRIMA provides far-infrared imaging and spectroscopy with a 1.8 m cryogenic telescope. Combined, they yield self-consistent measurements of physical parameters for high-redshift galaxies ($1.5 \lesssim z \lesssim 2.5$) that are unattainable by either facility alone (Boquien et al., 1 Sep 2025).

The combined PRIMRose framework employs:

  • Roman: Optical/near-IR imaging in seven WFI bands (F106, F129, F158, F184 used in simulations), depths of 13–24 nJy, covering up to $\sim 2000\,\mathrm{deg}^2$.
  • PRIMA: Far-IR photometry (24–261 μm, via multiple channels and filters) plus low- and high-resolution spectroscopy for FIR fine-structure lines, with survey depths from $\sim$35 μJy to 800+ μJy depending on band and tier.

Mock catalogs are generated with CIGALE by modeling galaxies across redshift and mass bins, tying star-formation histories, attenuation, and emission to empirically calibrated templates and noise models. The parameter recovery process involves forward modeling of spectral energy distributions (SEDs) under a Bayesian framework, yielding robust posteriors for:

  • Stellar mass ($M_*$): Constrained primarily by Roman's rest-NIR photometry.
  • Star-formation rate (SFR): Requires both unobscured UV luminosity (Roman) and dust-reprocessed IR (PRIMA), with the total SFR obtained via UV+IR energy balance.
  • Dust luminosity ($L_\mathrm{dust}$): Recovered from PRIMA far-IR photometry.
  • PAH fraction ($q_\mathrm{PAH}$): Inferred from PRIMA's mid-to-far-IR sensitivity.
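The Bayesian recovery step above reduces, in its simplest form, to weighting a grid of model SEDs by a Gaussian likelihood against the observed fluxes. The following toy illustrates that mechanism only; CIGALE's actual model grids, priors, and nuisance parameters are far richer, and the parameter values here are invented for the example.

```python
import math

def grid_posterior(obs_flux, flux_err, model_grid):
    """Toy Gaussian-likelihood grid fit. `model_grid` maps a parameter value
    (e.g. a log stellar mass) to the predicted flux in each band; the return
    value is the normalized posterior weight per grid point (flat prior)."""
    post = {}
    for param, model_flux in model_grid.items():
        chi2 = sum((o - m) ** 2 / e ** 2
                   for o, m, e in zip(obs_flux, model_flux, flux_err))
        post[param] = math.exp(-0.5 * chi2)
    norm = sum(post.values())
    return {p: w / norm for p, w in post.items()}
```

When only one instrument's bands are fed in, several grid points can receive comparable weight, which is the degeneracy that joint Roman+PRIMA photometry is shown to break.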

Quantitative analysis demonstrates that only the combined PRIMRose data achieve unbiased and precise ($\sim$0.05–0.10 dex scatter) estimates of all key quantities. Roman alone yields accurate $M_*$ but weak SFR and $L_\mathrm{dust}$ (due to dust–mass degeneracies), while PRIMA alone cannot break the mass–dust degeneracy. The only route to energy-balanced, unbiased parameter sets at $1.5 \lesssim z \lesssim 2.5$ is joint photometry.

Survey optimization strategies recommend a two-tiered program (deep and wide) to deliver both high-fidelity SEDs and statistically significant population samples at cosmic noon. This enables constraining star-forming main sequence scaling, dust mass functions, and PAH evolution with minimal bias and cosmic variance.

3. PRIMRose: Deep Learning for Protein Mutation Energetics

In computational biology, PRIMRose refers to a convolutional neural network (CNN) architecture for predicting residue-level energy perturbations ($\Delta E_i$) resulting from double amino acid insertions or deletions (InDels) in proteins (Brown et al., 6 Dec 2025). Whereas traditional approaches estimate global fold stability or total energy for a mutated sequence, PRIMRose outputs a $|\mathrm{sequence}| \times 14$ tensor, decomposing the total Rosetta energy and its major physical contributions for each residue.

The input is a mutated FASTA sequence processed via a 64-dimensional amino acid embedding. The CNN stacks 30 residual blocks with one-dimensional convolutions (kernel size 3), no pooling or positional encodings, preserving length and order. The output layer projects this to 14 per-residue terms covering van der Waals (fa_atr, fa_rep), solvation (fa_sol, fa_intra_sol, lk_ball_wtd), electrostatics (fa_elec), local statistical potential (fa_dun, rama_prepro, p_aa_pp), backbone geometry (pro_close, omega), and a total energy composite.
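The length-preserving design can be seen in a minimal scalar sketch of one residual block. The real network operates on 64-dimensional embeddings with learned weights and nonlinearities; this stripped-down version only demonstrates why kernel-size-3 "same" convolutions with skip connections keep per-residue alignment without pooling or positional encodings.

```python
def conv1d_same(seq, kernel):
    """1-D convolution with zero padding so the output length equals the
    input length (kernel size 3, as in the PRIMRose residual blocks)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(seq))]

def residual_block(seq, kernel):
    """y = x + conv(x): the skip connection preserves sequence length and
    order, so position i of the output still describes residue i."""
    conv = conv1d_same(seq, kernel)
    return [x + c for x, c in zip(seq, conv)]
```

Stacking 30 such blocks deepens the receptive field while every layer's output stays in one-to-one correspondence with the input residues, which is what makes per-residue energy targets possible.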

Training data is constructed via in silico mutation and energy computation for nine PDB proteins, using exhaustive or large-scale random double InDel sampling (up to $\sim 10^6$ variants per protein). Data splits account for generalization across both unseen residue positions and amino acid identities. Models are trained with the Adam optimizer, Huber loss, and Z-score normalization per energy term.
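The two training-recipe ingredients named above are standard and easy to state precisely; the following is a generic sketch of each (not code from the paper).

```python
def zscore(values):
    """Z-score normalization, applied independently per energy term so that
    terms with very different scales contribute comparably to the loss."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0  # guard against a constant channel
    return [(v - mean) / std for v in values]

def huber(pred, target, delta=1.0):
    """Huber loss: quadratic near zero, linear for large residuals, which
    damps the influence of rare extreme-energy outliers during training."""
    r = abs(pred - target)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
```

Normalizing per term means the 14 output channels are regressed on a common scale, and the Huber transition (at the illustrative default `delta=1.0`) keeps occasional large clashes from dominating the gradient.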

Evaluation uses Pearson $r$, RMSE, Spearman's $\rho$, and MAE on test sets. Local physics-derived energy terms achieve $r > 0.98$ on large datasets; geometry-sensitive or highly nonlocal terms achieve $r \approx 0.56$–$0.77$. Analysis by secondary structure assignment (via DSSP) shows that insertions into $\beta$-sheets yield the highest prediction error, followed by helices and coils; solvent accessibility is less predictive of error variance.
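For reference, the two headline metrics have simple closed forms; these generic stdlib implementations show exactly what is being reported per energy term.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between predicted and target energies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rmse(xs, ys):
    """Root-mean-square error in the same units as the energy term."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

Reporting both matters: $r$ captures ranking quality across variants while RMSE reflects absolute error in Rosetta energy units, and the two can diverge for the nonlocal terms.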

Exemplars illustrate per-residue interpretation: double insertions in loop or solvent-accessible regions induce minimal energy perturbation (mutational tolerance), but core insertions, especially in $\beta$-sheets, generate sharp $\Delta E_i$ peaks identifying local folding or stability hotspots.

4. Technical Methodologies and Verification Approaches

The various PRIMRose systems converge on an integrated pipeline combining clear property or specification formalization, automated verification or modeling, and empirical (either computational or observational) performance ranking.

  • Type and property refinement (container selection): Precise abstractions support fine-grained behavioral specification and drive SMT-based filtering. Properties are written as functionals on containers, interpreted in a bounded abstract model, with Rosette automating invariant or postcondition checking.
  • End-to-end likelihood modeling (extragalactic physics): Mock population synthesis, full SED modeling, and noise-aware flux perturbation enable forward-modeling of survey observables and parameter posterior estimation. Energy and mass balance constraints are encoded in model grids and likelihood functions.
  • Deep convolutional architectures with explicit physical targets (protein energetics): Network design directly reflects physical and sequence structure, bypassing the need for explicit topological or positional encodings and operating at residue-level granularity. Multiple error and correlation metrics ensure interpretability and robustness across structurally heterogeneous sites.

5. Practical Limitations and Future Directions

Each PRIMRose instantiation identifies limits and open avenues:

  • Container selection: Generalizes across programming environments but currently excludes map/dictionary types and relies on simple total ordering assumptions where required. Performance benchmarking is naive, inviting future integration of cache, memory, and system-specific predictors, or more precise correctness proofs (e.g., via RustBelt formal methods) (Qin et al., 2022).
  • Survey synergy: Dependent on true mutual coverage and cross-matched catalog depth between Roman and PRIMA. Improvements may target photometric redshift accuracy via deeper optical fields or expanded FIR line coverage for ISM diagnostics (Boquien et al., 1 Sep 2025).
  • Protein mutation energetics: Largest generalization errors remain for nonlocal, complex geometry-sensitive energy terms; extension to more diverse protein folds, larger proteins, or context-aware architectures is indicated. Incorporation of additional 3D/geometric priors may further improve performance (Brown et al., 6 Dec 2025).

6. Significance and Cross-Disciplinary Impact

Despite the lack of direct connection among the three threads, PRIMRose exemplifies contemporary trends in:

  • Raising abstraction in software and data analysis workflows (via formal specification, property-driven synthesis, and correctness-aware selection).
  • Integrating disparate observational facilities for maximal physical inference—joint photometry, as in extragalactic astrophysics, removes degeneracies unbreakable by single-instrument approaches.
  • Translating simulation-informed physical insight into machine learning workflows that extract local, interpretable metrics, advancing the granularity and utility of computational biology predictions.

In each context, PRIMRose leverages rigorous formalism, comprehensive verification, and performance-centered filtering, supporting reproducible, high-confidence analysis and design.
