Quokka: Multi-Domain Computation
- Quokka is a name shared by several independent computational frameworks spanning materials science NLP, diffusion scaling laws, GPU hydrodynamics, and fault-tolerant distributed analytics.
- Their methods include continued pretraining of LLaMA-2 on multi-GPU hardware, adaptive mesh refinement, and write-ahead lineage for fault-tolerant SQL queries.
- Key contributions include closed-form scaling laws for diffusion models, GPU-accelerated astrophysical RHD simulations, and a low-overhead recovery protocol for pipelined query engines.
Quokka refers to a diverse set of computational and machine learning frameworks, simulation codes, and applied benchmarks that appear in several high-impact research domains, notably in LLMs, high-performance computing for astrophysics and hydrodynamics, materials science NLP, diffusion model scaling theory, and distributed data analytics. All these technologies are named "Quokka" but arise in different research contexts. This article systematically reviews the major Quokka frameworks in use, their technical details, and their domain-specific implications.
1. Quokka in Materials Science NLP: Domain-Specialized LLMs
The Quokka family within materials informatics consists of open-source LLMs—Quokka-7B and Quokka-13B—produced by continued pretraining of LLaMA-2 architectures using more than 1.1 million materials science papers from S2ORC, followed by instruction-tuning on thousands of expert-curated prompts (Yang et al., 2024).
Key technical features:
- Base models: LLaMA-2 7B (32 layers, hidden size 4096, 32 heads) and 13B (40 layers, hidden size 5120, 40 heads); unmodified Transformer backbone.
- Tokenization: SentencePiece (Unigram), vocabulary size 32,000, context window 1,024 tokens.
- Continued pretraining: 8×A100 GPUs, DeepSpeed Zero-3, FlashAttention, bf16 precision, single epoch, AdamW optimizer, cosine LR schedule, substantial data mixing (10% RedPajama general corpus to prevent catastrophic forgetting).
- Objective: Standard autoregressive cross-entropy; final perplexity on S2ORC—1.78 for Quokka-7B and 1.52 for Quokka-13B.
- Instruction tuning: >3,300 data points, blended from LIMA, HoneyBee, and custom prompts; bf16, FSDP, 15 epochs; rapid loss convergence reported.
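As a rough sanity check on the base-model sizes listed above, the parameter count can be estimated from the layer count and hidden size. The sketch below assumes the standard LLaMA-2 layout (untied input and output embeddings, a SwiGLU MLP with three projection matrices, two RMSNorm weight vectors per layer, no biases) and the published LLaMA-2 intermediate sizes of 11,008 and 13,824; those two values and the helper name `llama2_param_count` are not taken from this article.

```python
def llama2_param_count(n_layers, hidden, ffn_hidden, vocab=32_000):
    """Estimate parameters of a LLaMA-2-style decoder (no biases, untied embeddings)."""
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    mlp = 3 * hidden * ffn_hidden        # gate, up and down projections (SwiGLU)
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden      # input embedding table + output head
    final_norm = hidden                  # RMSNorm before the output head
    return n_layers * per_layer + embeddings + final_norm


# Shapes quoted in the feature list; ffn_hidden values are assumed (see lead-in).
quokka_7b = llama2_param_count(n_layers=32, hidden=4096, ffn_hidden=11_008)
quokka_13b = llama2_param_count(n_layers=40, hidden=5120, ffn_hidden=13_824)

print(f"7B backbone:  {quokka_7b / 1e9:.2f}B parameters")
print(f"13B backbone: {quokka_13b / 1e9:.2f}B parameters")
```

Under these assumptions the estimates come out near 6.74B and 13.02B, consistent with the nominal 7B and 13B labels.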
The released checkpoints (7B/13B; with/without instruction tuning) are public, and inference can be performed using HuggingFace Transformers with resource requirements of ≥16GB (7B) or ≥24GB (13B) GPU RAM.
Main applications include materials property lookup, synthesis pipeline queries, education, patent/literature summarization, and downstream fine-tuning for NER, relation extraction, and action-graph parsing. Limitations include the absence of large-scale MatSci-NLP benchmarks, unimodal text capability only, finite instruction diversity, and typical LLM hallucination risk (Yang et al., 2024).
| Model | Size (B params) | Pretrained on S2ORC | Instruction-tuned | Use case |
|---|---|---|---|---|
| Quokka-7B | 7 | Yes | No | Foundation/fine-tune |
| Quokka-13B | 13 | Yes | No | Foundation/fine-tune |
| Quokka-7B-Chat | 7 | Yes | Yes | Interactive chatbot |
| Quokka-13B-Chat | 13 | Yes | Yes | Interactive chatbot |
2. Quokka Scaling Laws for Diffusion LLMs
Quokka also denotes the first closed-form scaling-law framework for diffusion LLMs (DLMs) (Ni et al., 28 Sep 2025). It extends the scope of Chinchilla's scaling laws by providing actionable formulas for both compute-constrained and data-constrained regimes and by systematically exploring DLM-specific optimization and design interventions.
Core theory:
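- Compute-constrained regime: Pretraining loss
$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},$$
where $N$ is parameter count, $D$ is token count, $E$ is irreducible error, and $A/N^{\alpha}$ and $B/D^{\beta}$ are, respectively, model and data error terms. Optimal allocation under a FLOPs budget $C$ yields
$$N_{\mathrm{opt}} \propto C^{a}, \qquad D_{\mathrm{opt}} \propto C^{b},$$
with empirically fitted exponents $a$ and $b$.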
- Data-constrained regime: Captures the U-shaped loss vs. epoch curve and overfitting, with closed forms for the loss and for the loss-minimizing number of epochs as a function of the unique-token budget.
- Key distinctions: DLMs are significantly more data-hungry (by a factor of 2–3) than AR LLMs at similar FLOPs.
Empirically, Quokka validates these scaling relations with a grid of 23,000+ iso-FLOPs runs and dense ablation of kernel, schedule, loss, and hyperparameter choices. Masked (absorbing) diffusion, linear schedules, and diffusion ELBO loss are preferred for optimal DLM performance.
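To illustrate how a compute-constrained allocation follows from a parametric loss, the sketch below assumes the Chinchilla-style form L(N, D) = E + A/N^α + B/D^β and the common approximation C ≈ 6·N·D for training FLOPs. Both are assumptions here, and the constants are the autoregressive fits reported by Hoffmann et al. (2022), used purely as placeholders; they are not the DLM values fitted in the Quokka paper.

```python
# Placeholder constants: Chinchilla AR fits (Hoffmann et al., 2022), NOT Quokka's DLM fits.
FIT = dict(E=1.69, A=406.4, alpha=0.34, B=410.7, beta=0.28)


def loss(N, D, E, A, alpha, B, beta):
    """Parametric pretraining loss: irreducible term + model term + data term."""
    return E + A / N**alpha + B / D**beta


def compute_optimal(C, A, alpha, B, beta, flops_per_param_token=6.0, **_):
    """Minimize A/N^alpha + B/D^beta subject to flops_per_param_token * N * D = C.

    Setting the derivative along the constraint to zero gives
    N_opt = G * K^(beta/(alpha+beta)) and D_opt = K^(alpha/(alpha+beta)) / G,
    with K = N * D and G = (alpha*A / (beta*B))^(1/(alpha+beta)).
    """
    K = C / flops_per_param_token
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    N_opt = G * K ** (beta / (alpha + beta))
    D_opt = K ** (alpha / (alpha + beta)) / G
    return N_opt, D_opt


budget = 1e21  # FLOPs, arbitrary example value
N_opt, D_opt = compute_optimal(budget, **FIT)
print(f"N_opt ~ {N_opt:.3e} parameters, D_opt ~ {D_opt:.3e} tokens")
print(f"loss at optimum ~ {loss(N_opt, D_opt, **FIT):.4f}")
```

The exponents β/(α+β) and α/(α+β) play the role of the allocation exponents for parameters and tokens; with DLM-specific fits substituted for `FIT`, the same function would give the corresponding diffusion-model allocation.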
3. Quokka as a GPU-Accelerated Hydrodynamics and Astrophysical RHD Code
Quokka designates a block-structured, adaptive mesh refinement (AMR) code for (magneto)hydrodynamics and radiation hydrodynamics (RHD) targeting massively parallel GPU architectures (Wibking et al., 2021, He et al., 2024, He et al., 22 Sep 2025).
Principal architecture:
- AMReX-based, C++/CUDA, Kokkos for cross-platform support
- Hydro solver: Piecewise parabolic method (PPM), method-of-lines second-order RK integrator, HLLC Riemann solver, operator splitting for gravity, cooling, and feedback.
- Radiation: Mixed-frame two-moment VET (M1 closure), subcycling in time, reduced speed-of-light approximation, multigroup extension using piecewise power-law (PPL) frequency binning, O(4) coupling algorithm for efficient source term integration, and exact conservation properties (He et al., 2024).
- Scalability and throughput: Strong/weak scaling demonstrated to O(10^2–10^3) GPUs; per-GPU rates of >250M zone-updates/sec (hydro), ~40M (RHD with substeps).
- Key physics modules: Optically thin cooling, with primordial and metal-line cooling via Grackle; GPU-native structure for passive scalars and user feedback kernels.
The code is open-source, with detailed regression tests and modular architecture enabling integration of gravity (AMReX multigrid), sink/star particles (via novel PMP kernel), future MHD, and non-equilibrium chemistry (Wibking et al., 2021, He et al., 22 Sep 2025).
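For readers unfamiliar with the HLLC Riemann solver named in the hydro-solver bullet, the following is a minimal one-dimensional sketch for the ideal-gas Euler equations, following the standard textbook formulation (Toro) with Davis wave-speed estimates. It is not Quokka's implementation, which is written in C++ on AMReX; the function names, the choice of wave-speed estimate, and the adiabatic index are assumptions made for illustration.

```python
import math

GAMMA = 5.0 / 3.0  # adiabatic index of a monatomic ideal gas (assumed)


def conserved(rho, u, p):
    """Primitive (rho, u, p) -> conserved (rho, rho*u, E)."""
    return (rho, rho * u, p / (GAMMA - 1.0) + 0.5 * rho * u * u)


def physical_flux(rho, u, p):
    """Euler flux of mass, momentum and energy for a primitive state."""
    E = p / (GAMMA - 1.0) + 0.5 * rho * u * u
    return (rho * u, rho * u * u + p, u * (E + p))


def star_state(rho, u, p, S, S_star):
    """Conserved state between the outer wave S and the contact wave S_star."""
    E = p / (GAMMA - 1.0) + 0.5 * rho * u * u
    factor = rho * (S - u) / (S - S_star)
    energy = E / rho + (S_star - u) * (S_star + p / (rho * (S - u)))
    return (factor, factor * S_star, factor * energy)


def hllc_flux(left, right):
    """HLLC interface flux from left/right primitive states (rho, u, p)."""
    rho_l, u_l, p_l = left
    rho_r, u_r, p_r = right
    a_l = math.sqrt(GAMMA * p_l / rho_l)  # sound speeds
    a_r = math.sqrt(GAMMA * p_r / rho_r)
    # Davis estimates for the fastest left- and right-going waves.
    S_l = min(u_l - a_l, u_r - a_r)
    S_r = max(u_l + a_l, u_r + a_r)
    # Speed of the contact (middle) wave.
    S_star = (p_r - p_l + rho_l * u_l * (S_l - u_l) - rho_r * u_r * (S_r - u_r)) / (
        rho_l * (S_l - u_l) - rho_r * (S_r - u_r)
    )
    if S_l >= 0.0:          # all waves move right: upwind on the left state
        return physical_flux(*left)
    if S_r <= 0.0:          # all waves move left: upwind on the right state
        return physical_flux(*right)
    # Interface lies inside the fan: pick the star region on the correct side.
    state, S = (left, S_l) if S_star >= 0.0 else (right, S_r)
    U = conserved(*state)
    U_star = star_state(*state, S, S_star)
    F = physical_flux(*state)
    return tuple(f + S * (us - uc) for f, us, uc in zip(F, U_star, U))


# Example: interface flux for Sod-like initial states.
print(hllc_flux((1.0, 0.0, 1.0), (0.125, 0.0, 0.1)))
```

Two properties make HLLC attractive for this kind of code: it reduces to the physical flux when both states agree, and it keeps a stationary contact discontinuity exactly stationary, which limits numerical diffusion of passive scalars across contacts.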
4. Quokka in Astrophysical Outflows, Metal Loading, and the QED Simulation Suite
Within galaxy formation and ISM/CGM physics, Quokka underpins the high-fidelity "QED" suite of tall-box simulations quantifying metal loading, phase structure, selective transport, and X-ray diagnostics of galactic winds (Vijayan et al., 2023, Vijayan et al., 12 Jan 2026, Huang et al., 2024).
Technical features:
- Tall-box geometry: Uniform-resolution domains (500 pc–1 kpc in-plane, ±4–8 kpc vertical), typically at Δx ≈ 2 pc, with static gravity and no self-gravity to isolate feedback-driven outflows.
- Physics: Supernova/SFR-driven feedback, discrete element injection for different nucleosynthetic sources (Type II SNe, Type Ia SNe, AGB stars), GPU-native solver for passive scalars.
- Diagnostics: Metal-loading factors η_Z, corrected metal-loading φ, and phase-resolved flux partitioning across cold, warm, and hot components, with convergence criteria showing that metal fluxes require at least 4 pc resolution for robust statistics (Vijayan et al., 2023).
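The phase-resolved flux partitioning mentioned in the diagnostics bullet can be sketched as a sum of the outflowing metal mass flux ρ·Z·v_z·dA through a horizontal plane, binned by gas temperature. The temperature cuts, the `Cell` record, and the function names below are hypothetical and chosen only for illustration; the QED papers define their own phase boundaries and loading-factor normalizations.

```python
from dataclasses import dataclass

# Hypothetical phase boundaries in kelvin; the QED papers may use different cuts.
COLD_MAX_K = 2.0e4
HOT_MIN_K = 1.0e6


@dataclass
class Cell:
    density: float      # gas density, g cm^-3
    metallicity: float  # metal mass fraction Z (dimensionless)
    v_z: float          # vertical velocity, cm s^-1 (positive = away from the disc)
    temperature: float  # K
    area: float         # face area in the measurement plane, cm^2


def phase_of(temperature):
    """Classify a cell as cold, warm or hot using the assumed temperature cuts."""
    if temperature < COLD_MAX_K:
        return "cold"
    if temperature < HOT_MIN_K:
        return "warm"
    return "hot"


def metal_flux_by_phase(cells):
    """Sum the outflowing metal mass flux (g s^-1) through a plane, split by phase."""
    totals = {"cold": 0.0, "warm": 0.0, "hot": 0.0}
    for c in cells:
        if c.v_z <= 0.0:
            continue  # inflowing or static gas does not contribute to the outflow
        totals[phase_of(c.temperature)] += c.density * c.metallicity * c.v_z * c.area
    return totals


# Toy plane with three cells, one per phase (values are arbitrary).
plane = [
    Cell(density=1.0, metallicity=0.5, v_z=2.0, temperature=1.0e3, area=1.0),
    Cell(density=1.0, metallicity=0.25, v_z=4.0, temperature=1.0e5, area=2.0),
    Cell(density=0.5, metallicity=0.5, v_z=8.0, temperature=1.0e7, area=1.0),
]
fluxes = metal_flux_by_phase(plane)
total = sum(fluxes.values())
print({phase: flux / total for phase, flux in fluxes.items()})
```

Dividing each phase total by a reference metal production rate would turn these fluxes into loading factors; that normalization is left out here because its exact definition is not given in this article.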
Critical results highlight prompt ejection of SN-synthesized metals (with a quoted range extending up to 0.9), demonstrate substantial "differential metal loading" (variations, quantified in dex, between elements of different origin: Type Ia SNe, Type II SNe, and AGB stars; Vijayan et al., 12 Jan 2026), and clarify the impact of outflow phase structure on observed mass–metallicity relations.
QED II (Huang et al., 2024) couples Quokka simulation data with forward-modeled soft X-ray spectra (pyXSIM, SOXS, SHERPA) to show that disc-to-wind metallicity gradients in X-rays are a robust tracer of mixing between SN ejecta and cold ISM. This leads to an analytical diagnostic that relates the hot-wind metallicity at a given height above the disc to the metallicity of fresh SN ejecta near the disc, providing a quantitative tool for comparison with observations.
5. Quokka in Distributed Query Processing: Efficient Lineage-Based Fault Tolerance
Quokka has also been deployed as a distributed, pipelined, fault-tolerant SQL engine (Wang et al., 2024). Its principal innovation is "write-ahead lineage," which enables efficient intra-query fault tolerance and recovery:
- Execution model: Push-based, highly pipelined DAG of query stages, Arrow Flight for zero-copy shuffle, distributed head node with Redis-based global control store.
- Write-ahead lineage: Logs KB-sized metadata per task at runtime, enforcing that only partitions with committed lineage are consumed; this eliminates the need for spooling or frequent checkpointing of large intermediate states.
- Fault recovery: Upon worker failure, pipelined parallel recovery (Algorithm 2 in Wang et al., 2024) replays minimal lost work based on dynamic, stage-aware lineage, preventing global rollback and providing rapid, fine-grained restoration.
- Performance: On TPC-H SF-100, Quokka achieves 1.9× speedup over SparkSQL and 1.6× over Trino+FT. Write-ahead lineage imposes only 6–15% runtime overhead compared to 1.5× for HDFS/S3 partition spooling; recovery time scales with the number of lost partitions and pipeline depth but not total query DAG size.
Insights include the O(1) per-task lineage encoding, dynamic runtime lineage adaptation, and elimination of excessive state checkpointing or network/disk traffic inherent to spooling approaches.
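The idea behind write-ahead lineage can be illustrated with a toy, single-process model: each task records which input partitions it consumed before its output becomes readable, so a replacement worker can later replay exactly the same inputs. The classes and function below (`LineageLog`, `Worker`, `recover`) are hypothetical stand-ins written for this illustration; they do not reproduce Quokka's API, its Redis schema, or Algorithm 2.

```python
class LineageLog:
    """Stand-in for a durable control store holding small per-task lineage records."""

    def __init__(self):
        self._records = {}

    def commit(self, task_id, input_ids):
        self._records[task_id] = tuple(input_ids)

    def is_committed(self, task_id):
        return task_id in self._records

    def lineage(self, task_id):
        return self._records[task_id]


class Worker:
    def __init__(self, log, upstream):
        self.log = log
        self.upstream = upstream  # partition id -> rows; assumed re-readable after a failure
        self.outputs = {}         # task id -> result; lost if this worker fails

    def run_task(self, task_id, input_ids):
        # Write-ahead step: persist the (tiny) lineage before exposing any output.
        self.log.commit(task_id, input_ids)
        # Toy operator: a partial aggregate over the consumed partitions.
        self.outputs[task_id] = sum(sum(self.upstream[i]) for i in input_ids)

    def read_output(self, task_id):
        # Consumers may only read partitions whose lineage has been committed.
        if not self.log.is_committed(task_id):
            raise RuntimeError(f"lineage for task {task_id} is not committed")
        return self.outputs[task_id]


def recover(log, upstream, lost_task_ids):
    """Rebuild lost outputs on a fresh worker by replaying the logged inputs."""
    replacement = Worker(log, upstream)
    for task_id in lost_task_ids:
        replacement.run_task(task_id, log.lineage(task_id))
    return replacement


# In a pipelined engine the grouping of inputs depends on arrival order at runtime,
# which is why it has to be logged rather than recomputed from the query plan.
upstream = {0: [1, 2], 1: [3], 2: [4, 5], 3: [6]}
log = LineageLog()
worker = Worker(log, upstream)
worker.run_task("t0", [2, 0])
worker.run_task("t1", [1, 3])
before = {t: worker.read_output(t) for t in ("t0", "t1")}

worker = None  # simulate losing the worker together with its in-memory outputs
replacement = recover(log, upstream, ["t0", "t1"])
after = {t: replacement.read_output(t) for t in ("t0", "t1")}
print(before == after)
```

Only the lineage records are persisted, not the intermediate results themselves, which is the source of the low steady-state overhead compared with spooling every partition to durable storage.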
6. Key Methodological and Theoretical Innovations
Across all domains, “Quokka” advances computational science with:
- Best-practice scaling law derivations for machine learning, including DLM-specific regimes and actionable formulas for model/data allocation (Ni et al., 28 Sep 2025).
- GPU-native, strong- and weak-scalable implementations of compressible hydro/RHD and passive scalar transport, with rigorous treatment of operator splitting, AMR, and communication (Wibking et al., 2021, He et al., 2024, He et al., 22 Sep 2025).
- Atomic, order-independent particle-mesh-particle (PMP) kernels for feedback and accretion with exact conservation and minimal MPI footprint (He et al., 22 Sep 2025).
- Closed-form diagnostics for ISM/CGM metal transport, validated via multi-phase, high-resolution tall box simulations (Vijayan et al., 2023, Huang et al., 2024, Vijayan et al., 12 Jan 2026).
- Efficient, lineage-based logging for distributed data workflows, yielding both low normal-time and recovery overhead in SQL analytics (Wang et al., 2024).
7. Limitations and Future Directions
The various Quokka platforms have domain-specific constraints:
- Quokka LLMs: Largely unimodal, with limited large-scale benchmark data; further generalization requires more diverse fine-tuning and possibly multimodal extensions (Yang et al., 2024).
- Quokka astrophysics codes: No magnetic fields or cosmic rays in default runs (planned extension); tall-box geometry does not follow wind parcels to >10 kpc or fully model CGM or fountains (Vijayan et al., 12 Jan 2026, Vijayan et al., 2023). Current RHD uses M1 closure and neglects frequency-dependent scattering, but extensions to multigroup, nonlocal VET, and scattering are in development (He et al., 2024).
- Particle-mesh-GPU interactions: Kernel radius must not exceed ghost layer size (memory constraint), and full bitwise-reproducible summations are complex (He et al., 22 Sep 2025).
- Distributed query processing: Recovery parallelism is limited to one worker per pipeline stage, which scales less well at extremely large cluster sizes than some data-parallel systems; the GCS is a single Redis instance, and the head node remains a potential bottleneck (Wang et al., 2024).
A plausible implication is that future Quokka versions will see rapid horizontal integration: a multimodal LLM for scientific data, deeper coupling of RHD and chemistry in GPU frameworks, and broader benchmarking in production-grade analytical pipelines across exascale architectures. Ongoing work emphasizes multigroup RHD with exact conservation, hybrid moment-Monte Carlo transport, and on-the-fly particle feedback in galaxy simulations. The general trend is convergence of highly specialized, domain-adaptable architectures with algorithms that are provably optimal in scaling, memory, and compute (He et al., 2024, Ni et al., 28 Sep 2025).