
Automated KOA Archive Management

Updated 29 November 2025
  • Automated KOA Management is an end-to-end, event-driven system that orchestrates real-time data ingestion, metadata processing, and archival workflows for the Keck Observatory Archive.
  • It employs a modular, instrument-agnostic Python architecture with YAML-configured multi-threaded pipelines and RESTful APIs to ensure rapid, reliable data processing.
  • Performance metrics show raw data ingestion latencies around 30 seconds and quick-look products available within 5 minutes, enabling timely scientific analysis.

Automated KOA Management refers to the end-to-end, largely unattended operation and orchestration of the Keck Observatory Archive (KOA), encompassing real-time data ingestion, metadata retrofitting, instrument-agnostic configuration, fault-tolerant workflows, high-throughput reduction pipelines, and observer-facing access portals. Since 2022, KOA has deployed a comprehensive Python-based system replacing the legacy C monolith, achieving latency goals on the order of one minute for raw data and five minutes for quick-look products across all active instruments (Berriman et al., 2022). The management layer is driven by event-based monitoring, configuration templates, multi-threaded reliability strategies, and standardized database operations, providing observer access and science enablement as soon as exposures are acquired.

1. System Architecture and Pipeline Overview

The core architecture operates across two physical sites: WMKO (the telescope "mountain") and NExScI (the archival "valley"). At WMKO, a Python-based Monitoring Daemon intercepts file events using Keck Task Library (KTL) callbacks, eliminating the need for polling or scheduled scripts. For each new file, a Data Evaluation & Processing (DEP) thread extracts required metadata, stages files into a shared NFS-mounted filesystem, and triggers ingestion by issuing an HTTP POST to a Flask micro-service at NExScI. The receiving component at NExScI consists of a compact Python-based Ingestion Service (~4–5 KLOC), which validates, commits, and migrates data into the long-term archive (Berriman et al., 2022).

The ingestion flow is illustrated as follows:

KTL keyword change → Monitor Daemon → DEP (thread) prepares & stages file → HTTP POST /ingest (Flask) → NExScI Ingestion Service → Database & Archive FS → Status report

Key infrastructure choices include fully NFS-shared storage, eliminating explicit file transfers (e.g., rsync/scp), and simple RESTful API calls for pipeline orchestration. All instrument-specific details (e.g., filename patterns, metadata key maps) are loaded at runtime from configuration templates (YAML/INI), creating a dispatch table for rule-based ingestion without code changes. The modular design keeps each critical component below ~1 KLOC, simplifying isolation and recovery.
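
The NExScI-side receiver can be pictured as a small Flask application exposing a single POST endpoint. The sketch below is illustrative only: the /ingest route follows the flow diagram above, while the payload fields and the validate_manifest and commit_to_archive helpers are hypothetical stand-ins for the service's actual validation and commit logic:

from flask import Flask, request, jsonify

app = Flask(__name__)

def validate_manifest(instrument, path):
    # Placeholder: the real service checks metadata completeness and manifest consistency.
    return path is not None

def commit_to_archive(instrument, path):
    # Placeholder: the real service writes database records and moves data into the archive filesystem.
    pass

@app.route("/ingest", methods=["POST"])
def ingest():
    payload = request.get_json(force=True)     # e.g. {"instrument": "KCWI", "path": "<staged FITS file>"}
    instrument = payload.get("instrument")
    staged_path = payload.get("path")          # file is already visible via the shared NFS mount
    if not validate_manifest(instrument, staged_path):
        return jsonify(status="error", reason="validation failed"), 400
    commit_to_archive(instrument, staged_path)
    return jsonify(status="ingested", path=staged_path), 200

if __name__ == "__main__":
    app.run(port=5000)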

2. Event-Driven Monitoring and Trigger Mechanisms

KOA’s event-driven workflow relies on KTL keyword monitoring rather than periodic cron jobs, so the system responds at the granularity of individual keyword events. For example, on detecting a change to the WRITE_FILE keyword, the Monitoring Daemon triggers the DEP pipeline, which processes only the newly written file. No filesystem polling or batch scheduling is employed. Python threading tracks each file-processing job, and failures are managed via a queue and notification system with retry logic and error reporting (typically by automated email to the engineering staff).

Pseudocode fragment for file monitoring:

# Pseudocode: keyword(...).watch(...) stands in for the actual KTL Python binding call.
from threading import Thread

def on_new_file_callback(kname, value, **kwargs):
    file_path = value                                     # keyword value carries the new file's path
    t = Thread(target=process_and_trigger, args=(file_path,), daemon=True)
    t.start()                                             # one DEP worker thread per file

keyword("WRITE_FILE").watch(on_new_file_callback)         # register KTL callback; no polling needed

This event-driven pattern achieves minimal detection latency ($T_\mathrm{detect} \lesssim 1~\mathrm{s}$) and couples detection directly to near-real-time metadata staging and database writes.
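
The failure-handling path can be sketched as a simple queue plus email notification; the report_failure helper, the queue structure, and the addresses below are placeholders rather than KOA's actual implementation:

import queue
import smtplib
from email.message import EmailMessage

failure_queue = queue.Queue()   # failed jobs await automated retry or manual review

def report_failure(file_path, error):
    """Queue a failed file for retry and notify engineering staff by email."""
    failure_queue.put((file_path, error))
    msg = EmailMessage()
    msg["Subject"] = f"KOA ingestion failure: {file_path}"
    msg["From"] = "pipeline@example.org"      # placeholder address
    msg["To"] = "engineering@example.org"     # placeholder address
    msg.set_content(str(error))
    with smtplib.SMTP("localhost") as smtp:   # assumes a local mail relay
        smtp.send_message(msg)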

3. Instrument-Agnostic Configuration and Workflow

Automated KOA Management is parameterized by per-instrument YAML or INI configuration files. Each configuration file describes file-matching patterns, required metadata fields, directory structure, and transfer/timeout policies. At runtime, the system loads all instrument configs into memory, assembling a dispatch table that maps filenames to ingestion rules. This framework allows seamless onboarding of new instruments: adding a configuration template requires no new code.

Example kcwi.yaml:

instrument: KCWI
file_pattern: "KC*.fits"
metadata_map:
  OBJECT  : OBJECT
  DATE-OBS: DATE_OBS
  INSTRUME: INSTNAME
transfer:
  timeout: 300
  max_retries: 3

This design achieves fully instrument-agnostic ingestion and enables continuous deployment practices consistent with cloud-native operational models (Berriman et al., 2022).
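
A dispatch table built from such templates can be sketched as follows; the configs/ directory and the load_dispatch_table and rules_for names are illustrative assumptions, with PyYAML used for parsing and shell-style pattern matching applied to the file_pattern field:

import fnmatch
from pathlib import Path

import yaml  # PyYAML

def load_dispatch_table(config_dir="configs"):
    """Load every per-instrument YAML template into a (pattern, rules) table."""
    table = []
    for cfg_file in sorted(Path(config_dir).glob("*.yaml")):
        cfg = yaml.safe_load(cfg_file.read_text())
        table.append((cfg["file_pattern"], cfg))
    return table

def rules_for(filename, table):
    """Return the ingestion rules whose pattern matches the file, or None."""
    for pattern, cfg in table:
        if fnmatch.fnmatch(Path(filename).name, pattern):
            return cfg
    return None

# Example: rules_for("KC20240101_0001.fits", load_dispatch_table()) returns the KCWI rules above.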

4. Fault Tolerance, Reliability, and Recovery Strategies

The pipeline employs multi-threading, queue-based tracking, idempotent database operations, and manifest verification for robust, error-resistant operation. Each DEP thread is independently managed; failures (network, metadata, or validation) leave the file staged and trigger a notification for manual review or automated retry. The NExScI Ingestion Service enforces manifest consistency and batch idempotence: files are not committed to the archive unless all validation checks succeed. Exception handlers log all failures and send notifications on critical errors.

Both client and server implement shared retry policies (parameterized by max_retries), and the architecture's modularity (single-responsibility components <1 KLOC) facilitates localized recovery without global pipeline stalls. The manifest-driven approach ensures that only validated, complete data sets are committed, and any missing files can be recognized and retried at both file and batch level (Berriman et al., 2022).
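
A client-side retry wrapper of the kind described above might look as follows; the /ingest URL, payload shape, and post_with_retry helper are assumptions, with the timeout and max_retries defaults drawn from the transfer block of the example configuration:

import time

import requests

def post_with_retry(url, payload, timeout=300, max_retries=3, backoff=10):
    """Retry the ingestion POST; safe to repeat because server-side commits are idempotent."""
    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.post(url, json=payload, timeout=timeout)
            if resp.status_code == 200:
                return resp.json()
            # Non-200 response: validation or server-side failure; retry after a pause.
        except requests.RequestException:
            pass  # network error: retry after a pause
        if attempt < max_retries:
            time.sleep(backoff)
    raise RuntimeError(f"ingestion failed after {max_retries} attempts: {url}")

# Example: post_with_retry("https://nexsci.example.org/ingest", {"instrument": "KCWI", "path": staged_path})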

5. Performance Metrics and Latency Bounds

Operational timing is a principal benchmark for the automated system. Across eight instruments, raw data ingestion median latency is approximately 30 seconds with a 95th percentile under 60 seconds. Quick-look (Level 1) data ingestion targets under 10 minutes and is typically achieved within 5 minutes. The pipeline latency is bounded as:

$$T_\mathrm{total} \leq T_\mathrm{detect} + T_\mathrm{prep} + T_\mathrm{transfer} + T_\mathrm{db}$$

where:

  • $T_\mathrm{detect}$: KTL callback latency ($\lesssim 1~\mathrm{s}$)
  • $T_\mathrm{prep}$: metadata extraction and staging ($\lesssim 10~\mathrm{s}$)
  • $T_\mathrm{transfer}$: NFS file copy ($\lesssim 10~\mathrm{s}$)
  • $T_\mathrm{db}$: validation and archive commit ($\lesssim 30~\mathrm{s}$)

No explicit queueing model is employed, but the event-triggered, modular layout achieves a median total ingestion time of about 30 seconds, meeting the strict high-throughput goals (Berriman et al., 2022).
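
For illustration, summing the per-stage upper bounds listed above gives a worst-case figure consistent with the reported sub-minute 95th-percentile latency:

$$T_\mathrm{total} \lesssim 1 + 10 + 10 + 30~\mathrm{s} = 51~\mathrm{s} < 60~\mathrm{s}$$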

6. Context: Modernization and End-to-End Data Services Initiative

Real-time ingestion is central to KOA’s Data Services Initiative (DSI), intended to unify the operational cycle from observation planning through automated science data reduction and archiving. Previous legacy systems relied on nightly batch operations implemented in C (≈50 KLOC), with per-instrument forks and significant manual supervision. The modern pipeline delivers continuous ingestion, consolidated to ≈5 KLOC of Python code, supporting all instruments under a single codebase and integrating open-source data reduction pipelines via GitHub (Berriman et al., 2022).

Modernization has accelerated observer access (≤60 seconds post-acquisition), enabled rapid time-domain and multi-messenger astronomy (facilitating, for example, Rubin and LIGO follow-up observations), and standardized the operational protocol for instrument onboarding and archiving. The architecture explicitly supports both raw (Level 0) and reduced (Level 1/2) science products, closing the loop between telescope and archive.

7. Significance and Future Directions

The Automated KOA Management system establishes a reference architecture for event-driven, highly modular data management in astronomical archives. Its instrument-agnostic, configuration-driven model supports seamless extensibility, rapid fault recovery, and near-real-time science enablement. As Python-based reduction pipelines mature, science-ready data products will be regularly ingested, advancing capabilities for both immediate analysis and longitudinal archival research. The system’s modernization represents a paradigm shift from static, batch-oriented data management to dynamic, observer-driven scientific workflows (Berriman et al., 2022).
