Submission Queue (SQ) for Astronomy
- A Submission Queue (SQ) is a system for scheduling astronomical observations, enabling efficient target organization and execution.
- It integrates distributed databases, middleware, and operator interfaces to reduce errors and enhance real-time feedback.
- Its modular design, priority algorithms, and robust data logging deliver improved on-sky efficiency and operational reliability.
A Submission Queue (SQ) is a management system for organizing, prioritizing, and executing observational targets in a queue-scheduled astronomical facility. Its purpose is to optimize operational efficiency, reduce human error, and streamline the flow of information and control between multiple users, schedulers, and technical systems. The architecture and algorithms behind a modern SQ, as exemplified by the integrated CHIRON TOOLS deployment, encompass target submission, script preparation, execution control, real-time error handling, and post-observation data access in a distributed, resilient environment spanning remote observatories and institutional control centers (Brewer et al., 2013).
1. System Architecture and Workflow
The SQ system is architected around two physically distinct sites: a “master” server at Yale and a telescope-site server at CTIO. The Yale master hosts the Observer Web App for Principal Investigator (PI) interaction, the central MySQL database (DB), the data reduction pipeline, and backup infrastructure. The CTIO site server maintains a read-only replica of most tables plus a local copy of the Nightly Observing Script (NOS) tables, and runs the Interactive Observing Script (IOS) application.
On the telescope side, control is distributed between a Telescope Control System (TCS) and an Instrument Control System (ICS), both with command-line interfaces on distinct Linux/VxWorks hosts. The middleware layer on the CTIO server translates high-level IOS API calls into low-level TCS/ICS command-line invocations.
Major modules:
- Observer Web App: For PI authentication, target/package creation, calibration specification.
- Scheduler/NOS Builder: GUI-driven tool for the queue manager to arrange nightly observations by Right Ascension (RA), visibility, and priority.
- Database Replication Layer: Bi-directional SSH-tunneled MySQL replication of NOS result tables; uni-directional replication for other tables from Yale to CTIO.
- IOS Application: Minimalist GUI for operator-controlled execution of NOS line-items, skip-reason tagging, and auto-pushing metadata to TCS/ICS.
- Logging/Data Access: Automatic line-item logging into nos_results, propagated asynchronously back to Yale. The reduction pipeline is triggered hourly, with reduced products registered in the Yale DB.
Workflow steps:
1. The PI plans targets using the Web App, populating the Yale DB.
2. The scheduler generates the NOS, flags scheduled rows, and replicates the script to CTIO.
3. At CTIO, the IOS steps the operator through execution or skipping of each script line, with operator mediation as needed.
4. Success/failure updates are logged locally and later synced to Yale.
5. Raw data are rsynced to Yale for reduction and made available to the PI.
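Steps 4–5 follow a "log locally, sync later" pattern. A minimal sketch of that pattern, with plain Python lists standing in for the site and master databases and all names hypothetical:

```python
# Write each status row to the site-local DB immediately; push unsynced rows
# to the master only when the CTIO-Yale link is up (hypothetical sketch).
def log_result(local_db, script_id, line_no, status):
    local_db.append({"script_id": script_id, "line_no": line_no,
                     "status": status, "synced": False})

def sync_to_master(local_db, master_db, link_up):
    if not link_up():
        return 0                      # outage: rows simply stay queued locally
    pushed = 0
    for row in local_db:
        if not row["synced"]:
            master_db.append(dict(row))
            row["synced"] = True
            pushed += 1
    return pushed
```

This design keeps the telescope side fully operable during an outage; the master simply catches up once connectivity returns.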
2. Submission Queue Management
The SQ leverages a normalized relational schema:
| Table | Key Fields | Role |
|---|---|---|
| `objects` | (obj_id, RA, Dec, ...) | Stores celestial targets and instrument configuration |
| `packages` | (pkg_id, obj_id, ...) | Groups objects/calibrations by PI |
| `scripts` | (script_id, date, ...) | Represents a NOS and its versions |
| `script_objs` | (script_id, line_no, ...) | Ordered line-item list for each NOS |
| `nos_results` | (script_id, line_no, status) | Real-time observed statuses |
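A minimal sketch of these relationships, using SQLite in place of the production MySQL and abbreviating each table to the key fields listed above:

```python
import sqlite3

# In-memory stand-in for the Yale master DB; columns beyond the key fields
# named in the table above are omitted for brevity.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE objects     (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL);
CREATE TABLE packages    (pkg_id INTEGER PRIMARY KEY,
                          obj_id INTEGER REFERENCES objects);
CREATE TABLE scripts     (script_id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE script_objs (script_id INTEGER REFERENCES scripts,
                          line_no INTEGER,
                          obj_id INTEGER REFERENCES objects,
                          PRIMARY KEY (script_id, line_no));
CREATE TABLE nos_results (script_id INTEGER, line_no INTEGER,
                          status TEXT CHECK (status IN
                              ('SUBMITTED','IN_QUEUE','SENDING','SUCCESS',
                               'FAILURE','DONE','SKIPPED')),
                          PRIMARY KEY (script_id, line_no));
""")
```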
State for each script line progresses from SUBMITTED → IN_QUEUE → SENDING → (SUCCESS/FAILURE) → DONE/SKIPPED, with failures resulting either in a retry (up to a configured maximum) or a direct skip; every transition is timestamped for PI visibility.
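This lifecycle can be made explicit as a small state machine. The transition table below is inferred from the progression above, with FAILURE looping back to SENDING (retry) or falling through to SKIPPED:

```python
from enum import Enum, auto

class LineState(Enum):
    SUBMITTED = auto()
    IN_QUEUE = auto()
    SENDING = auto()
    SUCCESS = auto()
    FAILURE = auto()
    DONE = auto()
    SKIPPED = auto()

# Legal transitions implied by the lifecycle described above.
TRANSITIONS = {
    LineState.SUBMITTED: {LineState.IN_QUEUE},
    LineState.IN_QUEUE:  {LineState.SENDING},
    LineState.SENDING:   {LineState.SUCCESS, LineState.FAILURE},
    LineState.SUCCESS:   {LineState.DONE},
    LineState.FAILURE:   {LineState.SENDING, LineState.SKIPPED},  # retry or skip
}

def advance(state, new_state):
    """Reject any transition not permitted by the lifecycle."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state.name} -> {new_state.name}")
    return new_state
```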
Targets can be assigned a "priority score":

$$P_i = w_F F_i + w_V V_i + w_W W_i + w_T T_i$$

where:
- $F_i$ is a fairness term (fraction of allocated vs. used time),
- $V_i$ is a visibility term (fraction of the visibility window remaining),
- $W_i$ is weather sensitivity,
- $T_i$ is time sensitivity (e.g., Target-of-Opportunity, ephemeris constraint),
- the weights $w_F, w_V, w_W, w_T$ sum to 1 and are user-tunable.
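A minimal sketch of such a score as a weighted sum; the weight values are illustrative defaults, not taken from the source:

```python
# P_i = w_F*F_i + w_V*V_i + w_W*W_i + w_T*T_i, with weights summing to 1.
def priority_score(fairness, visibility, weather, time_sens,
                   w=(0.4, 0.3, 0.2, 0.1)):   # illustrative weights
    assert abs(sum(w) - 1.0) < 1e-9, "weights must sum to 1"
    w_F, w_V, w_W, w_T = w
    return w_F * fairness + w_V * visibility + w_W * weather + w_T * time_sens

# An under-observed, time-critical program outranks a fully served one:
urgent  = priority_score(0.9, 0.2, 0.5, 1.0)   # -> 0.62
routine = priority_score(0.1, 0.8, 0.5, 0.0)   # -> 0.38
```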
Queue management accommodates real-time preemption (e.g., weather downgrades, technical failures), deferred rescheduling, and lossless status/error reporting via DB replication even under CTIO–Yale network outages.
3. Scheduling Algorithms
The operational scheduling paradigm is a human-assisted greedy algorithm, primarily sorting by RA and visibility constraints. Formally:
- Iterate through unobserved targets in RA order.
- Select $i^* = \arg\max_i S_i$, where $S_i = P_i - \lambda\,\Delta t_{\mathrm{config}}(c_i, c_{\mathrm{last}})$, with:
- $\Delta t_{\mathrm{config}}$ captures reconfiguration overheads (e.g., slit/binning/I₂-cell changes).
Script assembly follows a pseudo-greedy maximization:
```python
def build_script(U, T_start, T_end, priority, t_obs, config,
                 visibility_window, dt_config, lam=1.0, default_config=None):
    """Pseudo-greedy NOS assembly: schedule the highest-scoring visible target."""
    current_time, last_config, script = T_start, default_config, []
    U = set(U)
    while current_time < T_end and U:
        def S(i):   # score; targets outside their visibility window get -inf
            lo, hi = visibility_window(i)
            if lo <= current_time <= hi:
                return priority(i) - lam * dt_config(config(i), last_config)
            return float("-inf")
        i_star = max(U, key=S)
        if S(i_star) == float("-inf"):
            break                        # nothing is visible right now
        if current_time + t_obs(i_star) > T_end:
            break                        # next observation would overrun the night
        script.append((current_time, i_star))
        current_time += t_obs(i_star) + dt_config(config(i_star), last_config)
        last_config = config(i_star)
        U.remove(i_star)
    return script
```
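A toy invocation of the sketch above; target names, durations, and overheads are all hypothetical:

```python
script = build_script(
    U=["HD 10700", "HD 20794"],
    T_start=0.0, T_end=8.0,                      # hours since nightfall
    priority=lambda i: 1.0,                      # flat priorities
    t_obs=lambda i: 0.5,                         # 30-minute exposures
    config=lambda i: "slicer",                   # single instrument mode
    visibility_window=lambda i: (0.0, 8.0),      # both targets up all night
    dt_config=lambda a, b: 0.0 if a == b else 0.1,
)
print(script)  # e.g. [(0.0, 'HD 10700'), (0.6, 'HD 20794')]; tie order is arbitrary
```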
Objective:

$$\max_{\mathcal{S} \subseteq U} \sum_{i \in \mathcal{S}} P_i$$

Subject to:
- $\sum_{i \in \mathcal{S}} \left[ t_{\mathrm{obs}}(i) + \Delta t_{\mathrm{config}}(c_i, c_{\mathrm{prev}}) \right] \le T_{\mathrm{end}} - T_{\mathrm{start}}$,
- each target's start time falls within its visibility window,
- each target is scheduled at most once.
The architecture anticipates later substitution of integer programming or simulated annealing for enhanced optimization.
4. Automated Execution and Real-Time Feedback
Upon NOS finalization, the system locks scheduled objects, increments script versions, and propagates scripts to the site server within seconds. The IOS GUI exposes “Send”/“Skip” action buttons, streamlining operator workflow.
IOS API calls are translated to command-line invocations for TCS and ICS via middleware shell scripts. Each call sets RA/Dec, configuration, and exposure parameters, and monitors for acknowledgment or error signals before updating status (in XML) returned to the IOS.
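A sketch of one such middleware shim, assuming hypothetical `tcs_slew`, `ics_config`, and `ics_expose` command-line tools and treating a nonzero exit code as an error signal:

```python
import subprocess
import xml.etree.ElementTree as ET

def point_and_expose(ra, dec, mode, exptime):
    """Translate one high-level call into TCS/ICS CLI steps; return XML status."""
    steps = [
        ["tcs_slew", str(ra), str(dec)],        # hypothetical TCS command
        ["ics_config", "--mode", mode],         # hypothetical ICS commands
        ["ics_expose", "--time", str(exptime)],
    ]
    status = ET.Element("status")
    for cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True)
        step = ET.SubElement(status, "step", name=cmd[0])
        if result.returncode != 0:              # ERR: stop and report the step
            step.set("state", "ERR")
            step.text = result.stderr.strip()
            break
        step.set("state", "OK")
    return ET.tostring(status, encoding="unicode")
```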
The system incorporates robust error detection and recovery, with retries up to a maximum $N_{\mathrm{retry}}$. Status transitions follow a state machine:
- IDLE → SENDING → WAITING_ACK
- On ICS & TCS OK: COMPLETE → log DONE
- On ERR: if retries $< N_{\mathrm{retry}}$, retry; else, FAILED → operator alert → log SKIPPED
Status changes, retry counts, and skip reasons ({WEATHER, ICS_FAILURE, OPERATOR, OTHER}) are logged and asynchronously synced, with CTIO DB as the authoritative state in case of network partition.
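A minimal sketch of the retry loop, with the send/acknowledge/log/alert actions left as hypothetical callbacks so the same driver could run against real or simulated TCS/ICS backends:

```python
def execute_line(send_fn, wait_ack_fn, log_fn, alert_fn, n_retry=3):
    """Drive one NOS line-item through IDLE -> SENDING -> WAITING_ACK."""
    for attempt in range(n_retry):
        send_fn()                        # IDLE -> SENDING
        if wait_ack_fn():                # WAITING_ACK: ICS & TCS both OK
            log_fn("DONE", None)         # COMPLETE
            return "COMPLETE"
    alert_fn()                           # retries exhausted -> operator alert
    log_fn("SKIPPED", "OTHER")           # reason in {WEATHER, ICS_FAILURE, OPERATOR, OTHER}
    return "FAILED"
```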
5. Empirical Performance and System Impact
Deployment of the SQ-based CHIRON TOOLS system yielded quantifiable improvements:
| Metric | Pre-deployment | Post-deployment | Change |
|---|---|---|---|
| Script-prep time (mean ± σ) | 120 ± 15 min | 15 ± 5 min | −87% (t ≈ 20, p ≪ 10⁻⁶) |
| FITS-header error rate | 40% (10% unusable) | <1% (PI input), ≈0% (system) | Operator error nearly eliminated |
| Nightly lost time | ≈1 hr/night | ≈1.75 hr/night recovered | Recovery of overhead |
| Operator training time | ≈3 days | ≈4 hours | −87% |
| Mean targets/night | — | — | +20% |
| On-sky efficiency | 65% | 80% | +15 pp (+23% relative) |
| Data delivery lag | 36 h | 12 h | −67% |
Taken together, these metrics suggest the system contributed a roughly 20–30% relative gain in on-sky efficiency, near-elimination of human error, and a threefold reduction in data-delivery latency.
6. Implementation Lessons and Design Recommendations
Key commissioning challenges included underestimating required software effort (≈6 FTE-months beyond hardware), late adaptation for PI data privacy (necessitating per-user data segregation), and the need for resilient architecture due to intermittent CTIO internet connectivity—forcing local DB replicas and asynchronous bi-directional sync.
Solutions employed:
- Early middleware/API separation to minimize code coupling between GUIs and telescope/instrument controllers.
- Abstracting instrument-specific parameters to discrete DB tables, facilitating future hardware adaptation.
- Adoption of open-source, cross-platform technologies (e.g., Apache, MySQL, PHP, SSH) for speed and portability.
Recommended best practices:
- Scope end-to-end software early and budget ≥30% of project FTE for it.
- Architect the DB to disentangle the core queue, script execution, and instrument-specific components, enabling 80% reusability across instruments.
- Deploy stepwise operator GUIs that auto-fill technical fields, eliminating ≈90% of manual entry errors.
- Plan for intermittent networking via local DBs and SSH-based replication.
- Implement a modular “priority score” early to facilitate future automation.
- Keep scheduling algorithms modular, supporting future replacement with more advanced solvers once operational constraints are clarified.
These practices yield a robust, efficient, and adaptable SQ platform applicable to diverse queue-scheduled observatories (Brewer et al., 2013).