Open Data from LIGO, Virgo, and KAGRA through the First Part of the Fourth Observing Run (2508.18079v1)
Abstract: LIGO, Virgo, and KAGRA form a network of gravitational-wave observatories. Data and analysis results from this network are made publicly available through the Gravitational Wave Open Science Center. This paper describes open data from this network, including the addition of data from the first part of the fourth observing run (O4a) and selected periods from the preceding engineering run, collected from May 2023 to January 2024. The public data set includes calibrated strain time series for each instrument, data from additional channels used for noise subtraction and detector characterization, and analysis data products from version 4.0 of the Gravitational-Wave Transient Catalog.
Summary
- The paper presents the first public O4a release of gravitational-wave data from LIGO, Virgo, and KAGRA, detailing calibration methods and auxiliary channel information.
- Calibration uncertainties are estimated in near real time from continuous photon-calibrator injections, a first for a public release.
- Multiple strain data formats, detailed data quality flags, and extensive auxiliary channels support robust signal analysis and reproducible research.
Introduction and Scope
This paper presents the public release of gravitational-wave (GW) data from the LIGO, Virgo, and KAGRA observatories, focusing on the first segment of the fourth observing run (O4a), spanning May 2023 to January 2024. The release is facilitated through the Gravitational Wave Open Science Center (GWOSC), providing calibrated strain time series, auxiliary instrumental channels, and analysis products from GWTC-4.0. The dataset is structured to maximize accessibility and scientific utility, supporting a broad range of GW analyses, including compact binary coalescence (CBC), burst, continuous wave (CW), and stochastic background searches.
Observing Runs and Data Products
The GW network operates in discrete observing runs, with O4a representing the latest public release. During O4a, only LIGO Hanford (LHO) and Livingston (LLO) provided data suitable for analysis, as Virgo was offline for commissioning and KAGRA/GEO sensitivity was suboptimal. The primary data product is the calibrated strain h(t), sampled at 16,384 Hz, with a typical annual data rate of ∼4 TB per instrument. The strain data is accompanied by hundreds of thousands of auxiliary channels monitoring environmental and instrumental states.
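As a rough consistency check on the quoted data volume, the sketch below multiplies the sample rate by the number of seconds in a year. The 8-byte sample size, the absence of compression, and the assumption of continuous operation are illustrative choices, not statements from the paper.

```python
# Back-of-the-envelope estimate of the yearly strain data volume.
SAMPLE_RATE_HZ = 16_384                  # strain sample rate quoted in the text
BYTES_PER_SAMPLE = 8                     # assumption: 64-bit floats, uncompressed
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: detector running all year


def annual_volume_tb(sample_rate_hz=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE,
                     seconds=SECONDS_PER_YEAR):
    """Return the data volume in terabytes (1 TB = 1e12 bytes)."""
    return sample_rate_hz * bytes_per_sample * seconds / 1e12


volume_tb = annual_volume_tb()
print(f"{volume_tb:.2f} TB per instrument per year")
```

Under these assumptions the estimate lands close to the ∼4 TB figure; real files differ because of duty cycle, compression, and metadata.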
Instrument sensitivity is characterized by the binary neutron star (BNS) inspiral range, which fluctuates due to noise and operational interruptions. The BNS range during O4a frequently approached 160 Mpc, with periods of reduced sensitivity due to elevated noise or downtime.
Figure 1: BNS inspiral range over time for O4a, illustrating sensitivity fluctuations and operational intervals.
Calibration Methodology and Uncertainties
Calibration reconstructs the differential arm motion ΔL(t) from the error signal d_err using the interferometer response function R, yielding the strain h(t) = ΔL(t)/L, where L is the arm length. For O4a, calibration uncertainties are quantified hourly, incorporating continuous photon-calibrator sinusoidal injections at discrete frequencies. The systematic error and 1σ uncertainty envelopes are provided for each hour, with direct measurements at calibration frequencies.
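The relation between the error signal, the response function, and the strain can be illustrated with a toy frequency-domain calculation. This is a minimal sketch, not the collaboration's calibration pipeline: the function name `calibrate_strain`, the flat response value, and the nominal 4 km arm length are assumptions made for illustration, and a real response function is complex and frequency dependent.

```python
import numpy as np

ARM_LENGTH_M = 4000.0  # assumption: nominal LIGO arm length in metres


def calibrate_strain(d_err, response, arm_length_m=ARM_LENGTH_M):
    """Toy calibration: Delta L(f) = R(f) * d_err(f), then h(t) = Delta L(t) / L.

    d_err    : real-valued error-signal time series
    response : complex R(f) sampled on the np.fft.rfft frequency grid of d_err
    """
    d_err = np.asarray(d_err, dtype=float)
    d_f = np.fft.rfft(d_err)
    response = np.asarray(response, dtype=complex)
    if response.shape != d_f.shape:
        raise ValueError("response must match the rfft grid of d_err")
    delta_l = np.fft.irfft(response * d_f, n=d_err.size)
    return delta_l / arm_length_m


# Toy usage: one second of a 100 Hz sinusoid and a hypothetical flat response.
fs = 16384
t = np.arange(fs) / fs
d_err = np.sin(2 * np.pi * 100.0 * t)
flat_response = np.full(fs // 2 + 1, 1e-18, dtype=complex)  # hypothetical value
h = calibrate_strain(d_err, flat_response)
```

With a flat response the output is simply the input scaled by R/L, which makes the sketch easy to check by hand.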
Figure 2: Frequency-dependent calibration error for LIGO Hanford and Livingston during a one-hour period in O4a, showing magnitude and phase uncertainties.
The O4a release marks the first instance where near-real-time calibration uncertainty estimates are provided, enabling their use as the final calibrated strain product. The valid frequency range for analysis is 10–5000 Hz, constrained by calibration and anti-aliasing filter roll-off.
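An analysis can respect the valid band by masking frequency bins outside 10–5000 Hz before computing any statistic. The sketch below builds such a mask with NumPy; the function name `valid_band_mask` is hypothetical.

```python
import numpy as np

F_MIN_HZ = 10.0     # lower edge of the valid band quoted in the text
F_MAX_HZ = 5000.0   # upper edge of the valid band quoted in the text


def valid_band_mask(n_samples, sample_rate_hz, f_min=F_MIN_HZ, f_max=F_MAX_HZ):
    """Return (freqs, mask) for the rfft bins of a real time series."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate_hz)
    mask = (freqs >= f_min) & (freqs <= f_max)
    return freqs, mask


# One second of 16384 Hz data gives 1 Hz frequency resolution.
freqs, mask = valid_band_mask(16384, 16384.0)
print(int(mask.sum()), "bins inside the valid band")
```

For data sampled at 4096 Hz the Nyquist frequency is 2048 Hz, so the mask is limited by the sampling rate rather than by the 5000 Hz edge.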
Data Quality, Noise Mitigation, and Hardware Injections
Strain data is affected by non-Gaussian, non-stationary noise artifacts, including glitches and spectral lines. Data-quality flags (CAT1, CAT2, CAT3) are used to exclude compromised segments, with CAT1 marking severe issues. For CBC analyses, the iDQ supervised-learning framework provides statistical flags based on auxiliary channel activity, enabling dynamic vetoing or re-ranking of candidate events.
Hardware injections simulate GW signals for detector characterization and safety studies. In O4a, only CW-type injections (simulating spinning neutron stars) were present during observing mode, with minimal impact on transient searches.
Spectral line catalogs are maintained to identify persistent narrowband features, critical for CW and stochastic searches. Glitch subtraction is performed using BayesWave and linear noise subtraction, with 16 O4a candidate events requiring targeted mitigation.
Data Structure, Formats, and Access
Calibrated strain data is released in 4096-second files, available in HDF5 and GWF formats, at both 16 kHz and 4 kHz sampling rates. The file structure encodes metadata, strain arrays, data-quality masks, and injection masks. The bitmask structure for data quality and injections is standardized across runs, with additional bits for STOCH and CW searches in O4a.
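The file layout described above can be sketched with two small helpers: one that locates the 4096-second file containing a given GPS time, and one that decodes a data-quality bitmask. Both rest on assumptions carried over from earlier GWOSC releases: files start on GPS times that are multiples of 4096, and the first seven bits follow the order listed in `BIT_NAMES`. The positions of the STOCH and CW bits added in O4a are not given here, so they are left out; in practice the bit names should be read from the metadata stored in each file.

```python
import numpy as np

FILE_DURATION_S = 4096

# Assumption: bit order used in earlier GWOSC releases; O4a adds further bits.
BIT_NAMES = ["DATA", "CBC_CAT1", "CBC_CAT2", "CBC_CAT3",
             "BURST_CAT1", "BURST_CAT2", "BURST_CAT3"]


def file_start_gps(gps_time):
    """GPS start of the file holding gps_time (assumes 4096 s alignment)."""
    gps = int(gps_time)
    return gps - gps % FILE_DURATION_S


def samples_per_file(sample_rate_hz):
    """Number of strain samples in one 4096-second file."""
    return FILE_DURATION_S * sample_rate_hz


def flag_passes(dq_mask, name, bit_names=BIT_NAMES):
    """Boolean array: True where the named bit is set in the bitmask."""
    bit = bit_names.index(name)
    mask = np.asarray(dq_mask, dtype=np.int64)
    return ((mask >> bit) & 1) == 1


# Toy bitmask values, one per time step (synthetic, not real data).
dq_mask = np.array([0b0000011, 0b0000001, 0b0000000, 0b0000111])
cbc_cat1_ok = flag_passes(dq_mask, "CBC_CAT1")
data_present = flag_passes(dq_mask, "DATA")
```

A typical use is to keep only the times where both `DATA` and the category flag relevant to the search are set.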
Alternate strain releases provide multiple versions of the strain channel, including raw, narrowband-subtracted, broadband-subtracted, and glitch-gated data. These are accessible via OSDF and NDS2 interfaces, with AR (Analysis Ready) tags indicating segments suitable for analysis.
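To give a sense of what gating does to a strain series, the sketch below zeroes a short interval around a glitch and tapers smoothly back to unity on either side. This is a generic illustration with hypothetical parameter names, not the procedure used to produce the released glitch-gated channels.

```python
import numpy as np


def gate(strain, sample_rate_hz, t_center_s, half_width_s, taper_s):
    """Zero the strain within half_width_s of t_center_s, with cosine tapers.

    The window is 0 inside the gate, rises as 0.5 * (1 - cos) over taper_s
    on each side, and is 1 everywhere else.
    """
    strain = np.asarray(strain, dtype=float)
    t = np.arange(strain.size) / sample_rate_hz
    dist = np.abs(t - t_center_s)

    window = np.ones_like(strain)
    window[dist <= half_width_s] = 0.0

    in_taper = (dist > half_width_s) & (dist < half_width_s + taper_s)
    phase = (dist[in_taper] - half_width_s) / taper_s  # runs from 0 to 1
    window[in_taper] = 0.5 * (1.0 - np.cos(np.pi * phase))
    return strain * window


# Toy usage: one second of constant data with a gate centred at 0.5 s.
fs = 4096
gated = gate(np.ones(fs), fs, t_center_s=0.5, half_width_s=0.05, taper_s=0.05)
```

The smooth taper limits the spectral leakage that an abrupt cut to zero would introduce.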
Auxiliary Channels and Instrumental Monitoring
Approximately 200,000 auxiliary channels per instrument are recorded, with a curated subset released for noise subtraction and data-quality flagging. These channels include environmental sensors and instrumental diagnostics. Auxiliary data is available for all ANALYSIS_READY times, with documentation specifying channel names and sampling rates. Additional releases support machine learning studies and noise-subtraction research.
Event Portal and Analysis Products
The GWOSC Event Portal provides a database of published GW transients, including strain data, segment lists, detection confidence, source parameters, and posterior samples. The portal supports HTML browsing and REST API queries, with a Python client for automated access.
Figure 3: Event Query Form interface for custom selection of GW events based on multiple attributes.
Parameter-estimation results are provided as credible intervals, with standardized naming conventions. Supplemental data releases include posterior samples, localizations, and versioned snapshots on Zenodo. Community catalogs from external authors are ingested using a standardized JSON schema, expanding the scope of available events.
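A credible interval of this kind can be recomputed from posterior samples. The sketch below reports the median and the offsets to the edges of a symmetric 90% interval; the 90% level, the function name `summarize_posterior`, and the synthetic Gaussian samples are assumptions for illustration, not values taken from the catalog.

```python
import numpy as np


def summarize_posterior(samples, level=0.90):
    """Return (median, plus, minus) for a symmetric credible interval."""
    samples = np.asarray(samples, dtype=float)
    lo_q = 50.0 * (1.0 - level)     # e.g. 5th percentile for level=0.90
    hi_q = 100.0 - lo_q             # e.g. 95th percentile
    lower, median, upper = np.percentile(samples, [lo_q, 50.0, hi_q])
    return median, upper - median, median - lower


# Synthetic stand-in for posterior samples of a single parameter.
rng = np.random.default_rng(0)
samples = rng.normal(loc=30.0, scale=2.0, size=100_000)
median, plus, minus = summarize_posterior(samples)
print(f"{median:.1f} +{plus:.1f} -{minus:.1f}")
```

The same function applies unchanged to any one-dimensional array of samples, such as a column read from a released posterior file.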
Implications and Future Directions
The public release of O4a data, with comprehensive calibration, data quality, and auxiliary information, enables reproducible GW science and facilitates cross-disciplinary research. The dataset supports advanced noise modeling, machine learning applications, and multi-messenger astrophysics. The inclusion of near-real-time calibration uncertainties and multiple strain channel versions enhances the reliability of parameter estimation and event validation.
The release strategy, with planned future datasets from O4b and O4c, will further increase the volume and diversity of GW events available for analysis. The infrastructure and data standards established here set a precedent for open data practices in GW astronomy, supporting both methodological innovation and broad community engagement.
Conclusion
This paper provides a detailed account of the open data products from LIGO, Virgo, and KAGRA for O4a, including calibration procedures, data quality management, file structures, and event cataloging. The release maximizes scientific utility and transparency, supporting robust GW analyses and fostering future developments in GW data science. The forthcoming releases from subsequent observing segments will continue to expand the scope and impact of open GW data.