
Marvin: A Toolkit for Streamlined Access and Visualization of the SDSS-IV MaNGA Data Set

Published 6 Dec 2018 in astro-ph.IM and astro-ph.GA | (1812.03833v1)

Abstract: The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, one of three core programs of the fourth-generation Sloan Digital Sky Survey (SDSS-IV), is producing a massive, high-dimensional integral field spectroscopic data set. However, leveraging the MaNGA data set to address key questions about galaxy formation presents serious data-related challenges due to the combination of its spatially inter-connected measurements and sheer volume. For each galaxy, the MaNGA pipelines produce relatively large data files to preserve the spatial correlations of the spectra and measurements, but this comes at the expense of storing the data set in a coarsely-chunked manner. The coarse chunking and total volume of the data make it time-consuming to download and curate locally-stored data. Thus, accessing, querying, visually exploring, and performing statistical analyses across the whole data set at a fine-grained scale is extremely challenging using just FITS files. To overcome these challenges, we have developed Marvin: a toolkit consisting of a Python package, Application Programming Interface (API), and web application utilizing a remote database. Marvin's robust and sustainable design minimizes maintenance, while facilitating user-contributed extensions such as high level analysis code. Finally, we are in the process of abstracting out Marvin's core functionality into a separate product so that it can serve as a foundation for others to develop Marvin-like systems for new science applications.

Citations (120)

Summary

Overview of Marvin: A Toolkit for Streamlined Access and Visualization of the SDSS-IV MaNGA Data Set

The paper "Marvin: A Toolkit for Streamlined Access and Visualization of the SDSS-IV MaNGA Data Set" introduces Marvin, a toolkit for working with the data set of the MaNGA survey, one of three core programs of the fourth-generation Sloan Digital Sky Survey (SDSS-IV). The toolkit addresses the data-handling challenges posed by the large, high-dimensional integral field spectroscopic data that MaNGA produces. The authors present Marvin as a way to access, query, visualize, and analyze these data efficiently, in support of research on galaxy formation and evolution.

Core Components

Marvin comprises three primary components: a Python package, an Application Programming Interface (API), and a web application, each aimed at a different level of data interaction. All three draw on a remote database, which spares users the download and local curation burden created by the volume and coarse chunking of the data files. According to the summary, the integrated design is sustainable and low-maintenance, and it supports a range of users, from professional astronomers to data scientists and educators.

A noteworthy aspect of Marvin is its Multi-Modal Access (MMA) system, which automatically selects how data are accessed: from local files, from a local database, or through the remote API. This design simplifies user interaction with the data by hiding the details of the underlying data management from the end user.
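The selection logic described above can be sketched in a few lines. This is a minimal illustration, not Marvin's actual implementation: the names `AccessPlan` and `resolve_access` are hypothetical, and the preference order (local file, then local database, then remote API) is an assumption based on the order in which the modes are listed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AccessPlan:
    """Hypothetical record of where the data will come from."""
    mode: str    # "local" or "remote"
    origin: str  # "file", "db", or "api"


def resolve_access(local_file: Optional[Path],
                   db_available: bool,
                   api_available: bool) -> AccessPlan:
    """Pick a data origin, preferring local sources over the remote API.

    The ordering is an assumption made for this sketch.
    """
    # 1. A file that already exists on disk is the cheapest source.
    if local_file is not None and local_file.is_file():
        return AccessPlan(mode="local", origin="file")
    # 2. Otherwise fall back to a locally reachable database.
    if db_available:
        return AccessPlan(mode="local", origin="db")
    # 3. Otherwise ask the remote API for the data.
    if api_available:
        return AccessPlan(mode="remote", origin="api")
    raise RuntimeError("no data source available")


# Example: no local file and no local database, so the API is used.
plan = resolve_access(None, db_available=False, api_available=True)
print(plan.mode, plan.origin)
```

The point of such a resolver is that calling code asks for an object once and never branches on where the data live.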

Challenges Addressed

Marvin addresses the challenges that arise from the MaNGA data set's spatially interconnected measurements and sheer volume. Traditionally, working with spatially resolved data from such a large data set requires downloading and manually managing many large data files. Marvin instead centralizes access through an API that serves finer-grained slices of the data, allowing streamlined querying across the entire data set.
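To make the idea of a fine-grained slice concrete, the sketch below builds a toy data cube and extracts the spectrum at a single spatial position instead of handing over the whole cube. It is purely illustrative and does not use Marvin or FITS: the nested-list layout `cube[wavelength][y][x]`, the helper names, and the synthetic values are all assumptions made for this example.

```python
from typing import List

# Toy cube indexed as cube[wavelength][y][x]; real cubes are far larger.
Cube = List[List[List[float]]]


def make_cube(n_wave: int, ny: int, nx: int) -> Cube:
    """Build a synthetic cube whose values encode their own indices."""
    return [[[float(w * 100 + y * 10 + x) for x in range(nx)]
             for y in range(ny)]
            for w in range(n_wave)]


def spaxel_spectrum(cube: Cube, x: int, y: int) -> List[float]:
    """Return only the spectrum at spatial position (x, y)."""
    return [plane[y][x] for plane in cube]


cube = make_cube(n_wave=4, ny=3, nx=3)
spectrum = spaxel_spectrum(cube, x=1, y=2)

total_values = sum(len(row) for plane in cube for row in plane)
print(f"whole cube: {total_values} values, one spaxel: {len(spectrum)} values")
```

Serving one spectrum rather than the full cube is what keeps the transferred payload small when a user needs only a single position in a galaxy.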

Implications and Future Prospects

In practice, researchers benefit from reduced data-handling overhead, access to shared analysis tools, and easier work with spatially resolved data. This can speed up research and lowers the barrier to data access and analysis across the astronomical community.

More broadly, Marvin offers a template for data architecture: a scalable, flexible framework that could be adapted to other large astronomical data sets, and possibly to other scientific disciplines that face similar data challenges.

Possible future developments include extending Marvin to support more complex analyses and data sets from future astronomical surveys. In addition, the authors are abstracting Marvin's core functionality into a separate product, which could serve as a foundation for Marvin-like systems in other data-intensive domains and encourage collaboration beyond astronomy.

In conclusion, Marvin is a well-structured approach to an enduring challenge of modern astronomy: efficiently managing, accessing, and using large, complex data sets. It contributes a practical tool for astronomical data analysis and a model for future systems of this kind.
