Time series and machine learning to forecast the water quality from satellite data (2003.11923v1)

Published 16 Mar 2020 in physics.ao-ph, cs.LG, and stat.ML

Abstract: Managing the quality of water for present and future generations of coastal regions should be a central concern of both citizens and public officials. Remote sensing can contribute to the management and monitoring of coastal water and pollutants. Algal blooms are a coastal pollutant that is a cause of concern. Many satellite data, such as MODIS, have been used to generate water-quality products to detect the blooms such as chlorophyll a (Chl-a), a photosynthesis index called fluorescence line height (FLH), and sea surface temperature (SST). It is important to characterize the spatial and temporal variations of these water quality products by using the mathematical models of these products. However, for monitoring, pollution control boards will need nowcasts and forecasts of any pollution. Therefore, we aim to predict the future values of the MODIS Chl-a, FLH, and SST of the water. This will not be limited to one type of water but, rather, will cover different types of water varying in depth and turbidity. This is very significant because the temporal trend of Chl-a, FLH, and SST is dependent on the geospatial and water properties. For this purpose, we will decompose the time series of each pixel into several components: trend, intra-annual variations, seasonal cycle, and stochastic stationary. We explore three such time series machine learning models that can characterize the non-stationary time series data and predict future values, including the Seasonal ARIMA (Auto Regressive Integrated Moving Average) (SARIMA), regression, and neural network. The results indicate that all these methods are effective at modelling Chl-a, FLH, and SST time series and predicting the values reasonably well. However, regression and neural network are found to be the best at predicting Chl-a in all types of water (turbid and shallow). Meanwhile, the SARIMA model provides the best prediction of FLH and SST.
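
The per-pixel decomposition described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the form of each component, so the choices below are assumptions for illustration only: monthly samples, a linear trend, an annual harmonic for the seasonal cycle, a semi-annual harmonic standing in for intra-annual variation, and a residual left for a stationary stochastic model. The function name `decompose_pixel` is hypothetical.

```python
import numpy as np

def decompose_pixel(y, period=12):
    """Split one pixel's monthly series into four additive parts.

    Illustrative assumptions (not taken from the paper):
    - trend: straight line a + b*t
    - seasonal cycle: annual harmonic (one cycle per `period` samples)
    - intra-annual variation: semi-annual harmonic (two cycles per period)
    - residual: whatever is left, for a stationary stochastic model
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    w = 2 * np.pi / period  # annual angular frequency
    X = np.column_stack([
        np.ones_like(t), t,                    # trend
        np.cos(w * t), np.sin(w * t),          # seasonal cycle
        np.cos(2 * w * t), np.sin(2 * w * t),  # intra-annual variation
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # joint least-squares fit
    trend = X[:, :2] @ beta[:2]
    seasonal = X[:, 2:4] @ beta[2:4]
    intra_annual = X[:, 4:6] @ beta[4:6]
    residual = y - trend - seasonal - intra_annual
    return trend, seasonal, intra_annual, residual

# Synthetic ten-year monthly series: slow trend + annual cycle + noise
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.5 + 0.01 * t + np.sin(2 * np.pi * t / 12) + 0.05 * rng.standard_normal(120)
trend, seasonal, intra_annual, residual = decompose_pixel(y)
slope = (trend[-1] - trend[0]) / (len(y) - 1)  # recovered trend per month
```

Fitting the components jointly rather than one after another is a design choice of this sketch; a real pipeline would also have to handle missing observations and test whether the residual is actually stationary.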

Summary

The paper combines a time series model (SARIMA) with regression and neural network models to forecast MODIS-derived water quality parameters: Chl-a, FLH, and SST.
Regression and neural network models reached an R² of 0.8 for chlorophyll-a and outperformed SARIMA for this parameter, including in shallow, turbid waters.
The study offers practical guidance for coastal management, where pollution control boards need nowcasts and forecasts of water quality.

Time Series and Machine Learning Approaches for Water Quality Forecasting Using Satellite Data

The paper "Time series and machine learning to forecast the water quality from satellite data" by Maryam R. Al Shehhi and Abdullah Kaya presents a methodical examination of forecasting water quality from satellite-derived data, using the Arabian Gulf as a case study. The research uses Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data to predict three key water quality parameters: chlorophyll-a (Chl-a), fluorescence line height (FLH), and sea surface temperature (SST). It applies a combination of time series models and machine learning techniques: Seasonal Autoregressive Integrated Moving Average (SARIMA), regression models, and neural networks.

Overview of Methodologies

The authors group their approaches into univariate and multivariate time series methods. The univariate method, SARIMA, models each variable's series on its own, while the regression and neural network models capture dependencies among several variables. Missing data, a common problem in satellite-based measurements, is handled with multiple imputation by predictive mean matching (PMM).
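
Predictive mean matching can be sketched in a few lines of NumPy. The summary does not say which predictors the authors used or how many imputations they drew, so the predictor (here SST plus an intercept), the number of imputations, the donor count `k`, and the name `pmm_impute` are all hypothetical. Parameter uncertainty is approximated by refitting the regression on a bootstrap sample for each imputation, which is one common variant; established implementations such as the R package mice offer fuller versions.

```python
import numpy as np

def pmm_impute(y, X, n_imputations=5, k=5, seed=0):
    """Multiple imputation of missing y (NaN) by predictive mean matching.

    y: 1-D array with NaNs marking gaps (e.g. missing Chl-a retrievals).
    X: fully observed predictors, 1-D or 2-D (hypothetical choice).
    Returns an array of shape (n_imputations, len(y)) of completed series.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    obs = ~np.isnan(y)
    idx_obs = np.flatnonzero(obs)
    idx_mis = np.flatnonzero(~obs)
    completed = np.tile(y, (n_imputations, 1))
    for m in range(n_imputations):
        # Bootstrap the observed cases so each imputation uses different
        # regression coefficients (a stand-in for parameter uncertainty).
        boot = rng.choice(idx_obs, size=len(idx_obs), replace=True)
        beta, *_ = np.linalg.lstsq(X[boot], y[boot], rcond=None)
        pred = X @ beta
        for i in idx_mis:
            # The k observed cases whose predicted means are closest to case i
            dist = np.abs(pred[idx_obs] - pred[i])
            donors = idx_obs[np.argsort(dist)[:k]]
            # Impute an actually observed value taken from a random donor
            completed[m, i] = y[rng.choice(donors)]
    return completed

# Synthetic example: Chl-a loosely tied to SST, with four gaps
rng = np.random.default_rng(1)
t = np.arange(60, dtype=float)
sst = 25 + 3 * np.sin(2 * np.pi * t / 12)
chl = 0.1 * sst + 0.05 * rng.standard_normal(60)
gaps = [5, 17, 30, 44]
chl_gappy = chl.copy()
chl_gappy[gaps] = np.nan
completed = pmm_impute(chl_gappy, sst, n_imputations=5, k=5)
```

Because every imputed value is copied from an observed donor, PMM never produces values outside the observed range, such as negative Chl-a concentrations.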

The SARIMA model, part of the Box-Jenkins approach, is used to capture the temporal dependence and seasonality of the SST and FLH series. Its orders (autoregressive, differencing, and moving average) are selected by comparing candidate models with the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
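
Choosing model orders by information criteria can be shown with a deliberately simplified NumPy sketch. It is not a full SARIMA fit: it applies one seasonal difference (period 12, assuming monthly data), fits only a non-seasonal autoregressive part by least squares, and compares orders p = 0 to 4 with Gaussian AIC and BIC on a common sample. Full SARIMA models, with moving-average and seasonal terms, are usually fitted by maximum likelihood, for example with `SARIMAX` in statsmodels, whose fitted results expose `aic` and `bic` attributes. All function names below are hypothetical.

```python
import numpy as np

def seasonal_difference(y, s=12):
    """Apply one seasonal difference (D = 1): y_t - y_{t-s}."""
    return y[s:] - y[:-s]

def fit_ar_ls(z, p, max_p):
    """Least-squares AR(p) fit on the common sample starting at index max_p."""
    n = len(z) - max_p
    cols = [np.ones(n)] + [z[max_p - j:len(z) - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    target = z[max_p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ beta) ** 2))
    k = p + 2  # AR coefficients + intercept + noise variance
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return beta, aic, bic

def select_order(y, s=12, max_p=4):
    """Return the AIC-best and BIC-best AR order plus all (aic, bic) scores."""
    z = seasonal_difference(np.asarray(y, dtype=float), s)
    scores = {p: fit_ar_ls(z, p, max_p)[1:] for p in range(max_p + 1)}
    best_aic = min(scores, key=lambda p: scores[p][0])
    best_bic = min(scores, key=lambda p: scores[p][1])
    return best_aic, best_bic, scores

# Synthetic series whose seasonal difference is an AR(1) process
rng = np.random.default_rng(2)
N, s = 240, 12
z = np.zeros(N)
for i in range(1, N):
    z[i] = 0.7 * z[i - 1] + rng.standard_normal()
y = z.copy()
for i in range(s, N):
    y[i] = y[i - s] + z[i]  # undo one seasonal difference
best_aic, best_bic, scores = select_order(y, s=s, max_p=4)
```

Fitting every candidate on the same rows matters: otherwise higher orders would be scored on fewer observations and the criteria would not be comparable. BIC penalizes each extra parameter more heavily than AIC once log n exceeds 2 (n above about 7.4), so it tends to pick smaller models.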

For regression, the authors develop several variants relating Chl-a to explanatory variables such as time, FLH, and SST at different time lags. The neural networks extend this analysis with a nonlinear model that can capture more complex patterns in non-stationary data.
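
Both model families can be sketched in NumPy on synthetic data. The lag set (1 and 2 months), the network size and training settings, the chronological 90-sample training split, and all function names are hypothetical; the paper's actual regression variants and network architecture are not described in this summary. The regression is ordinary least squares on lagged FLH and SST plus a time index, and the network is a one-hidden-layer tanh model trained by full-batch gradient descent on the same features.

```python
import numpy as np

def lagged_design(flh, sst, lags=(1, 2)):
    """Features [t, FLH_{t-l}, SST_{t-l} for each lag l]; rows start at max(lags)."""
    L = max(lags)
    n = len(sst)
    cols = [np.arange(L, n, dtype=float)]  # time index
    for l in lags:
        cols += [flh[L - l:n - l], sst[L - l:n - l]]
    return np.column_stack(cols), L

def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_mlp(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12  # standardize inputs
    Z = (X - mu) / sd
    W1 = rng.normal(0.0, 0.5, (Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, hidden)
    b2 = float(y.mean())
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)  # hidden activations
        err = H @ w2 + b2 - y     # residuals
        # Gradients of (mean squared error) / 2
        g_w2, g_b2 = H.T @ err / n, err.mean()
        dpre = np.outer(err, w2) * (1.0 - H ** 2)  # back through tanh
        g_W1, g_b1 = Z.T @ dpre / n, dpre.mean(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return lambda Xn: np.tanh(((Xn - mu) / sd) @ W1 + b1) @ w2 + b2

def r2(y_true, y_pred):
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Synthetic monthly data: Chl-a driven by last month's SST and FLH two months back
rng = np.random.default_rng(0)
n = 120
t = np.arange(n)
sst = 26 + 4 * np.sin(2 * np.pi * t / 12) + 0.2 * rng.standard_normal(n)
flh = 0.02 + 0.01 * np.cos(2 * np.pi * t / 12) + 0.002 * rng.standard_normal(n)
chl = np.ones(n)
chl[2:] = 1.0 + 0.1 * (sst[1:-1] - 26) + 20 * flh[:-2] + 0.05 * rng.standard_normal(n - 2)

X, start = lagged_design(flh, sst)
target = chl[start:]
split = 90  # chronological train/test split
ols = fit_ols(X[:split], target[:split])
mlp = fit_mlp(X[:split], target[:split])
r2_ols_test = r2(target[split:], ols(X[split:]))
r2_mlp_train = r2(target[:split], mlp(X[:split]))
```

Standardizing inputs matters for the network, since the raw time index and FLH differ in scale by several orders of magnitude, but not for least squares, whose predictions are unchanged by rescaling the features.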

Key Results and Findings

The paper shows that the performance of each modeling strategy depends on water depth and turbidity. Regression and neural networks perform best at predicting Chl-a across water types, whereas SARIMA excels at forecasting FLH and SST. Specifically, regression performs best for Chl-a in deeper, less turbid waters, with an R² of 0.8 and an RMSE of 0.3. By contrast, neural networks perform best in very shallow, turbid waters, also with an R² of 0.8.
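
For reference, the two scores quoted above can be computed as follows. These are the standard definitions of R² (one minus the ratio of residual to total sum of squares) and RMSE; the summary does not state which R² variant the paper used or whether Chl-a was transformed before scoring, so the units of the reported RMSE of 0.3 are not known from this text. The toy numbers are illustrative, not data from the paper.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the predicted variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r2_score(observed, predicted))  # ≈ 0.98
print(rmse(observed, predicted))      # ≈ 0.158
```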

Overall, SARIMA performs consistently across the different SST series, which makes it a good choice for FLH and SST prediction, although it is less effective for the more complex Chl-a series. The neural network and regression models better capture the irregular seasonality and spikes typical of Chl-a data, suggesting they are suited to data with high temporal and spatial variability.

Implications and Future Directions

This paper contributes to the practical monitoring and regulatory management of coastal waters by giving environmental agencies tools to forecast indicators of harmful algal bloom (HAB) events. Such forecasts could inform decisions affecting aquatic health and economic activities such as fisheries and tourism.

On the methodological side, the study shows how time series analysis and machine learning can be combined, pointing toward automated environmental monitoring systems. The research also points to the need to extend these methods to other geographic and climatic conditions to improve the generalizability and scalability of such models.

Looking forward, advancements in AI frameworks and remote sensing technology will likely enhance prediction accuracy and operational efficiency, enabling real-time water quality monitoring at both micro and macro scales. Continued studies incorporating novel datasets and emerging machine learning algorithms are essential for optimizing forecasting models and their application in varied oceanographic and meteorological contexts.
