
FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems (2311.10255v2)

Published 17 Nov 2023 in cs.LG and q-bio.PE

Abstract: Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time? In this paper, we introduce a new framework, FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to the semantic recognition problem. The proposed FREE framework leverages recent advances in LLMs to supplement the original input features with natural language descriptions. This facilitates capturing the data semantics and also allows harnessing the irregularities of input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. The efficacy of FREE is evaluated in the context of two societally important real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Beyond the superior predictive performance over multiple baseline methods, FREE is shown to be more data- and computation-efficient as it can be pre-trained on simulated data generated by physics-based models.
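The step of mapping environmental data into a text space can be illustrated with a minimal sketch. This is not the authors' template: the function name `serialize_record`, the feature names, and the units are all hypothetical, chosen only to show how a record with irregular (missing) features might still yield a valid natural language description that a language model could consume.

```python
import math
from typing import Mapping, Optional


def serialize_record(
    record: Mapping[str, Optional[float]],
    units: Mapping[str, str],
) -> str:
    """Turn one sample of environmental features into a plain-text description.

    Hypothetical helper for illustration only. Missing values (None or NaN)
    are skipped rather than imputed, so records with different sets of
    available features still produce valid text.
    """
    parts = []
    for name, value in record.items():
        # Skip features that were not observed for this sample.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        unit = units.get(name, "")
        suffix = f" {unit}" if unit else ""
        parts.append(f"{name} is {value:g}{suffix}")
    if not parts:
        return "No measurements are available."
    return "; ".join(parts) + "."


# Example with made-up values: two features observed, two missing.
sample = {
    "air temperature": 12.5,
    "precipitation": 0.0,
    "solar radiation": float("nan"),
    "streamflow": None,
}
sample_units = {"air temperature": "degrees C", "precipitation": "mm"}
print(serialize_record(sample, sample_units))
```

In a pipeline of the kind the abstract describes, text like this would presumably be passed to a language model encoder, with a prediction head trained on the resulting representation; that part is omitted here because the abstract does not specify it.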
